JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Proposal for improving cross-references #39

Open Marcusjmdict opened 2 years ago

Marcusjmdict commented 2 years ago

I started writing a proposal on fleshing out our xref system several years ago which I last edited in 2018. I think I intended to add more examples and other category suggestions but I figure there's enough in here to start a discussion:

I believe we could improve the usefulness of many JMdict entries quite a bit by adding several new cross reference/xref categories ("types").

While we're pretty strict about which entries get the [ant] (antonym) xref, it's kind of an "anything goes" situation with [see].

[see] currently has a couple of different usages:
1) we often use it to mean "More commonly as:"
2) we commonly use it to point to the unabbreviated form of an abbreviated entry (creating a bit of confusion for entries where the abbreviated form is by far the most common!)
3) we sometimes use it for synonyms that aren't necessarily more common, and between very similar entries like xがy and xのy
4) we sometimes use it to show etymology or the constituent parts of a compound word or phrase, e.g. [see=スレ] in the 糞スレ entry
5) in some cases, we use it to refer to examples of a phrase, e.g. in the 掛ける entry, the "to put on glasses" sense has an xref to 電話をかける (which is kind of the opposite of use 4, which can be confusing, and I think we should only be doing one of these things)
6) we often use it to "explain" a Japanese term that is given as-is in Romanized Japanese in a gloss, e.g. the redirect to 子の日の遊び in the 子の日の松 entry (where the gloss is "pine shoot pulled out during ne-no-hi-no-asobi")
7) we often use it for "contrasting" use, e.g. things that have opposite or contrasting meanings but perhaps not in the strict sense that they qualify for [ant], e.g. [see=出職] in the 居職 entry
8) we often use it in conjugated entries to point to the non-conjugated form of the word, e.g. [see=しまう] in the しまった entry
9) we sometimes use it to xref things that are in a ranked relationship, e.g. the "highest (of a three-tier ranking system)" sense in the 松 entry has xrefs to both 梅 and 竹
10) we sometimes use it to refer to "the official name" in entries, for example 細田派 → 清和政策研究会 (proposed), even when the less official name is the more common one
11) we sometimes use it to show the standard Japanese equivalent of a dialectal word, for example やろう → だろう

And there are probably other use cases I haven't covered here, as we really don't have any firm rules on when to use [see].

It might not be strictly necessary to split the general [see=] xref into 10+ different things, but we might as well. I feel that having more "granular" tags could be a major improvement to the dictionary. It would give dictionary applications many more options for formatting the xrefs and would make entries easier to understand. I think this has the potential to help us close the gap between our entries and 中辞典/GG5's in terms of helpfulness/ease-of-understanding. (For the record, I don't think there's much of a gap when it comes to accuracy/completeness!)
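To illustrate how granular types might help a dictionary application, here is a rough Python sketch. Only `see` and `ant` exist in JMdict today; the other type names (`more_common`, `abbr_of`, `contrast`, `base_form`) and all the label strings are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass

# Hypothetical display labels keyed by xref type. Only "see" and "ant"
# exist in JMdict today; the other names are illustrative assumptions.
XREF_LABELS = {
    "see": "See also",
    "ant": "Antonym",
    "more_common": "More commonly as",
    "abbr_of": "Abbreviation of",
    "contrast": "Compare with",
    "base_form": "Conjugated form of",
}


@dataclass(frozen=True)
class Xref:
    type: str    # ideally one of the keys in XREF_LABELS
    target: str  # headword of the entry being pointed to


def format_xref(xref: Xref) -> str:
    """Render an xref for display, falling back to the generic label."""
    label = XREF_LABELS.get(xref.type, XREF_LABELS["see"])
    return f"{label}: {xref.target}"


if __name__ == "__main__":
    # Targets taken from the usages listed above.
    print(format_xref(Xref("base_form", "しまう")))  # from the しまった entry
    print(format_xref(Xref("contrast", "出職")))     # from the 居職 entry
    print(format_xref(Xref("see", "スレ")))          # from the 糞スレ entry
```

With a single generic [see], an application can only ever show the fallback label; with typed xrefs it can choose wording (or layout) per relationship.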

Here are some suggestions for new cross reference types:

Marcusjmdict commented 2 years ago

https://jisho.org/forum/615b67d6d5dda76387000000-is-there-an-easy-way-to-find-the-counterpart-in-in-transitive-verb-pairs

Wouldn't it make sense for jisho.org to point to the respective counterpart in a pair? E.g. in entry 割る something like "See also: 割れる".

This could be handled by [see=] or maybe by one or two new xref types.
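As a rough illustration of the pair linking, here is a minimal Python sketch. The type name `vpair` is made up for this example and is not an existing JMdict tag; the point is only that each member of a pair should point back at the other.

```python
# Hypothetical sketch: "vpair" is an assumed xref type name, not an
# existing JMdict tag.
def link_verb_pairs(pairs):
    """Build a symmetric lookup so each verb points to its counterpart.

    `pairs` is an iterable of (transitive, intransitive) headwords.
    """
    links = {}
    for transitive, intransitive in pairs:
        links[transitive] = ("vpair", intransitive)
        links[intransitive] = ("vpair", transitive)
    return links


links = link_verb_pairs([("割る", "割れる")])
xref_type, counterpart = links["割る"]
print(f"[{xref_type}={counterpart}]")
```

An application such as jisho.org could then render the counterpart however it likes, e.g. as "See also: 割れる" on the 割る entry.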

Kimtaro commented 2 years ago

I really like this proposal, @Marcusjmdict. I'll definitely add support for showing these cross references in Jisho if they are implemented in JMdict.

JMdictProject commented 2 years ago

I'll try to comment later on Marcus's discussion piece. There's certainly scope to expand on the initial attributes I suggested for the new element discussed in the "JMdict: Next Generation" wiki page (http://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation#Cross-References). The idea of introducing some form of hyponym/hypernym linking (Marcus' ranka/rankb) is a good one.

As for the linking of transitive/intransitive verb pair entries - that would be a very good move, IMO. It could do with an attribute of its own.

Marcusjmdict commented 2 years ago

Could we consider adding some of these xrefs now (rather than as a new element in JMdict NG, or only after we've had enough time to discuss exactly which of these we should implement)? It seems it shouldn't be very complicated to allow for a new "TYP" in our current system?

I try to add notes like "more commonly as xref" etc. in the comment field so that they can be dug up and converted later through the jmdictdb advanced search, but having to revisit them rather than adding this type of xref from the start ends up wasting a lot of precious time.
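For what it's worth, the later conversion step could look something like the Python sketch below. It assumes the notes follow a fixed phrasing ("more commonly as X", "contrast/compare with X", "abbr of X"), which real comments may not, and the type names are hypothetical. It only works on plain strings and does not use any jmdictdb API.

```python
import re

# Assumed phrasing of the editor notes; real comments may vary.
# The type names on the left are hypothetical, not existing JMdict tags.
NOTE_PATTERNS = {
    "more_common": re.compile(r"more commonly as\s+(\S+)", re.IGNORECASE),
    "contrast": re.compile(r"(?:contrast|compare) with\s+(\S+)", re.IGNORECASE),
    "abbr_of": re.compile(r"abbr\.? of\s+(\S+)", re.IGNORECASE),
}


def extract_typed_xrefs(comment: str):
    """Return (type, target) pairs for each recognised note in a comment."""
    found = []
    for xref_type, pattern in NOTE_PATTERNS.items():
        for match in pattern.finditer(comment):
            # Drop trailing punctuation that \S+ may have swallowed.
            found.append((xref_type, match.group(1).rstrip(".,;")))
    return found


print(extract_typed_xrefs("More commonly as 電話をかける."))
```

Even if something like this works, it is still a second pass over entries that were already edited once, which is the wasted time mentioned above.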

Specifically I'd love to see "more commonly as ..." and "contrast/compare with ..." and "abbr of ..." implemented as soon as possible.

parfait8566 commented 6 months ago

I think some system of inter-linking similar-meaning homophones (異字同訓) as described in #107 would be helpful. Also, not directly related, but it would help to have more types of [note] (grammar note, usage note, etc.)

razasyedh commented 5 months ago

I wholeheartedly support Marcus' proposal. Incidentally, I had a draft sitting around in my email from 2020 saying essentially the same thing.

More versatile cross-linking

Currently, in JMdict we have the ability to create general references between entries ([see=]) and, less commonly used, to note antonyms. However, I find that merely pointing users to other entries does not give enough information about the relationship being indicated. Sometimes it's to a compound demonstrating a specific sense of a term. Sometimes the target is another Japanese term we use in the gloss. Sometimes we do it because the target is a more common way of saying the thing. Other times, it's to explain part of a slang term. Finally, it could simply be a closely-related term. One project that you're likely familiar with is [WordNet](https://en.wikipedia.org/wiki/WordNet), which encodes the semantic relationships between words. While I don't think we need such a systematic approach, the results are quite useful. Similarly, we could be more specific when we build up our own web of interconnections. Actual Japanese dictionaries list out compounds separately, note common usages, and cross-reference (usually with an arrow) other entries. So in addition to "see", we could have "derivedFrom", "moreCommonly", "alternateForm", etc.