JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Plant / animal / fungi / etc. names as expressions? #111

Open stephenmk opened 7 months ago

stephenmk commented 7 months ago

Bit of a nitpicky issue, but this affects hundreds of entries.

Should the names of plants and animals that are derived from phrases be tagged as expressions? 蕗の薹, アラビアゴムの木, etc.

I see we took pains in the entries for 牛の舌 and 虎の尾 to distinguish the expression senses from the name senses. The latter are not tagged as expressions.

ウシノシタ [Link](https://www.edrdg.org/jmwsgi/entr.py?svc=jmdict&sid=&q=2175750.1) ![usinosita](https://github.com/JMdictProject/JMdictIssues/assets/8003332/82db0fae-7933-4cb8-a803-cef5f3108ae9)
トラノオ [Link](https://www.edrdg.org/jmwsgi/entr.py?svc=jmdict&sid=&q=2209130.1) ![toranoo](https://github.com/JMdictProject/JMdictIssues/assets/8003332/e5968321-6738-47e5-8f08-7fa03c768e3b)

The all-katakana forms are usually the most common forms in these entries, and they are also the forms generally shown in the corresponding Wikipedia articles.

I'd be in favor of leaving them as just nouns. This is an extreme example, but we wouldn't tag キノコ (mushroom) as an expression even though it's derived from 木の子. I'd say the same logic applies to something like フキノトウ・蕗の薹.

JMdictProject commented 7 months ago

I'm not greatly fussed either way. A glance through the plant/etc. name entries seems to show that about half have "exp,n" tags. I wouldn't bother going into them and changing them.

A bigger question is the role of the "exp" tag. About 3,800 entries have senses with just an "exp" tag, and this seems appropriate as they are mostly sayings, proverbs, etc. About 8,700 entries have senses with "exp" combined with a POS such as "n", "adj-i", etc., usually because the form contains の, が, を, etc. To be frank, the reason for the "exp" tag for those entries is not clear to me. I'm not sure it serves much purpose.

robinjmdict commented 7 months ago

I wouldn't mind dropping [exp] from all AのB entries. Japanese dictionaries treat phrases like 蜘蛛の巣 and お手の物 the same as any other noun.

> About 8,700 entries have senses with "exp" combined with a POS such as "n", "adj-i", etc., usually because the form contains の, が, を, etc. To be frank, the reason for the "exp" tag for those entries is not clear to me. I'm not sure it serves much purpose.

I disagree. 高い (high, tall) is a 形容詞 (i-adjective), and 背が高い (tall, of a person) is an expression that ends in a 形容詞. One of the proposed changes for JMdict:NG is to move inflectional information on "exp" entries to an entry-wide element, which means [exp,adj-i], [exp,v1], etc. would simply become [exp].

I think [exp,n] is appropriate for longer phrases.
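For illustration, the JMdict:NG proposal mentioned above (moving inflection information off "exp" senses onto an entry-wide element) could look roughly like this. It is a minimal sketch under a hypothetical in-memory model: the `Sense`/`Entry` dataclasses, the `inflection` field and the `INFLECTING_TAGS` subset are assumptions for illustration, not the actual NG schema, whose permitted values are still undecided.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative subset of JMdict POS tags that carry inflection information;
# the real list is longer (other v5* classes, vk, vz, etc.).
INFLECTING_TAGS = {"adj-i", "adj-ix", "v1", "v5r", "v5s", "v5u", "vs-i"}


@dataclass
class Sense:
    pos: List[str]


@dataclass
class Entry:
    headword: str
    senses: List[Sense]
    inflection: Optional[str] = None  # hypothetical entry-wide element


def hoist_inflection(entry: Entry) -> Entry:
    """Move inflecting POS tags off 'exp' senses onto the entry.

    Non-'exp' senses are left untouched. Raises ValueError if the 'exp'
    senses disagree, since a single entry-wide value can't express that.
    """
    found = {t for s in entry.senses if "exp" in s.pos for t in s.pos if t in INFLECTING_TAGS}
    if len(found) > 1:
        raise ValueError(f"{entry.headword}: conflicting tags {sorted(found)}")
    if not found:
        return entry
    (tag,) = found
    senses = [
        Sense([t for t in s.pos if t not in INFLECTING_TAGS]) if "exp" in s.pos else s
        for s in entry.senses
    ]
    return Entry(entry.headword, senses, inflection=tag)


print(hoist_inflection(Entry("背が高い", [Sense(["exp", "adj-i"])])))
# Entry(headword='背が高い', senses=[Sense(pos=['exp'])], inflection='adj-i')
```

Raising on conflicting tags is a deliberate choice in this sketch: it surfaces exactly the kind of entry that a single entry-wide value cannot represent, which is one of the open questions about what values the element should allow.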

yamagoya commented 7 months ago

Just as a heads-up: the NG stuff is mostly complete, but the entry-wide inflection item was not implemented (everything else is) because it is not yet well defined (e.g., what values can it have?). I planned to look at it again after the first iteration of XML-NG was deployed and working. But if it's deemed important and fleshed out into an implementable form, I could probably look at including it now.

JMdictProject commented 7 months ago

> I wouldn't mind dropping [exp] from all AのB entries.

That can possibly be done via the bulk-update utility. I see there are about 1,450 entries which are tagged (exp,n) and have an AのB kanji form. I'll put this on my to-do list. Maybe next week.
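A rough sketch of how such entries might be selected and updated. It assumes entries flattened into plain dicts with hypothetical `kanji` and `senses` keys; it does not use or model the actual bulk-update utility, and since the exact "AのB" rule behind the ~1,450 figure isn't stated, the regex below is an assumption.

```python
import re
from typing import Any, Dict, List

# Hypothetical flattened entry: {"kanji": [...], "senses": [[pos, ...], ...]}
Entry = Dict[str, Any]

# Assumed reading of "AのB": exactly one の, with at least one character on
# each side. The rule behind the ~1,450 figure isn't stated in the thread.
A_NO_B = re.compile(r"[^の]+の[^の]+")


def is_exp_n(pos: List[str]) -> bool:
    return sorted(pos) == ["exp", "n"]


def is_candidate(entry: Entry) -> bool:
    """AのB kanji form plus at least one sense tagged exactly (exp, n)."""
    return any(A_NO_B.fullmatch(k) for k in entry["kanji"]) and any(
        is_exp_n(p) for p in entry["senses"]
    )


def drop_exp(entry: Entry) -> Entry:
    """Return a copy with (exp, n) senses reduced to (n); other senses kept."""
    return {**entry, "senses": [["n"] if is_exp_n(p) else p for p in entry["senses"]]}


# Toy records for illustration only, not real JMdict data.
entries = [
    {"kanji": ["蜘蛛の巣"], "senses": [["exp", "n"]]},
    {"kanji": ["背が高い"], "senses": [["exp", "adj-i"]]},
    {"kanji": ["目の上の瘤"], "senses": [["exp", "n"]]},  # two の: skipped here
]

for e in entries:
    if is_candidate(e):
        print(e["kanji"][0], drop_exp(e)["senses"])  # 蜘蛛の巣 [['n']]
```

`drop_exp` returns a copy rather than editing in place, so the candidate list can be reviewed as a dry run before anything is written back. Whether forms with more than one の (such as 目の上の瘤) should also be included is a judgment call the regex above leaves out.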

JMdictProject commented 7 months ago

> the NG stuff is mostly complete but the entry-wide inflection item was not implemented (everything else is) because it is not well defined yet (e.g., what values can it have?)

The background to this matter is at https://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation#Entry-wide_Inflection_Pattern_Elements and https://www.edrdg.org/wiki/index.php/JMdict:_Next_Generation#Part-of-Speech_Separation

Yes, this needs fleshing out in a number of aspects. I won't attempt to explore it here; a new issue should probably be opened for it. I think it's probably worth getting the structure sorted out now, as it would be messy to retrofit later.

robinjmdict commented 7 months ago

I didn't know that development of NG was so far along. I have quite a few proposals I'd like to see implemented or discussed. I should have done it ages ago but I kept putting it off. I'll create a new issue.

Marcusjmdict commented 7 months ago

Me as well!

> On Sat, Nov 25, 2023, 09:30 Robin wrote:
>
> I didn't know that development of NG was so far along. I have quite a few proposals I'd like to see implemented or discussed. I should have done it ages ago but I kept putting it off. I'll create a new issue.

yamagoya commented 5 months ago

If this (discussion of NG changes) can be done earlier rather than later, it would be appreciated. I can't promise that any new/changed features will be implemented in the first released iteration of NG, but it will help in planning for subsequent ones.