JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Company names in JMdict #24

Open jimmubreen opened 3 years ago

jimmubreen commented 3 years ago

At present the policy on proper names in JMdict does not include company names (https://www.edrdg.org/wiki/index.php/Editorial_policy#Proper_Names); however, a number of company names are currently included: グーグル, マイクロソフト, アップル, 日産, エヌイーシー, アドービ, サムチョン, ジーメンス. ネットフリックス has just been added.

It is appropriate that some such names are included, so I propose that the policy be extended to include:

robinjmdict commented 3 years ago

I've been giving this some thought and I'm not sure I like the idea of including company names. We run into the problem of having to define "major company". There must be many hundreds of companies that fit that description, but how many of them are really worth adding?

I'd like to propose only including company names that also refer to ubiquitous services/platforms/products (e.g. Google, Twitter, Netflix, LINE). You're much more likely to encounter these terms in day-to-day conversation than most major company names (like Nissan or Microsoft). Of course, we'd have to decide what sort of companies fall into this category (would it include stores like Amazon or 7-Eleven, for example?), but ultimately it's a small subset of "major companies".

jimmubreen commented 3 years ago

OK, I think that should work. Probably most of the company names now in JMdict would fit with that, although エヌイーシー may need some thought. サムチョン should stay as it's slang. 日産 probably not. I'll think about a line in the policy.

robinjmdict commented 3 years ago

If this were to become policy, I think マイクロソフト, アップル, 日産, エヌイーシー, アドービ and ジーメンス would need to be dropped as they're just company names. They don't refer to specific services or products.

I agree about keeping slang. I also think colloquial terms for major companies (e.g. マック, ケンタ) should remain in jmdict.

jimmubreen commented 3 years ago

OK, I think that should work. Probably most of the company names now in JMdict would fit with that, although エヌイーシー may need some thought. サムチョン should stay as it's slang. 日産 probably not. I have put the following line in the policy;

In cases like マイクロソフト and 日産, they could be removed, or they could perhaps stay as grandfathered entries.

jimmubreen commented 3 years ago

This has largely been dealt with. I think it can be closed.

Marcusjmdict commented 3 years ago

Sorry to re-open this after it's been closed. I don't think the task of judging which companies are major ones is any different from judging whether other words belong in the dictionary. To make things easier, though, we could decide on a specific n-gram count (for older companies) and maybe a Twitter count or similar for newer ones.

JMdictProject commented 3 years ago

I hadn't noticed that this wasn't actually reopened. I'm not sure we can set meaningful thresholds. Below are some n-gram counts for common company names. I note that none of them appears in GG5 or the other JEs.

マイクロソフト | 2563816
日産 | 1816429
トヨタ | 4060563
トヨタ自動車 | 411898
ニッサン | 104818
日産自動車 | 238539
ホンダ | 2533741
東芝 | 1558944
松下電器 | 202459
三菱 | 2330955
三菱商事 | 161538

JMdictProject commented 3 years ago

There has been further discussion on this as part of a proposed entry for 阪急 and 阪急電鉄 (see https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=2850086). I have deleted that proposed entry, but it can be revived if we decide to expand company-name inclusions into that category.

Marcusjmdict commented 2 years ago

I don't think it needs to be harder than just deciding we'll generally include most companies that have a 2008 n-gram count over 100,000, or, say, an average of at least 100 daily mentions on Twitter. (I'm not sure how closely equivalent those two measures are, but 三菱商事 seems to have around that much.)
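As a rough illustration of how such a cut-off would behave, here is a minimal sketch that applies the proposed thresholds to the n-gram counts posted earlier in this thread. The function and constant names are hypothetical, and combining the two measures with "either one is enough" is an assumption about how the proposal would be applied. No Twitter figures were posted, so only the n-gram side is exercised.

```python
# N-gram counts copied from the figures posted earlier in this thread.
NGRAM_COUNTS = {
    "マイクロソフト": 2563816,
    "日産": 1816429,
    "トヨタ": 4060563,
    "トヨタ自動車": 411898,
    "ニッサン": 104818,
    "日産自動車": 238539,
    "ホンダ": 2533741,
    "東芝": 1558944,
    "松下電器": 202459,
    "三菱": 2330955,
    "三菱商事": 161538,
}

# Proposed cut-offs (hypothetical constant names).
NGRAM_THRESHOLD = 100_000       # n-gram count must be over this value
DAILY_MENTION_THRESHOLD = 100   # average daily mentions must be at least this


def meets_threshold(ngram_count=None, daily_mentions=None):
    """Return True if either measure passes its proposed cut-off.

    Assumption: a name qualifies when one measure alone is satisfied.
    A measure that is not supplied (None) is simply ignored.
    """
    if ngram_count is not None and ngram_count > NGRAM_THRESHOLD:
        return True
    if daily_mentions is not None and daily_mentions >= DAILY_MENTION_THRESHOLD:
        return True
    return False


# Names from the list above that would qualify on n-gram count alone.
qualifying = [name for name, count in NGRAM_COUNTS.items()
              if meets_threshold(ngram_count=count)]

for name in qualifying:
    print(name, NGRAM_COUNTS[name])
```

Every name in that list clears the 100,000 line, ニッサン (104,818) only narrowly, so this cut-off would admit all of them.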

I think public transport companies would be especially good to have in jmdict, for tourists and language beginners trying to find their way in Japan. Whether or not we go ahead with including more company names in jmdict per the main proposal, I think we should definitely include both the shorter versions (阪急) and the full, formal names (阪急電鉄) of these companies. This wouldn't be more than 30 or so entries in total for the major train companies, I figure (or maybe as many as 100-200 if you wanted to add every local train and bus operator).

Marcusjmdict commented 2 years ago

I want to suggest that we at least include more Japanese print newspapers (the national ones, but also major local ones) plus all news agencies that publish news in Japanese, along with their abbreviations, in jmdict too, e.g.
読売新聞 (in koj, daij, etc.)
中国新聞 (in koj)
共同通信 (in koj, daij, etc.)
ロイター (in gg5, daij, etc.)
AFP (in koj, daij, etc.)

(I feel support for adding all kinds of companies with a certain ngram/tweet count isn't likely to materialize but I think it'd be great if we could do at least this + the aforementioned public transport companies, airlines, etc.)

robinjmdict commented 2 years ago

I'm still opposed to significantly expanding the number of companies in JMdict.

As I said above, I don't have an issue with adding company names that refer to widely used services or products. I gave Google, Twitter, LINE, etc. as examples but it could also include newspapers, supermarkets, etc. I don't see a need to put the names of corporate entities in JMdict when we have the names dictionary.

Marcusjmdict commented 2 years ago

So you're OK with including public transportation companies too?