JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Suggest folding [electr] into [elec] and eliminating [electr] #123

Closed: briankrznarich closed this 2 months ago

briankrznarich commented 3 months ago

I stumbled into this with what I expected to be a small edit to 抵抗器 "electrical resistor". I was just going to chop off "electrical", which is rarely said or written, and tack on [elec]... or [electr]?

I looked at other terms to see what the practice was, and I see the confusion is not limited to me (based on comments on other terms, and flip-flops between the two). I can see what people had in mind when this distinction was made, but I think the end result suggests we might be better off putting them together.

What sense does this make? For example, プリント基板 (printed circuit board) is [electr], but マスク "mask" (e.g. for circuit (board) etching) is [elec].

Our terms for "capacitor" (we have 7) and "resistor" (we have 9) are currently unmarked. Are these [elec] or [electr]? PCBs (if PCB isn't [electr], nothing is) are covered with resistors and capacitors. So [electr]? But a "capacitor" is a pretty fundamental concept in electrical engineering. Usually when a term covers two fields, we just put it in both. But we seem averse to tagging as [elec][electr].

I can imagine why, but it seems like the clean solution is just to eliminate the distinction. Neither of these is a massive category; there are only 88 terms combined (though certainly many more terms could be added).

Just to add to the above, some things which are [elec] but could be [electr]:

Some things which are [electr] but could/should (also) be [elec]:

I would further argue that every single electrical-logic term in [electr] properly belongs to "electrical engineering", which [elec] claims to include. This includes AND/OR/NOT/NAND gates and circuits, "negative logic", etc., which make up a quarter of the [electr] terms. Additionally, things like "wafer" and "silicon wafer" seem like engineering materials to me.

I honestly don't see a single term that is [electr] now that would suffer from being moved to [elec], nor do I think any of them would be out of place in [elec].

I've edited a few terms that touch on this (抵抗器: added [elec]; コンデンサ: added [elec]; バラン: swapped to [elec]; 回生 and スチコン: other changes).

JMdictProject commented 3 months ago

I can't find the discussions we had when those field tags were established. I agree that it's often difficult to decide whether a term is in the "light" or "heavy" category (or both). I see GG5 rolls them all into just one 【電】 set. We have about 60 in [elec] and about 30 in [electr].

I wouldn't mind merging them into a single [elec] (electricity, electronics) field.

robinjmdict commented 3 months ago

GG5 has 301 entries tagged as【電子工学】(electronics). There's definitely some overlap but I don't think the tags should be merged. Terms like プリント基板 and ウェハー firmly belong to the electronics domain. For what it's worth, English dictionaries like Oxford, Cambridge and Collins have an electronics field tag.

briankrznarich commented 3 months ago

Another alternative is just to be liberal about double-tagging [elec][electr] where appropriate.

To start with, 電子工学 = "electronics engineering", and https://en.wikipedia.org/wiki/Electronic_engineering says: "Electronic engineering is a sub-discipline of electrical engineering".

So [elec] entirely contains [electr] in principle. In fairness, we have [chem], [biol] and [biochem], and they seem to be working well enough...

If our purpose is merely to disambiguate entries with multiple senses, then surely one big [elec] tag would do: "electricity and electronics". If our purpose is semantic tagging for its own sake (which I'm not sure we're aiming at), then double-tagging [elec][electr] seems like it should be fine.

A "step-down transformer", for example, is a fundamental concept in electricity distribution. But it's also a box you order from Amazon so you can plug a Japanese appliance into an American power outlet. It is literally "a piece of electronics". Resistors and capacitors have a similar overlap. Probably the same for "multimeter".

I could agree that "wafer", "mask", logic circuits, etc. are part of the [electr] domain and don't need double-tagging. That is, if an item has essentially no use outside the "electronics" subdomain of electrical engineering, then it's [electr]. Broad concepts like "voltage" or "impedance", and concepts not needed within [electr], remain [elec]. Otherwise, double-tag.

====== Some random notes:

We tagged "multimeter" as [electr], but Collins seems fairly decided on [elec]: https://www.collinsdictionary.com/dictionary/english/multimeter (in American English: tagged "electricity"; in Electrical Engineering: ...)

We don't yet tag resistors or capacitors, but this looks like an instance of [electr] tagged as [elec] to me: https://www.collinsdictionary.com/dictionary/english/resistor (Electricity): a component with a specific resistance, used to control the current in a circuit

https://en.wikipedia.org/wiki/Applications_of_capacitors Capacitors have many uses in electronic and electrical systems.

briankrznarich commented 3 months ago

Quite by coincidence, I just looked up 仮定法:

仮定法 [gramm] subjunctive mood
仮定法過去 [ling] subjunctive past
仮定法過去完了 [ling] subjunctive past perfect

It seems like we have a bit of this going on with [ling] and [gramm], though those are much larger categories. It looks like some [ling] terms should probably be shuffled over (sporadic parts-of-speech terms, in particular).

This is another case where I'd say [ling] is a proper superset of [gramm], so it's an interesting comparison...

JMdictProject commented 3 months ago

Some more information on GG5's tagging things to do with electrons. The front-matter of the dictionary has a page with a table of the "分野名" being used (it's on the page after xii). Most are single kanji of the 【電】 variety, although there are a couple with two kanji, e.g. 【海保】. There is only one relating to things electrical: 【電】, which it says covers "電気(工学)". The 【電】 tag occurs 884 times in the dictionary.

Curiously, there are other tags being used which are not included in the table. These include 【電子工学】, which Robin mentioned (301 times), and 【電算】 (2251 times!). Also, there are quite a few electricity/electronics-related entries which have no tag at all. An example is "でんきょく 【電極】an electrode; a pole; a [an electric] terminal."

briankrznarich commented 3 months ago

Is there some published guidance on the jmdict project's goals with the field/domain tags? (and/or, could you fill me in a bit?).

It may be that the whole raison d'être of this post is moot, in which case I don't object to just closing it.

Online dictionaries let you do something fun that you can't do in paper dictionaries: you can search by field/domain. "Show me the engineering terms." After looking at our field tags for a while, I've started to think of them in this light, where omitting terms from an appropriate domain seems like an oversight that is worth correcting for its own sake. But search-vocab-by-domain is surely not a common use case.

If the purpose of domain tags is just "provide the best single term for clarifying meaning", as a traditional paper dictionary does, then I guess it's not too critical which candidate is chosen among ambiguous options. Being in [ling] when it would be 'better' in [gramm] doesn't do much harm to someone looking up a term. Same for math/geom, eng/mech/civeng, psy/psych/psyanal. I realize now that elec/electr is hardly alone here.

Does "wafer" look better as "electronics" than "electrical engineering"? Sure. Could the domain be "electricity and electronics"? I think that would also be fine. It just seemed off that "multimeter/transistor/resistor" could somehow be "not an electrical engineering term" or "not an electronics term". But like I said, that may just not be relevant to the aims of this project.

I've made a few tangential remarks on #83, which I just discovered.

JMdictProject commented 3 months ago

Field or domain tags (the lexicography texts use both) tend to be used to provide a context for an entry or for one or more of its senses. They can be useful in avoiding over-long glosses. If the クロス積 (cross product) entry didn't have a [math] tag it would either be quite opaque or in need of considerable expansion.

Apparently there has been a bit of a trend to reduce the use of field/domain tags, particularly in monolingual dictionaries. The Webster NID dropped many when it moved to its 3rd edition. They tend to be used more in bilingual dictionaries.

It seems best to stay with [elec]/[electr] and [ling]/[gramm]. At the margin, there'll often be overlap, but as long as the general context is established, we can tweak them as needed.

briankrznarich commented 3 months ago

Thank you, I appreciate the explanation. If we're following this trend even a bit, that explains why "triangle" doesn't need to be marked [geom] at least (something I used to wonder about).

On the subject of tagging, I guess I'd probably have some opinions if I was picking tags from scratch or something, but there's little reason for me to be worked up over the status quo here. I'm glad to have a better idea going forward of when/why tags should be selected.

JMdictProject commented 2 months ago

Thanks. I think this can be closed.