JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

JMdict NG (next generation) progress and release date #136

Status: Open. parfait8566 opened this issue 2 months ago.

parfait8566 commented 2 months ago

The previous two issues were closed. In #116, early summer was put on the table, but I guess it may have been delayed. Again, I really don't want to put any pressure on anyone; I'm just curious about the progress. One question I have is whether it would be better to have some sort of "pitch accent box" in JMdictDB where the community can add or edit information, so that many entries are already covered when JMdict NG is officially released. As a side note, I think that the NHK accent dictionary also includes the pitch accent of inflected verbs and adjectives. Will this be included in JMdict?

parfait8566 commented 5 days ago

Another question: pitch accent can change depending on the part of speech (POS). JMdict sometimes lists different POSs in the same sense. Would this be handled correctly?

yamagoya commented 4 days ago

Sorry, I missed the Aug 31 comment.

Regarding progress, I ran into some health issues this year that have got in the way. I'm trying to work around them.

"Will this [NHK accent dictionary] be included in JMdict?" Not by me, although if someone wants to extract and reformat the information there for inclusion in the database, that would be fine with me. A look will probably need to be taken at potential copyright issues though.

"Would this [pitch accent dependencies on POS] be handled correctly?" Currently no; there is no provision for different pitch accents for different parts of speech.

stephenmk commented 3 days ago

"A look will probably need to be taken at potential copyright issues though."

I'm not a lawyer, but my understanding is that sweat-of-the-brow copyright protections don't exist in Australia, the US, and the UK. So large collections of facts such as telephone numbers, television broadcast schedules, or pitch accent locations wouldn't be copyrightable, even if collecting the data required substantial original research and labor. I have no idea what the legal situation is in Japan, though.

JMdictProject commented 3 hours ago

I think the NG is quite a way off. Once the revised database is ready, there will be a complex task of changing over to it as well as keeping all the legacy systems working.

We haven't considered how the JMdictDB interface will look, but I do not favour introducing more boxes. The JEL language approach used for the information fields should be able to handle additional information such as pitch accents associated with readings. As for including pitch accent information from various sources, I expect that eventually some form of bulk update may be possible. Of course, the NHK data is a possible source.

Re pitch accent and POS, we'll have to consider that issue eventually. It won't be simple.