JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Two elements with strange pos and info classification #104

udoprog closed this issue 8 months ago

udoprog commented 8 months ago

Hi,

Sorry in advance, since I'm not very well versed in JMdict's data policy. If something below is wrong, please advise!

I'm writing a machine conjugator based on JMdict. To improve coverage, I've started reporting what seem to be classification mismatches: words whose part of speech (pos) seems to contradict their structure.

This is the pseudocode for my inclusion criteria for words which can be conjugated:

for reading in reading_elements:
    if reading.is_search_only():  # sk tag (search-only kana form)
        continue

    # Kana-only candidate: the reading is flagged no-kanji, or the entry has no kanji elements
    if reading.is_no_kanji or not kanji_elements:
        yield (None, reading)

    for kanji in kanji_elements:
        if kanji.is_search_only():  # sK tag (search-only kanji form)
            continue

        if reading.applies_to(kanji):  # checks if reading string applies
            yield (kanji, reading)  # yields kanji with associated reading
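
For anyone who wants to try the criteria above, here is a minimal runnable sketch in Python. The `Kanji` and `Reading` classes, their fields, and the way `applies_to` handles reading restrictions are stand-ins invented for illustration, not the conjugator's actual data model. The tag strings (`sK` for kanji, `sk` for readings) are assumed, and the sample elements at the bottom are made up rather than taken from a real entry.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple


@dataclass
class Kanji:
    """Hypothetical stand-in for a JMdict kanji element."""
    text: str
    info: Set[str] = field(default_factory=set)  # info tags, e.g. {"sK"}

    def is_search_only(self) -> bool:
        return "sK" in self.info  # assumed tag for search-only kanji forms


@dataclass
class Reading:
    """Hypothetical stand-in for a JMdict reading element."""
    text: str
    info: Set[str] = field(default_factory=set)
    no_kanji: bool = False  # reading is not a true reading of any kanji form
    restricted_to: List[str] = field(default_factory=list)  # empty = applies to all

    def is_search_only(self) -> bool:
        return "sk" in self.info  # assumed tag for search-only kana forms

    def applies_to(self, kanji: Kanji) -> bool:
        # Assumption: a no-kanji reading pairs with no kanji form; otherwise an
        # empty restriction list means the reading applies to every kanji form.
        if self.no_kanji:
            return False
        return not self.restricted_to or kanji.text in self.restricted_to


def conjugation_candidates(
    readings: List[Reading], kanji_elements: List[Kanji]
) -> Iterator[Tuple[Optional[Kanji], Reading]]:
    """Yield (kanji, reading) pairs that pass the inclusion criteria."""
    for reading in readings:
        if reading.is_search_only():
            continue
        if reading.no_kanji or not kanji_elements:
            yield (None, reading)
        for kanji in kanji_elements:
            if kanji.is_search_only():
                continue
            if reading.applies_to(kanji):
                yield (kanji, reading)


# Made-up sample: one regular kanji form and one search-only variant.
kanji_elements = [Kanji("買い増す"), Kanji("買増す", {"sK"})]
readings = [Reading("かいます")]
pairs = [
    (k.text if k else None, r.text)
    for k, r in conjugation_candidates(readings, kanji_elements)
]
print(pairs)
```

In this sample only the non-search-only kanji form is paired with the reading, so the search-only variant never reaches the conjugator.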

With this, there are exactly two elements across verbs and adjectives that, as far as I can tell, seem to be misclassified:

| id | kanji | pos | description |
|---|---|---|---|
| 2222710 | 弱っちぃ | adj-i | Should this kanji element be marked as search-only (sK)? |
| 2858769 | 買い増す | v1, vt | This is marked as an ichidan verb, although it ends in す rather than る. |
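
The kind of structural check that surfaces these two rows can be sketched as a comparison between a pos tag and the final kana of the written form. This is a simplified illustration, not the conjugator's actual logic. `EXPECTED_ENDING` and `pos_mismatches` are hypothetical names, and the table covers only the regular godan, ichidan and i-adjective tags, ignoring irregular classes.

```python
from typing import Dict, List

# Final kana expected for the dictionary form of each pos tag.
# Simplified: irregular classes such as v5k-s or vs-i are not covered.
EXPECTED_ENDING: Dict[str, str] = {
    "adj-i": "い",
    "v1": "る",
    "v5u": "う",
    "v5k": "く",
    "v5g": "ぐ",
    "v5s": "す",
    "v5t": "つ",
    "v5n": "ぬ",
    "v5b": "ぶ",
    "v5m": "む",
    "v5r": "る",
}


def pos_mismatches(word: str, pos_tags: List[str]) -> List[str]:
    """Return the pos tags whose expected final kana does not match `word`.

    Tags without an entry in EXPECTED_ENDING (e.g. "vt") are skipped.
    """
    return [
        pos
        for pos in pos_tags
        if pos in EXPECTED_ENDING and not word.endswith(EXPECTED_ENDING[pos])
    ]


# The two rows from the table above.
print(pos_mismatches("弱っちぃ", ["adj-i"]))      # ends in small ぃ, not い
print(pos_mismatches("買い増す", ["v1", "vt"]))   # ends in す, not る
print(pos_mismatches("買い増す", ["v5s", "vt"]))  # consistent with godan-su
```

Both reported forms are flagged under their current tags, while 買い増す passes once it is checked as `v5s`.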

Thank you

JMdictProject commented 8 months ago

Thanks for this bit of feedback.

As these anomalies involve individual entries, it would probably be best to raise them as comments on the entries themselves (e.g. https://www.edrdg.org/jmwsgi/entr.py?svc=jmdict&sid=&q=2858769) rather than in this (general) issue forum.

I'll close this issue.