CUNY-CL / udtube

Neural morphological analyzer
Apache License 2.0

Installable UDTube #35

Closed: kylebgorman closed this 2 months ago

kylebgorman commented 3 months ago

This monster PR makes UDTube a properly installable Python package and fixes a bunch of other nuisance issues.

  1. Closes #6.
  2. Closes #7.
  3. Closes #15.
  4. Closes #17.
  5. Closes #20.
  6. Closes #32.
  7. Closes #33.
  8. Closes #34.

It does not fully close #3, but the indexes/label encoders are now all stored in a single file, which should make that easier to implement.
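Here is a minimal sketch of what keeping every label encoder in one file can look like, assuming a pickle-based design. The `LabelEncoder` and `Index` classes and their `fit`/`encode`/`decode`/`write`/`read` methods are hypothetical illustrations, not UDTube's actual code. The idea is that one object owns one encoder per task (UPOS, XPOS, lemma, features), so saving or loading a model touches a single index file.

```python
"""Hypothetical sketch: all label encoders bundled into one serialized index.

Class and method names are illustrative assumptions, not UDTube's real API.
"""

import dataclasses
import pickle
from typing import Dict, Iterable, List


@dataclasses.dataclass
class LabelEncoder:
    """Maps string labels to contiguous integer ids and back."""

    index2label: List[str] = dataclasses.field(default_factory=list)
    label2index: Dict[str, int] = dataclasses.field(default_factory=dict)

    def fit(self, labels: Iterable[str]) -> None:
        # Assigns the next free id to each previously unseen label.
        for label in labels:
            if label not in self.label2index:
                self.label2index[label] = len(self.index2label)
                self.index2label.append(label)

    def encode(self, label: str) -> int:
        return self.label2index[label]

    def decode(self, index: int) -> str:
        return self.index2label[index]


@dataclasses.dataclass
class Index:
    """One encoder per task, written to and read from a single file."""

    upos: LabelEncoder = dataclasses.field(default_factory=LabelEncoder)
    xpos: LabelEncoder = dataclasses.field(default_factory=LabelEncoder)
    lemma: LabelEncoder = dataclasses.field(default_factory=LabelEncoder)
    feats: LabelEncoder = dataclasses.field(default_factory=LabelEncoder)

    def write(self, path: str) -> None:
        # A single pickle holds every encoder at once.
        with open(path, "wb") as sink:
            pickle.dump(self, sink)

    @classmethod
    def read(cls, path: str) -> "Index":
        with open(path, "rb") as source:
            return pickle.load(source)


if __name__ == "__main__":
    import os
    import tempfile

    index = Index()
    index.upos.fit(["NOUN", "VERB", "NOUN"])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.pkl")
        index.write(path)
        print(Index.read(path).upos.encode("VERB"))  # 1
```

Because the encoders travel together, any later feature that needs all of them (such as the one requested in #3) only has to read one file.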

kylebgorman commented 2 months ago

This is now ready for review.

kylebgorman commented 2 months ago

I ran a series of evaluations using the same hyperparameters I give in the example configs. These runs are totally untuned, but the results are very promising; I think they exceed those reported in the thesis.

en, EWT:
    BERT:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_feats_accuracy    │    0.9625540375709534     │
│    test_lemma_accuracy    │    0.9711591601371765     │
│    test_upos_accuracy     │    0.9596463441848755     │
│    test_xpos_accuracy     │    0.9554420709609985     │
└───────────────────────────┴───────────────────────────┘

    MWEs removed, BERT:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_feats_accuracy    │    0.9624212980270386     │
│    test_lemma_accuracy    │     0.974416196346283     │
│    test_upos_accuracy     │    0.9590739011764526     │
│    test_xpos_accuracy     │    0.9553279876708984     │
└───────────────────────────┴───────────────────────────┘
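To make the effect of removing MWEs easier to read off the two English tables, the short script below subtracts the full-data accuracies from the MWEs-removed accuracies. The numbers are copied verbatim from the tables; the `deltas` helper is just an illustration.

```python
"""Accuracy change (percentage points) from removing MWEs, en/EWT, BERT."""

from typing import Dict

# Test-set accuracies copied from the two en/EWT BERT tables.
FULL: Dict[str, float] = {
    "feats": 0.9625540375709534,
    "lemma": 0.9711591601371765,
    "upos": 0.9596463441848755,
    "xpos": 0.9554420709609985,
}
NO_MWES: Dict[str, float] = {
    "feats": 0.9624212980270386,
    "lemma": 0.974416196346283,
    "upos": 0.9590739011764526,
    "xpos": 0.9553279876708984,
}


def deltas(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return after - before for each metric, in percentage points."""
    return {metric: 100 * (after[metric] - before[metric]) for metric in before}


if __name__ == "__main__":
    for metric, delta in deltas(FULL, NO_MWES).items():
        print(f"{metric:>5}: {delta:+.2f} pp")
```

Lemma accuracy rises by about 0.33 points, while feats, UPOS, and XPOS each drop by less than 0.06 points. With single untuned runs, shifts that small may be within run-to-run variance.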

ru, SynTagRus:
    UD features, mBERT:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_feats_accuracy    │    0.9279443621635437     │
│    test_lemma_accuracy    │    0.9742450714111328     │
│    test_upos_accuracy     │     0.98211270570755      │
└───────────────────────────┴───────────────────────────┘

    UD features, XLM-RoBERTa:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_feats_accuracy    │     0.936280369758606     │
│    test_lemma_accuracy    │    0.9777073264122009     │
│    test_upos_accuracy     │    0.9837583899497986     │
└───────────────────────────┴───────────────────────────┘

    UD features, RuBERT:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_feats_accuracy    │    0.9382235407829285     │
│    test_lemma_accuracy    │    0.9699346423149109     │
│    test_upos_accuracy     │     0.985562264919281     │
└───────────────────────────┴───────────────────────────┘
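The three Russian tables compare encoders on identical data, so the snippet below picks the highest-scoring encoder for each metric. The accuracies are copied from the tables; `best_per_metric` is an illustrative helper, not part of UDTube.

```python
"""Best encoder per metric on ru/SynTagRus (UD features), test accuracies."""

from typing import Dict

# Test-set accuracies copied from the three ru/SynTagRus tables.
RESULTS: Dict[str, Dict[str, float]] = {
    "mBERT": {
        "feats": 0.9279443621635437,
        "lemma": 0.9742450714111328,
        "upos": 0.98211270570755,
    },
    "XLM-RoBERTa": {
        "feats": 0.936280369758606,
        "lemma": 0.9777073264122009,
        "upos": 0.9837583899497986,
    },
    "RuBERT": {
        "feats": 0.9382235407829285,
        "lemma": 0.9699346423149109,
        "upos": 0.985562264919281,
    },
}


def best_per_metric(results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return, for each metric, the encoder with the highest accuracy."""
    metrics = next(iter(results.values()))
    best = {}
    for metric in metrics:
        best[metric] = max(results, key=lambda enc: results[enc][metric])
    return best


if __name__ == "__main__":
    for metric, encoder in best_per_metric(RESULTS).items():
        print(f"{metric}: {encoder}")
```

On these numbers RuBERT leads on feats and UPOS, while XLM-RoBERTa leads on lemma accuracy.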

kylebgorman commented 2 months ago

Here goes nothing.