bigdata-ustc / EduNLP

A library for advanced Natural Language Processing towards multi-modal educational items.
Apache License 2.0

[FEATURE] Update D2V, AutoTokenizer, and pretraining scripts #155

Closed KenelmQLH closed 8 months ago

KenelmQLH commented 8 months ago


Description

(Brief description on what this PR is about)

What does this implement/fix? Explain your changes.

...

Pull request type

Changes

  1. Update D2V: support for token vectors
  2. Add AutoTokenizer
  3. Update pretraining scripts for Disenq and QuesNet

Does this close any currently open issues?

N/A

Any relevant logs, error output, etc?

N/A

Checklist

Before you submit a pull request, please make sure you have the following:

Essentials

Comments

codecov-commenter commented 8 months ago

Codecov Report

Attention: Patch coverage is 93.27586%, with 39 lines in your changes missing coverage. Please review.

Project coverage is 97.31%. Comparing base (598d788) to head (84b79c7).

| Files | Patch % | Lines |
|---|---|---|
| EduNLP/Pretrain/quesnet_vec.py | 91.26% | 11 Missing :warning: |
| EduNLP/Pretrain/disenqnet_vec.py | 90.41% | 7 Missing :warning: |
| EduNLP/I2V/i2v.py | 71.42% | 6 Missing :warning: |
| EduNLP/ModelZoo/hf_model/hf_model.py | 96.07% | 4 Missing :warning: |
| EduNLP/Vector/gensim_vec.py | 82.35% | 3 Missing :warning: |
| EduNLP/SIF/tokenization/formula/ast_token.py | 86.66% | 2 Missing :warning: |
| EduNLP/SIF/tokenization/tokenization.py | 71.42% | 2 Missing :warning: |
| EduNLP/ModelZoo/quesnet/quesnet.py | 96.42% | 1 Missing :warning: |
| EduNLP/Pretrain/elmo_vec.py | 95.00% | 1 Missing :warning: |
| EduNLP/Pretrain/hugginface_utils.py | 90.00% | 1 Missing :warning: |

... and 1 more


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev     #155      +/-   ##
==========================================
- Coverage   97.81%   97.31%   -0.51%
==========================================
  Files          80       84       +4
  Lines        4349     4650     +301
==========================================
+ Hits         4254     4525     +271
- Misses         95      125      +30
```
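The percentages in this report can be checked against the raw hit and line counts. The sketch below assumes that coverage is simply hits divided by tracked lines. The helper name `coverage_percent` is hypothetical, and the patch size of 580 lines is not stated in the report: it is inferred from the 39 missing lines and the 93.27586% patch figure.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Project coverage before (base) and after (head), from the Hits and Lines rows.
base = coverage_percent(4254, 4349)
head = coverage_percent(4525, 4650)

# Patch coverage: 39 lines are reported missing. The total of 580 changed
# lines is an assumption, inferred because (580 - 39) / 580 matches 93.27586%.
patch_lines = 580
patch = coverage_percent(patch_lines - 39, patch_lines)

print(f"base:  {base:.4f}%")
print(f"head:  {head:.4f}%")
print(f"delta: {head - base:+.4f} points")
print(f"patch: {patch:.5f}%")
```

The ratios come out at roughly 97.8% and 97.3%, about half a percentage point apart. The last digit can differ from the figures Codecov prints, which suggests the report uses its own rounding.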


nnnyt commented 8 months ago

Test coverage drops noticeably with this PR (project coverage goes from 97.81% to 97.31%, with 39 changed lines uncovered). Try adding more tests for your new code.