HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
409 stars, 77 forks

Support v2.3.X of spaCy, which includes pretrained models for Chinese and Japanese #506

Status: Closed (HiromuHota closed this 4 years ago)

HiromuHota commented 4 years ago

Description of the problems or issues

Is your pull request related to a problem? Please describe.

spaCy (v2.3.X) includes pretrained models for Chinese and Japanese. I'd like to use those models in Fonduer.

Does your pull request fix any issue?

N/A.

Description of the proposed changes

Simply expand the range of compatible spaCy versions to include v2.3.X.

Test plan

A clear and concise description of how you test the new changes.

Changed existing tests to adapt to spaCy v2.3.X. Because spaCy's models are statistical, some parts of the output, such as part-of-speech tags and named entities, can change when a model changes, and they actually did. To make the tests less dependent on this unstable output, I changed the assertions to check only whether the values are filled in, i.e. that none of them is "".

Instead of an exact check on each value, like the one below:

sentence.ner_tags == ["O", "O", "GPE"]

Now the tests only check that every entry in ner_tags has a value other than "":

all(sentence.ner_tags)
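
The difference between the two styles of assertion can be sketched in isolation. This is a minimal illustration, not code from this pull request: `FakeSentence`, `tags_filled`, and the tag values are hypothetical stand-ins, and no spaCy model is actually loaded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSentence:
    """Hypothetical stand-in for a parsed sentence (not Fonduer's Sentence class)."""

    words: List[str]
    ner_tags: List[str] = field(default_factory=list)


def tags_filled(tags: List[str]) -> bool:
    """Return True when no tag is the empty string."""
    return all(tags)


# Made-up outputs for the same sentence, as two model versions might tag it.
words = ["Visit", "sunny", "Kyoto"]
old_model = FakeSentence(words=words, ner_tags=["O", "O", "GPE"])
new_model = FakeSentence(words=words, ner_tags=["O", "O", "LOC"])
unparsed = FakeSentence(words=words, ner_tags=["", "", ""])

# The exact check passes for one model version and fails for the other.
exact_old = old_model.ner_tags == ["O", "O", "GPE"]
exact_new = new_model.ner_tags == ["O", "O", "GPE"]

# The filled-in check passes for both versions and still catches empty tags.
filled_old = tags_filled(old_model.ner_tags)
filled_new = tags_filled(new_model.ner_tags)
filled_unparsed = tags_filled(unparsed.ner_tags)
```

One caveat of this style: `all([])` is `True` in Python, so an empty tag list would also pass. Comparing `len(sentence.ner_tags)` with the number of words alongside the `all(...)` check is one possible way to guard against that; whether the tests in this pull request need it is not stated here.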


codecov-commenter commented 4 years ago

Codecov Report

Merging #506 into master will decrease coverage by 0.11%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #506      +/-   ##
==========================================
- Coverage   85.84%   85.72%   -0.12%     
==========================================
  Files          88       88              
  Lines        4585     4583       -2     
  Branches      855      856       +1     
==========================================
- Hits         3936     3929       -7     
- Misses        464      468       +4     
- Partials      185      186       +1     
| Flag | Coverage Δ |
| --- | --- |
| #unittests | 85.72% <100.00%> (-0.12%) :arrow_down: |


| Impacted Files | Coverage Δ |
| --- | --- |
| setup.py | 0.00% <ø> (ø) |
| src/fonduer/parser/lingual_parser/spacy_parser.py | 80.36% <100.00%> (-1.46%) :arrow_down: |
| src/fonduer/parser/parser.py | 92.08% <0.00%> (-0.95%) :arrow_down: |