HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
407 stars, 77 forks

Fix: pdf_path cannot be given without a trailing slash #459

Closed: senwu closed this 4 years ago

senwu commented 4 years ago

Description of the problems or issues

Closes #442.

Description of the proposed changes

Use the same method, _get_linked_pdf_path, to locate the PDF file in both is_linkable and link in VisualLinker, so that the check and the actual linking resolve the path the same way.
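
The idea of routing both the check and the link step through one path resolver can be sketched as follows. This is a minimal illustration, not the code in this pull request: resolve_pdf_path and the standalone is_linkable and link functions below are hypothetical stand-ins, and their signatures are assumptions. The sketch assumes pdf_path is either a PDF file or a directory holding PDFs named after the document.

```python
import os
from typing import Optional


def resolve_pdf_path(pdf_path: str, document_name: str) -> Optional[str]:
    """Hypothetical helper: return the PDF linked to a document, or None."""
    # If pdf_path already points at a file, use it directly.
    if os.path.isfile(pdf_path):
        return pdf_path
    # Otherwise treat pdf_path as a directory. os.path.join adds a separator
    # only when one is missing, so "pdfs" and "pdfs/" resolve identically.
    for ext in (".pdf", ".PDF"):
        candidate = os.path.join(pdf_path, document_name + ext)
        if os.path.isfile(candidate):
            return candidate
    return None


def is_linkable(pdf_path: str, document_name: str) -> bool:
    """Hypothetical check: True when a PDF can be found for the document."""
    return resolve_pdf_path(pdf_path, document_name) is not None


def link(pdf_path: str, document_name: str) -> str:
    """Hypothetical link step: reuse the same resolver as is_linkable."""
    resolved = resolve_pdf_path(pdf_path, document_name)
    if resolved is None:
        raise FileNotFoundError(f"No PDF found for {document_name!r} in {pdf_path!r}")
    return resolved
```

Because both functions call the same resolver, a document that passes the check cannot then fail in the link step because of a path-format difference such as a missing trailing slash.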

Test plan

Test various doc_path and pdf_path formats, including pdf_path with and without a trailing slash.

codecov-commenter commented 4 years ago

Codecov Report

Merging #459 into master will increase coverage by 0.03%. The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master     #459      +/-   ##
==========================================
+ Coverage   83.22%   83.25%   +0.03%     
==========================================
  Files          88       88              
  Lines        4559     4562       +3     
  Branches      837      836       -1     
==========================================
+ Hits         3794     3798       +4     
  Misses        572      572              
+ Partials      193      192       -1     

Flag          Coverage Δ
#unittests    83.25% <93.75%> (+0.03%)

Impacted Files                                           Coverage Δ
src/fonduer/parser/visual_linker.py                      84.18% <93.33%> (+0.22%)
src/fonduer/parser/parser.py                             92.94% <100.00%> (ø)
...c/fonduer/parser/preprocessors/doc_preprocessor.py    88.09% <0.00%> (+2.38%)