HazyResearch / pdftotree

:evergreen_tree: A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
MIT License
428 stars, 90 forks

Markup <table/> and <td/> #84

Closed: HiromuHota closed this 3 years ago

HiromuHota commented 3 years ago

Description of the problems or issues

Is your pull request related to a problem? Please describe.

Currently, <table/> and <td/> are not marked up with hOCR attributes.
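To illustrate what hOCR markup on these elements could look like, here is a minimal sketch using only the Python standard library. It assumes the hOCR convention of putting a `bbox x0 y0 x1 y1` property in the `title` attribute, and it uses the `ocr_table` class from the hOCR specification for `<table/>`. The helper name `mark_up`, the coordinates, and the choice to give `<td/>` only a `bbox` are hypothetical; the attributes this PR actually emits are not stated here.

```python
import xml.etree.ElementTree as ET


def mark_up(tag, bbox, hocr_class=None):
    """Create an element with an hOCR-style bbox in its title attribute.

    Hypothetical helper for illustration; not part of pdftotree.
    """
    elem = ET.Element(tag)
    if hocr_class is not None:
        elem.set("class", hocr_class)
    x0, y0, x1, y1 = bbox
    elem.set("title", f"bbox {x0} {y0} {x1} {y1}")
    return elem


# Made-up coordinates; a real converter would take them from the PDF layout.
table = mark_up("table", (50, 100, 550, 300), hocr_class="ocr_table")
row = ET.SubElement(table, "tr")
cell = mark_up("td", (50, 100, 300, 150))
cell.text = "Revenue"
row.append(cell)

print(ET.tostring(table, encoding="unicode"))
```

With attributes like these, a downstream consumer can recover where each table and cell sits on the page without re-parsing the PDF.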

Does your pull request fix any issue?

N/A

Description of the proposed changes

This PR adds hOCR attributes to <table/> and <td/>.

Test plan

Checklist

codecov-io commented 3 years ago

Codecov Report

Merging #84 into master will increase coverage by 0.23%. The diff coverage is 79.59%.


@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   65.62%   65.86%   +0.23%     
==========================================
  Files          21       21              
  Lines        2508     2525      +17     
==========================================
+ Hits         1646     1663      +17     
  Misses        862      862              
| Flag | Coverage Δ |
|---|---|
| #unittests | 65.86% <79.59%> (+0.23%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| pdftotree/ml/TableExtractML.py | 0.00% <0.00%> (ø) |
| pdftotree/utils/pdf/render.py | 0.00% <0.00%> (ø) |
| pdftotree/utils/pdf/pdf_parsers.py | 92.38% <14.28%> (-0.04%) :arrow_down: |
| pdftotree/utils/pdf/grid.py | 20.90% <33.33%> (+0.90%) :arrow_up: |
| tests/test_basic.py | 95.52% <88.46%> (+0.52%) :arrow_up: |
| pdftotree/TreeExtract.py | 88.85% <95.65%> (+0.23%) :arrow_up: |
| pdftotree/core.py | 100.00% <100.00%> (ø) |
| pdftotree/ml/features.py | 63.82% <100.00%> (ø) |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fbe6a1a...2b68c4b.