dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.com/dhSegment
GNU General Public License v3.0

Feature/table cells #53

Closed: CrazyCrud closed this 4 years ago

CrazyCrud commented 4 years ago

The PAGE-XML functionality has been extended so that TableCell elements can be created. An error that occurred when transforming a list to points has also been fixed.
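
For readers unfamiliar with the list-to-points conversion mentioned here: PAGE-XML stores polygon coordinates in a `points` attribute as space-separated `x,y` pairs. The following is only a minimal sketch of such a conversion, not the code from this PR; the function name `points_to_page_string` is hypothetical and is not part of dhSegment.

```python
from typing import Iterable, Sequence


def points_to_page_string(points: Iterable[Sequence[float]]) -> str:
    """Serialize [(x, y), ...] into the PAGE-XML form 'x1,y1 x2,y2 ...'.

    Hypothetical helper for illustration; coordinates are rounded to
    integers because PAGE-XML points are integer pixel positions.
    """
    # str.format is used instead of f-strings so the sketch also runs on Python 3.5.
    return " ".join(
        "{},{}".format(int(round(x)), int(round(y))) for x, y in points
    )


polygon = [[0, 0], [10, 0], [10, 5], [0, 5]]
points_attribute = points_to_page_string(polygon)
```

The resulting string would typically be written to the `points` attribute of a `Coords` element.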

solivr commented 4 years ago

Hi @CrazyCrud,

Thanks a lot for your PR. I've added some formatting to the init functions of the classes and also changed the string formatting to be compatible with Python 3.5.

Before merging, could you check that the TableCell element has the proper attribute names (with respect to this schema: https://www.primaresearch.org/schema/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd)? Also, according to this schema, the "header" attribute seems to be missing.
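
As a rough illustration of the attribute check requested above, here is a sketch that builds a table-cell element with the standard library. It assumes the element in question is the schema's `TableCellRole` and that its attributes are `rowIndex`, `columnIndex`, `rowSpan`, `colSpan`, and `header`; please verify these names against the linked XSD. The function `make_table_cell_role` is hypothetical and does not reflect dhSegment's actual classes.

```python
import xml.etree.ElementTree as ET


def make_table_cell_role(row_index, column_index, row_span=1, col_span=1, header=False):
    """Build a TableCellRole element (hypothetical helper, namespace omitted).

    Attribute names are assumed from the PAGE 2019-07-15 schema and
    should be checked against the XSD before relying on them.
    """
    role = ET.Element("TableCellRole")
    role.set("rowIndex", str(row_index))
    role.set("columnIndex", str(column_index))
    role.set("rowSpan", str(row_span))
    role.set("colSpan", str(col_span))
    # XML Schema booleans are serialized in lowercase: "true" / "false".
    role.set("header", "true" if header else "false")
    return role


cell_role = make_table_cell_role(row_index=0, column_index=2, header=True)
xml_text = ET.tostring(cell_role, encoding="unicode")
```

Validating the generated file against the XSD with a schema-aware tool would be a more reliable check than comparing attribute names by eye.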

CrazyCrud commented 4 years ago

Hey @solivr,

No problem, always happy to help.

Thank you for your notes. I have corrected the attributes and hope they now correspond to the PAGE schema.

solivr commented 4 years ago

Thanks, I've added your changes to the master branch.