dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.com/dhSegment
GNU General Public License v3.0

Read Tag Values from XML file #57

Closed: sanakhamekhem closed this issue 4 years ago

sanakhamekhem commented 4 years ago

Dear Sir,

I would like to get the coordinates of the text lines of a document from an XML file. I'm using the function `tl_coords = PAGE.get_unique_tags_from_xml_text_regions(xml_filename=str, tag_pattern='type:Page/TextRegion/TextLine/Coords points')`.

However, the output is null. Could you specify what format the tag pattern should have? Thank you in advance.
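A minimal sketch of reading text-line coordinates directly, using only Python's standard library rather than dhSegment. It assumes a PAGE XML file in which each `TextLine` has a direct `Coords` child carrying a `points="x1,y1 x2,y2 ..."` attribute (older PAGE versions store `Point` elements instead). The helper names (`local_name`, `parse_points`, `text_line_coords`) and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[int, int]


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so any PAGE schema version matches.
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> List[Point]:
    # Assumes the "x1,y1 x2,y2 ..." format of the Coords/@points attribute.
    polygon = []
    for pair in points.split():
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def text_line_coords(root: ET.Element) -> List[List[Point]]:
    # One polygon per TextLine, read from its direct Coords child only,
    # so region-level and word-level Coords are ignored.
    polygons = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        for child in elem:
            if local_name(child.tag) == "Coords" and "points" in child.attrib:
                polygons.append(parse_points(child.attrib["points"]))
    return polygons


# Hypothetical sample page; for a real file use ET.parse(xml_filename).getroot().
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1" custom="structure {type:point;}">
      <Coords points="0,0 500,0 500,100 0,100"/>
      <TextLine id="l1">
        <Coords points="10,10 490,10 490,40 10,40"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(text_line_coords(ET.fromstring(SAMPLE)))
# [[(10, 10), (490, 10), (490, 40), (10, 40)]]
```

Matching on the local tag name keeps the sketch independent of the exact PAGE namespace URI used by a given file.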

solivr commented 4 years ago

Hello,

To use `PAGE.get_unique_tags_from_xml_text_regions`, your XML must have a `custom` attribute on its TextRegion elements. You can then pass any pattern you like to the `tag_pattern` parameter (the default is `'{type:.*;}'`).

For instance, if your XML looks like this:
<TextRegion id="0001" custom="structure {type:point;}">
then calling `PAGE.get_unique_tags_from_xml_text_regions(xml_filename=filename)` should return `['point']`.
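To illustrate how a tag pattern relates to the `custom` attribute, here is a minimal standard-library sketch that mimics the lookup described above. It is not dhSegment's implementation: `TYPE_PATTERN`, `unique_region_types` and the sample page are hypothetical, and the regex uses a capture group instead of the default `'{type:.*;}'`.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical pattern: captures the value after "type:" in e.g. "structure {type:point;}".
TYPE_PATTERN = re.compile(r"\{type:([^;}]*);")


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix so any PAGE schema version matches.
    return tag.rsplit("}", 1)[-1]


def unique_region_types(root: ET.Element) -> List[str]:
    # Scan every TextRegion's custom attribute and collect the distinct type values.
    found = set()
    for elem in root.iter():
        if local_name(elem.tag) != "TextRegion":
            continue
        match = TYPE_PATTERN.search(elem.get("custom", ""))
        if match:
            found.add(match.group(1))
    return sorted(found)


# Hypothetical sample page mirroring the TextRegion shown above.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="0001" custom="structure {type:point;}">
      <Coords points="0,0 500,0 500,100 0,100"/>
    </TextRegion>
    <TextRegion id="0002" custom="readingOrder {index:1;}"/>
  </Page>
</PcGts>"""

print(unique_region_types(ET.fromstring(SAMPLE)))  # ['point']
```

A greedy `.*`, as in the default pattern quoted above, can run past the first `;}` when a `custom` attribute holds several `{...}` blocks; the bounded character class `[^;}]*` stops at the end of the `type` value.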