Open todeslord opened 10 years ago
I have implemented a workaround: a preprocessing function that replaces <br> with <br />. It is not really elegant, but it works. We have to keep in mind that other single tags may cause the same trouble in the future.
https://github.com/101companies/101repo/commit/19dd1d83fa8bb8127a50892012356849bf0b0e0e
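For illustration, a preprocessing step of this kind could look roughly like the sketch below. This is not the code from the commit above. The function name `close_single_tags` and the `VOID_TAGS` list are assumptions made for this example, and the list covers more tags than the `<br>` case handled by the workaround.

```python
import re

# Assumed list of HTML void ("single") elements; the workaround above
# only handled <br>. Extend or shrink as needed.
VOID_TAGS = (
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
)

# Matches an opening void tag with optional attributes and an optional
# trailing slash. The lookahead keeps <colgroup> from matching as <col>.
# Limitation: attribute values containing '<' or '>' are not handled.
_VOID_TAG_RE = re.compile(
    r"<(" + "|".join(VOID_TAGS) + r")(?=[\s/>])([^<>]*?)\s*/?>",
    re.IGNORECASE,
)


def close_single_tags(html):
    """Rewrite single tags such as <br> into the self-closed form <br />."""
    return _VOID_TAG_RE.sub(
        lambda m: "<" + m.group(1) + m.group(2) + " />", html
    )


if __name__ == "__main__":
    print(close_single_tags("line one<br>line two"))
    # line one<br />line two
```

Tags that are already self-closed are normalized to the same `<br />` form, so running the function twice gives the same result as running it once. A regular expression is only a stopgap here: it does not understand comments, scripts, or quoted attribute values that contain angle brackets.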
The HTML fact extractor does not support single tags that are not marked as self-closing, like <br>. The parser in use cannot distinguish start tags from single tags. (The SAX parser in use does not handle single tags correctly: a <br> leads to a wrong fragment file, whereas <br/> does not.)
Possible solutions:
- Find another parser
- Write a parser that finds single tags
- Use a preprocessor that converts single tags to the <br/> style
...
Issue from the Fact Extraction paper of June 22.