clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

filter table sentences #1034

Closed: BeckySharp closed this 3 years ago

BeckySharp commented 3 years ago

This filters out sentences that contain 10 or more numbers or more than 5 single-character words. Using Travis to check the tests...

closes #1022
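For illustration, the rule described above could be sketched roughly as follows. This is not the code from this PR (Eidos itself is written in Scala). The function name `looks_like_table_text`, the regex deciding what counts as a number, and the assumption that the sentence is already split into words are all hypothetical. Note that in this sketch a single digit counts both as a number and as a single-character word, which may or may not match the actual filter.

```python
import re

# Assumption: a "number" is an optionally signed run of digits, possibly with
# "." or "," separated groups and a trailing percent sign (e.g. 12, 1,234.5, 40%).
NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)*%?")


def looks_like_table_text(words, number_threshold=10, single_char_limit=5):
    """Return True if the word list looks like table or graph-label debris.

    Hypothetical helper: flags a sentence with >= number_threshold numeric
    words or > single_char_limit single-character words.
    """
    number_count = sum(1 for w in words if NUMBER.fullmatch(w))
    single_char_count = sum(1 for w in words if len(w) == 1)
    return number_count >= number_threshold or single_char_count > single_char_limit


if __name__ == "__main__":
    row = "12 15 18 21 24 27 30 33 36 39".split()
    print(looks_like_table_text(row))
```

Keeping the two thresholds as parameters makes it easy to tune them if real sentences turn out to be filtered by mistake.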

kwalcock commented 3 years ago

It sounds like a good idea in general, but isn't it also skipping a sentence because the code that would run on it will crash? Will there just be some other sentence that breaks it? It will be great to be able to finish this document, though. I think that others have processed sentences in different threads and given them time limits in case they run into problems (and/or added exception handling!). This text might have come from the vertically printed labels on a graph rather than a table. The PDF-to-text converter had difficulty figuring out whether the glyphs were connected. Maybe it should be isCryptic or isFubar or isNaS, not a sentence?
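The time-limit and exception-handling idea mentioned above might look something like the sketch below. It is only an illustration under stated assumptions: `process_with_limit` and the `process` callback are hypothetical names, not part of Eidos, and the example uses Python's standard `concurrent.futures` rather than anything from the project.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def process_with_limit(process, sentence, seconds, executor):
    """Run process(sentence) on the executor with a time limit.

    Hypothetical helper: returns (result, None) on success, or
    (None, reason) if the call timed out or raised an exception.
    """
    future = executor.submit(process, sentence)
    try:
        return future.result(timeout=seconds), None
    except FutureTimeout:
        # cancel() only prevents a task that has not started yet; a task that
        # is already running keeps going in its worker thread.
        future.cancel()
        return None, "timeout"
    except Exception as exc:  # the sentence crashed the processing code
        return None, f"error: {exc!r}"


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(process_with_limit(len, "a real sentence", 1.0, pool))
```

One caveat with this design: the timeout only stops the caller from waiting. A hard limit that actually terminates a stuck computation would need a separate process or cooperative cancellation inside the processing code.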

BeckySharp commented 3 years ago

@kwalcock Agreed on all points, but I care more about the latter. If we run into this issue in the future with a real sentence, I'd rather debug with that one.

I can rename the method.