GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0

Documents are not processed when they contain inner tables or lots of space #28

Closed ganeshkaspate closed 6 years ago

ganeshkaspate commented 6 years ago

Hi, I am using GATE Developer. I have documents (resumes) that contain inner tables, and some of them have huge gaps of whitespace between blocks of text. When I try to process this type of resume, I am not able to process the documents. Sometimes these documents also cause an out-of-memory exception. I do have some JAPE rules. Could the rules be the cause? Thanks

greenwoodma commented 6 years ago

My guess is that the sentence splitter is the problem here, as I've had similar problems with HTML tables in the past. Assuming you are using the default sentence splitter, could you try the regex splitter instead and see whether that makes a difference? If it doesn't, is there any way you can share a document that causes the problem so we can investigate further?
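
For readers unfamiliar with the idea, the sketch below shows in plain Java what "splitting sentences with a regular expression" means in general. It is not GATE's regex sentence splitter and does not use the GATE API. The class name `RegexSplitSketch`, the `BOUNDARY` pattern, and the sample text are assumptions made up for illustration. The pattern treats sentence-ending punctuation followed by whitespace, or a blank line, as a boundary, so a long run of whitespace is consumed as a single separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexSplitSketch {
    // Hypothetical boundary pattern (an assumption, not GATE's own rules):
    //   1. whitespace that directly follows '.', '!' or '?', or
    //   2. a blank line (two line breaks with optional whitespace between).
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?])\\s+|\\n\\s*\\n");

    // Splits the text at every boundary match and drops empty pieces.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : BOUNDARY.split(text)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text with a large whitespace gap, loosely imitating a resume layout.
        String sample = "Name: Jane Doe\n\n\n      Skills: Java. Python!   GATE?";
        for (String sentence : split(sample)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Whether swapping splitters actually resolves the problem in GATE still has to be checked on a real document, as suggested above.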

greenwoodma commented 6 years ago

Closing this as there has been no follow-up, making it impossible to investigate further.