clulab / reach

Reach Biomedical Information Extraction

Reach crashes on input of large nxml files with a stack overflow #792

Open kwalcock opened 1 year ago

kwalcock commented 1 year ago

I believe this comes down to a sequence of intervals being unioned with reduceRight, which is not tail recursive, so a long enough sequence overflows the stack. Changing it to reduceLeft seems to fix the problem. Unfortunately, multiple dependencies are involved.
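To illustrate the difference, here is a minimal sketch. It assumes a hypothetical Span case class and SpanUnion object standing in for the real interval type; neither is the ai.lum.common API. The right-nested version is written out by hand to show where the stack depth comes from. Whether the standard library's reduceRight recurses the same way depends on the collection type and the Scala version.

```scala
// Hypothetical stand-in for an interval type; NOT the ai.lum.common.Interval API.
final case class Span(start: Int, end: Int) {
  // Smallest span that covers both operands.
  def union(that: Span): Span =
    Span(math.min(start, that.start), math.max(end, that.end))
}

object SpanUnion {
  // Right-nested reduction written out by hand. The recursive call is an
  // argument to union, so it is not a tail call: every element adds a stack
  // frame, and the depth grows linearly with the length of the list.
  def unionRight(spans: List[Span]): Span = spans match {
    case Nil          => throw new UnsupportedOperationException("empty list")
    case last :: Nil  => last
    case head :: tail => head.union(unionRight(tail))
  }

  // Left-to-right reduction. On a List this runs as a loop with an
  // accumulator, so the stack depth stays constant.
  def unionLeft(spans: List[Span]): Span =
    spans.reduceLeft(_ union _)
}
```

Because union here is associative and commutative, both directions give the same result on inputs small enough for the recursive version to finish, which is why swapping the direction of the reduction should be safe for this kind of operation.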

Within this project, the last place the problem can be traced to is

https://github.com/clulab/reach/blob/c7397a4b979454854f0c4a39098d2c2bb31363a3/main/src/main/scala/org/clulab/reach/PaperReader.scala#L126

which leads to the nxmlreader project

https://github.com/lum-ai/nxmlreader/blob/ef7e1440faf5dcae54cc046ba26825b08f1c84e1/src/main/scala/ai/lum/nxmlreader/standoff/Tree.scala#L57

and then to common

https://github.com/lum-ai/common/blob/b7c0b70c460790088d655a98be178cbef9767a24/src/main/scala/ai/lum/common/Interval.scala#L399