Parser hangs on some inputs

clulab / reach

Reach Biomedical Information Extraction



I'm running the parser locally on a text string with ApiRuler.annotateText(text_string, 'fries'). On some inputs this call hangs: memory usage stays stable while CPU usage remains consistently high.
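While the cause is being tracked down, a hang can at least be detected instead of blocking the caller forever. The sketch below is a hypothetical helper, not part of Reach: it runs a call in a daemon thread and raises after a time limit. It assumes the Python-Java bridge allows calls from a worker thread. A hung call cannot be killed this way and keeps using CPU in the background; stopping it for real would need a separate process.

```python
import threading
from typing import Any, Callable, Dict


class AnnotationTimeout(Exception):
    """Raised when a call does not return within the time limit."""


def call_with_timeout(func: Callable[..., Any], *args: Any, timeout: float = 60.0) -> Any:
    """Run func(*args) in a daemon thread and wait at most `timeout` seconds.

    The worker thread cannot be forcibly stopped; if it hangs it keeps running
    in the background (daemon=True, so it does not block interpreter exit).
    """
    result: Dict[str, Any] = {}

    def worker() -> None:
        try:
            result["value"] = func(*args)
        except BaseException as exc:  # re-raised below in the calling thread
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise AnnotationTimeout(f"call did not finish within {timeout} s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Usage would look like call_with_timeout(ApiRuler.annotateText, text_string, 'fries', timeout=120), where the 120-second limit is an arbitrary choice.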

An example input is the last three sentences of the abstract of PubMed article 25338567:

CCK-8 test showed that the proliferation level of PBMNC gradually increased along with the concentration of HL-60 cells treated with MG132 and reached its peak when the concentration of the HL-60 cells was 1×10(5) (P < 0.01). No remarkable proliferation of PBMNC was observed in the K562 groups no matter if the HL-60 cells had been treated with MG132. It is concluded that the high concentration of MG132 can directly kill HL-60 cells, low-concentration of MG132 can induce the expression of costimulatory molecule CD86 in HL-60 cells, also can improve the proliferation of PBMNC.

Importantly, parsing each sentence separately completes without hanging.
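Since single sentences go through, one possible stopgap is to annotate sentence by sentence. The sketch below is hypothetical: it uses a naive regular-expression splitter (not Reach's own tokenizer) and takes the annotation function as a parameter, so ApiRuler.annotateText could be wrapped in a lambda. Anything that depends on context across sentences is lost, and each result describes its sentence only.

```python
import re
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Naive splitter: break after ".", "!" or "?" when followed by whitespace and an
# uppercase letter. It does not handle abbreviations such as "e.g. Ras".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences with a simple regular-expression heuristic."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def annotate_per_sentence(text: str, annotate: Callable[[str], T]) -> List[T]:
    """Call `annotate` once per sentence instead of once on the whole text."""
    return [annotate(sentence) for sentence in split_sentences(text)]
```

For example, annotate_per_sentence(text_string, lambda s: ApiRuler.annotateText(s, 'fries')) would issue one call per sentence.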

Some statistics on my desktop (32 GB RAM, Debian): %MEM: 12, VIRT: 9973m, RES: 3.6g, %CPU: 106.4

This appears to be an issue local to Ben's Python-Java setup. The same text is parsed and returns results very quickly with any of the following:

1) the ReachShell: sbt 'run-main edu.arizona.sista.reach.ReachShell'
2) the Reach BioVisualizer web app: http://agathon.sista.arizona.edu:8080/odinweb/bio
3) the Reach API via curl from the command line, with the input text in the file 'bg-input': curl -XPOST -F 'text=<bg-input' 'http://agathon:8080/odinweb/api/text' > output-bg.json
4) the Scala example TextInJsonOut from the reach-examples GitHub repository: sbt 'run-main com.yourorg.TextInJsonOut outfile-bg.json' < bg-input
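Because option 3 works, a Python caller could also sidestep the local JVM bridge by posting to the web API. The sketch below mirrors the curl call: curl -F 'text=<bg-input' sends the file contents as a multipart/form-data field named "text". The helper names are hypothetical, the server is assumed to accept UTF-8 field content, and the host is only reachable where agathon is.

```python
import json
import urllib.request
import uuid
from typing import Any

# Endpoint taken from the curl example; reachable only where that host is.
REACH_API_URL = "http://agathon:8080/odinweb/api/text"


def build_reach_request(text: str, url: str = REACH_API_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST with a single "text" field,
    mirroring: curl -XPOST -F 'text=<bg-input' URL
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n'
        "\r\n"
        f"{text}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def annotate_via_api(text: str, url: str = REACH_API_URL, timeout: float = 300.0) -> Any:
    """Send the text to the Reach web API and decode the JSON response."""
    with urllib.request.urlopen(build_reach_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```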

Note that all of these programs run with a maximum JVM heap of at least 6 GB, set either in the sbt build file (javaOptions += "-Xmx6G") or through Java flags in the environment (JAVA_OPTS='-server -Xms1024m -Xmx6144m').
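The two settings give the same limit: 6144m is 6144 × 1024² bytes, which equals 6 × 1024³ bytes, i.e. 6G. To compare them with whatever options the Python side starts its JVM with, a small hypothetical parser can turn an -Xmx flag into bytes. It assumes the usual k/m/g/t suffixes (no suffix means bytes). It also assumes that a later -Xmx overrides an earlier one. If the bridge happens to be pyjnius (not stated in the issue), such options are passed through jnius_config.add_options before the JVM is started.

```python
import re
from typing import Optional

# Matches -Xmx followed by a number and an optional size suffix.
_XMX = re.compile(r"-Xmx(\d+)([kKmMgGtT]?)")
_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def max_heap_bytes(jvm_options: str) -> Optional[int]:
    """Return the -Xmx value in bytes, or None if no -Xmx flag is present.

    If the flag appears more than once, the last occurrence is used, on the
    assumption that a later flag overrides an earlier one.
    """
    matches = _XMX.findall(jvm_options)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNIT[unit.lower()]
```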

Parser hangs on some inputs #56