clulab / reach

Reach Biomedical Information Extraction

REACH taking long time on certain texts #426

Closed bgyori closed 7 years ago

bgyori commented 8 years ago

I'm not specifically interested in this abstract, but it is one example on which REACH seems to take a very long time. I'm filing it as an issue because finding out why it's so slow might expose a more general problem. If not, feel free to close this!

https://www.ncbi.nlm.nih.gov/pubmed/27551758

The effects of tanshinone IIA on hypoxia/reoxygenation-induced myocardial microvascular endothelial cell apoptosis in rats via the JAK2/STAT3 signaling pathway. This study aims to investigate the effects of tanshinone IIA on hypoxia/reoxygenation (H/R)-induced myocardial microvascular endothelial cell (MMEC) apoptosis in rats. MMECs from 10-days aged rats were isolated, cultured and identified, which were then divided into following groups: control group, control+tanshinone IIA (50μM) group, H/R model group, H/R+tanshinone IIA (5μM) group, H/R+tanshinone IIA (50μM) pre-treatment group, H/R+AG490 (50μM) pre-treatment group and H/R+AG490 (50μM)+tanshinone IIA (50μM) pre-treatment group. MTT assay, TUNEL staining and flow cytometry were used to measure the cellular viability and apoptosis. Western-blot were performed to detect protein expressions in JAK2/STAT3 signaling pathway. Compared with control group, H/R group showed decreased cell viability, increased apoptosis rate, increased proportions of cells into G0/G1 phase, decreased proportions of cells in S phase and G2/M phase, as well as up-regulated expressions of JAK2, STAT3, p53, Bax, Caspase-3, pJAK2 and pSTAT3, and down-regulated Bcl-2 expression (all P<0.05). Compared with H/R group, H/R+tanshinone IIA (5μM) group, H/R+tanshinone IIA (50μM) group H/R+AG490 (50μM) group and H/R+AG490 (50μM)+tanshinone IIA (50μM) group had increased cell viability, decreased apoptosis rate, reduced proportions of cells into G0/G1 phase, elevated proportions of cells in S phase and G2/M phase, as well as down-regulated expressions of JAK2, STAT3, p53, Bax, Caspase-3, pJAK2 and pSTAT3, elevated expression of Bcl-2 (all P<0.05). The most remarkable changes were observed in H/R+AG490 (50μM)+tanshinone IIA (50μM) group.
Tanshinone IIA may attenuate H/R-induced MMEC apoptosis in rats by inhibiting the JAK2/STAT3 signaling pathway and regulating the expressions of p53, Bax, Caspase-3 and Bcl-2, which may provide a protective effect of tanshinone IIA for MMECs.

myedibleenso commented 8 years ago

@bgyori, thanks. This paper doesn't appear to be a part of the open access subset. Are you processing just the nxml for the abstract, or are you processing the full text? If the latter, what format are you using? If it is the raw text of the full paper and you're including the references, that may be one reason.

bgyori commented 8 years ago

Just the raw text of the abstract. When open access full text is not available, we usually read the abstract as raw text using the ApiRuler.annotateText API. One thing to note about the abstract is that it contains a lot of non-ASCII Unicode characters. Maybe that's relevant?
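
One quick way to check the Unicode hypothesis is to list which non-ASCII characters the text actually contains before sending it to Reach. Below is a minimal Python sketch using only the standard library. It is not part of Reach, the helper name `non_ascii_chars` is made up for illustration, and the sample string is just a fragment of the abstract above.

```python
import unicodedata
from collections import Counter


def non_ascii_chars(text):
    """Count each non-ASCII character in text, keyed by (char, Unicode name)."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    # unicodedata.name() raises for unnamed code points unless a default is given.
    return {(ch, unicodedata.name(ch, "UNKNOWN")): n for ch, n in counts.items()}


# Fragment of the abstract; "\u03bc" is the Greek letter mu used in "50μM".
sample = "H/R+tanshinone IIA (50\u03bcM) group, H/R+AG490 (50\u03bcM) group"

for (ch, name), n in non_ascii_chars(sample).items():
    print(f"U+{ord(ch):04X} {name}: {n}")
```

If the only character reported is the Greek mu, the text is probably not exotic enough for encoding alone to explain the slowdown, though that is a guess rather than a diagnosis.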

myedibleenso commented 8 years ago

@bgyori, when you get a chance, would you please attach a .txt file to this issue with the raw text that you sent to Reach?

bgyori commented 8 years ago

Here it is. The text in the file is UTF-8 encoded (I also pasted the raw text in my original comment above). Attached file: PMID27551758_abstract.txt

marcovzla commented 8 years ago

It seems that this specific sentence is the problem:

Compared with H/R group , H/R+ tanshinone IIA ( 5muM ) group , H/R+ tanshinone IIA ( 50muM ) group H/R+ AG490 ( 50muM ) group and H/R+ AG490 ( 50muM ) + tanshinone IIA ( 50muM ) group had increased cell viability , decreased apoptosis rate , reduced proportions of cells into G0/G1 phase , elevated proportions of cells in S phase and G2/M phase , as well as down-regulated expressions of JAK2 , STAT3 , p53 , Bax , Caspase-3 , pJAK2 and pSTAT3 , elevated expression of Bcl-2 ( all P < 0.05 ) .
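
For anyone who wants to repeat this kind of narrowing-down on other slow inputs, a rough approach is to time each sentence separately and look at the slowest ones. The Python sketch below is only an illustration under stated assumptions: `process` is a hypothetical callable wrapping whatever sends text to Reach (replaced here by a dummy that just sleeps), and the regex splitter is a naive stand-in, not how Reach segments sentences.

```python
import re
import time


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace and a capital."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def time_sentences(text, process):
    """Run process() on each sentence; return (seconds, sentence) pairs, slowest first."""
    timings = []
    for sentence in split_sentences(text):
        start = time.perf_counter()
        process(sentence)
        timings.append((time.perf_counter() - start, sentence))
    return sorted(timings, key=lambda pair: pair[0], reverse=True)


def dummy_process(sentence):
    # Hypothetical stand-in for the real call; sleeps in proportion to word count.
    time.sleep(0.002 * len(sentence.split()))


text = (
    "MTT assay was used. "
    "Western-blot were performed to detect protein expressions in JAK2/STAT3 signaling pathway. "
    "Short one."
)

for seconds, sentence in time_sentences(text, dummy_process):
    print(f"{seconds:.3f}s  {sentence}")
```

Swapping `dummy_process` for a function that actually calls the reader should surface outliers like the sentence above, as long as the slowdown is per-sentence rather than dependent on document-level context.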

myedibleenso commented 8 years ago

According to conversations in today's meeting, the problem here is supposedly not the speed of the parser. We'll look into this further. Thanks for narrowing it down, @marcovzla.

hickst commented 7 years ago

Note that issue #463 is probably another report of this problem.

myedibleenso commented 7 years ago

Profiling this now...