biosemantics / etc-site-archived-do-not-use

Source code for the ETC Toolkit web application
http://etc.cs.umb.edu/etcsite/

[Matrix Generation] Content between "&lt" and "&gt" in description is missing in matrix generation #611

Closed CathyYujie closed 7 years ago

CathyYujie commented 7 years ago

in the description resource: image

in the matrix generation system: image

Problem: the content between "&lt" and "&gt" is missing.


the whole description:

<?xml version="1.0" encoding="UTF-8"?>

weakley 2016 weakleys key Thursday, November 17, 2016 key-gatherer R._pubescens Leaves 3-9-foliolate (reduced simple leaves may also be present in the inflorescence). Upright stems herbaceous, annual, not differentiated into primocanes and floricanes, unarmed or with a few weak bristles; stipules oblanceolate; [e. WV northward]; [subgenus Cylactis – dwarf raspberries] Herbs or subshrubs (if woody at base, then < 3 dm tall). Leaves compound (at least the lower and better developed). Leaves 1-compound, either simply pinnately compound or simply palmately compound,. Principal (basal-most) leaves palmately compound, with 3-7 (-9) leaflets. Principal leaves distinctly petiolate, the petiole often longer than the leaflets, 3-7 (-9)-foliolate; fruit of achenes; leaves basal and cauline. Principal leaves 3-foliolate. Plants in flower. Petals white (or slightly pinkish). Calyx lobes not subtended by bractlets; [tribe Rubeae]. Herbs or subshrubs (if woody at base, then < 3 dm tall). Leaves compound (at least the lower and better developed). Leaves 1-compound, either simply pinnately compound or simply palmately compound,. Principal (basal-most) leaves palmately compound, with 3-7 (-9) leaflets. Principal leaves distinctly petiolate, the petiole often longer than the leaflets, 3-7 (-9)-foliolate; fruit of achenes; leaves basal and cauline. Principal leaves 3-foliolate. Plants in fruit (or sterile). Leaflets evenly serrate or crenate, each well-developed leaflet with > 7 teeth. Calyx lobes not subtended by bractlets. Fruit an aggregate of fleshy, adherent drupelets; leaflets acuminate at apex; [tribe Rubeae]
rodenhausen commented 7 years ago

This problem stems from sentence extraction in the Perl portion of charaparser. The sentences read at the location below, from the table populated by the Perl portion, lack the text between the < and > characters.
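For illustration only, here is a minimal Python sketch of one way text between a bare `<` and a later `>` can vanish during sentence extraction. This is an assumption about the kind of failure, not the confirmed root cause in charaparser: the pattern names `NAIVE_TAG` and `STRICT_TAG` and both helper functions are hypothetical, and the sample string is only shaped like the description quoted above.

```python
import html
import re

# Hypothetical naive stripper: treats ANY "<...>" span as markup.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Hypothetical stricter variant: "<" must be followed directly by a
# tag name (optionally preceded by "/"), so "< 3 dm" is left alone.
STRICT_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_naive(text: str) -> str:
    """Remove every '<...>' span, including comparison signs."""
    return NAIVE_TAG.sub("", text)


def strip_strict(text: str) -> str:
    """Remove only spans that look like real tags."""
    return STRICT_TAG.sub("", text)


# Sample shaped like the description in this issue, with the
# comparison signs stored as XML entities.
escaped = "then &lt; 3 dm tall). Leaflets with &gt; 7 teeth."
decoded = html.unescape(escaped)  # entities become literal < and >

print(strip_naive(decoded))   # text between "<" and ">" is lost
print(strip_strict(decoded))  # text is preserved
```

If the extraction step decodes entities before removing markup, a pattern like the naive one would swallow everything from "< 3 dm tall" up to the next ">", which matches the symptom reported here. Whether that is what actually happened on the server is not established in this thread.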

rodenhausen commented 7 years ago

The problem does not appear in the Perl portion in a Windows environment. It needs debugging in a Linux environment.

rodenhausen commented 7 years ago

The problem also does not appear when debugging in my Linux environment. On the server, however, it can be reproduced.

rodenhausen commented 7 years ago

Tested: the Perl and MySQL versions do not play a role.

rodenhausen commented 7 years ago

fix: https://github.com/biosemantics/charaparser/commit/d29abd1eb0559bfe050ccc4aed648cb90160caba