biosemantics / charaparser


Perl reading sentence part contains bugs #30

Closed hongcui closed 7 years ago

hongcui commented 7 years ago

Watch for &lt; and &gt;. The < and > in the text below were originally &lt; and &gt;. Make sure they are treated properly.

Shrubs, subshrubs, or herbs, perennial, 0.5–30(–50) dm; fibrous, ± woody in species with larger plants. Stems 1–several, <biennial or perennial, rarely annual (R. illecebrosus)>, erect, arching, mounding, or creeping, rarely decumbent, ascending, or scrambling, <rooting or not at nodes or tips, terete or angled>; <prickles absent or sparse to dense, erect to retrorse, weak to stout, broad based or not; bristles absent or sparse to dense, erect to slightly retrorse, weak to stiff>; glabrous or hairy, eglandular or stipitate-glandular, sometimes sessile-glandular, <pruinose or not>. Leaves winter-persistent to deciduous, cauline; stipules filiform or elliptic to ovate, margins entire; petiole present; blade reniform to orbiculate, 2–30 cm, herbaceous to ± coriaceous, leaflets 0 or 3, 5, 7, or 9, terminal ovate to elliptic to obovate, <1.7–15 cm, base cuneate to rounded or cordate, sometimes truncate, rarely tapered or obtuse, unlobed or lobed>, margins flat or revolute, finely to coarsely crenate, dentate to doubly dentate, or serrate to doubly serrate, abaxial surface <unarmed or with prickles on midvein consistent with those on stems>, glabrous or ± densely hairy, eglandular or ± densely stipitate-glandular, sometimes sessile-glandular, along veins. Inflorescences axillary or terminal, 1–35(–100)-flowered, cymiform, racemiform, umbelliform, thyrsiform, or paniculiform, glabrous or sparsely to densely pubescent, <eglandular or sparsely to densely glandular, armed or unarmed>; bracts usually present; bracteoles absent. Pedicels present, <unarmed or sparsely armed with prickles similar to those of stems, glabrous or sparsely to densely hairy, eglandular or sparsely to densely stipitate-glandular, sometimes sessile-glandular>. Flowers bisexual (unisexual in R. chamaemorus, R. ursinus, and subg. 
Micranthobatus [in the sense of Kalkman]), 5–80 mm diam.; hypanthium 3–10 mm diam., glabrous or sparsely to densely pubescent, <eglandular or sparsely to densely glandular>; sepals 5, erect or spreading to reflexed, lanceolate to long-caudate, <unarmed or armed, glabrous or hairy, eglandular or sparsely to densely stipitate-glandular, sometimes sessile-glandular>; petals (0–)5(or 6), white to pink or magenta, suborbiculate to elliptic, obovate, or spatulate; stamens 20–100+, shorter to longer than petals, <filaments filiform or laminar>; carpels glabrous or hairy, <styles slender or clavate>. Fruits aggregated drupelets, (1–)5–100[–150], <not or weakly to strongly coherent>, separating with or without torus attached, golden yellow to red or black, globose to hemispheric or cylindric, 5–20 mm, <fleshy or dryish>, glabrous or finely hairy, <sometimes pruinose>; hypanthium usually persistent; sepals usually persistent, usually reflexed. Seeds 1 per drupelet. x = 7.
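For anyone reproducing this, here is a minimal Python sketch of what "treated properly" could mean for the entities: decode &lt; and &gt; back to literal angle brackets before looking at the bracketed spans. It is not the charaparser Perl code; the function names `decode_entities` and `bracketed_segments` and the shortened `sample` string are made up for illustration.

```python
import html
import re
from typing import List


def decode_entities(text: str) -> str:
    """Turn &lt; / &gt; (and any other HTML entities) back into literal characters."""
    return html.unescape(text)


def bracketed_segments(text: str) -> List[str]:
    """Return the contents of every <...> span found after decoding entities."""
    return re.findall(r"<([^<>]*)>", decode_entities(text))


# Hypothetical, shortened version of the description above, with escaped brackets.
sample = (
    "Stems 1–several, &lt;biennial or perennial&gt;, erect; "
    "glabrous or hairy, &lt;pruinose or not&gt;."
)

if __name__ == "__main__":
    print(decode_entities(sample))
    print(bracketed_segments(sample))
```

Decoding first and matching second means a sentence reader only ever has to deal with one form of the brackets, whichever form the input arrived in.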

hongcui commented 7 years ago

It's not a bug in Perl. The input was not in UTF-8. The sentences extracted by Perl were fine. The ETC term review context does not show the sentences correctly, but that is an ETC-site issue.
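A quick way to catch this kind of input problem before it reaches the parser is to check whether the bytes decode as UTF-8. The sketch below is only an illustration in Python: the helper names are hypothetical, and the `cp1252` fallback is an assumption, since the thread does not say which encoding the input actually used.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when possible, otherwise with an assumed fallback encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the fallback encoding is a guess, not something the thread states.
        return data.decode(fallback)


if __name__ == "__main__":
    raw = "0.5–30(–50) dm; ± woody".encode("cp1252")  # simulated non-UTF-8 input
    print(is_valid_utf8(raw))
    print(to_text(raw))
```

Characters such as the en dash and ± in the description are the ones that break when the encoding is wrong, so they make good test cases.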