biosemantics / charaparser


Basionyms and synonyms stripped from FNA treatments #34

Closed jocelynpender closed 6 years ago

jocelynpender commented 6 years ago

Coarse-grained XML treatments correctly represent basionyms and synonyms. For example, for Cirsium arvense:

<taxon_identification status="BASIONYM">
  <taxon_name rank="genus" authority="unknown" date="unknown">Serratula</taxon_name>
  <taxon_name rank="species" authority="Linnaeus" date="unknown">arvensis</taxon_name>
  <place_of_publication>
    <publication_title>Sp. Pl.</publication_title>
    <place_in_publication>2: 820. 1753</place_in_publication>
  </place_of_publication>
  <taxon_hierarchy>genus Serratula;species arvensis;</taxon_hierarchy>
</taxon_identification>
<taxon_identification status="SYNONYM">
  <taxon_name rank="genus" authority="unknown" date="unknown">Breea</taxon_name>
  <taxon_name rank="species" authority="(Linnaeus) Lessing" date="unknown">arvensis</taxon_name>
  <taxon_hierarchy>genus Breea;species arvensis;</taxon_hierarchy>
</taxon_identification>

and so on for the remaining synonyms. After these treatments are parsed with Charaparser to produce fine-grained treatments, the taxon_identification elements contain only ACCEPTED names. The basionyms and synonyms are missing from the output entirely.
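To make the loss easy to confirm, here is a rough sketch of a check that counts `taxon_identification` elements by `status` in a coarse-grained treatment and in the corresponding parsed output, then reports any status that has fewer entries after parsing. It is not part of Charaparser: the function names and the `<treatment>` wrapper element in the sample strings are hypothetical, and it assumes the `taxon_identification` elements are not namespace-qualified.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_statuses(xml_text):
    """Count taxon_identification elements by their status attribute."""
    root = ET.fromstring(xml_text)
    return Counter(
        elem.get("status", "MISSING")
        for elem in root.iter("taxon_identification")
    )


def lost_statuses(coarse_xml, fine_xml):
    """Return {status: number lost} for statuses with fewer entries after parsing."""
    coarse = count_statuses(coarse_xml)
    fine = count_statuses(fine_xml)
    return {s: coarse[s] - fine[s] for s in coarse if coarse[s] > fine[s]}


# Hypothetical minimal inputs; real treatments carry far more content.
COARSE = """
<treatment>
  <taxon_identification status="ACCEPTED"/>
  <taxon_identification status="BASIONYM"/>
  <taxon_identification status="SYNONYM"/>
</treatment>
"""
FINE = """
<treatment>
  <taxon_identification status="ACCEPTED"/>
</treatment>
"""

print(lost_statuses(COARSE, FINE))  # {'BASIONYM': 1, 'SYNONYM': 1}
```

Running this over each input/output pair (reading the files with `open(...).read()`) should show whether every treatment loses its BASIONYM and SYNONYM entries or only some of them.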