gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

Cultivar words are not split #227

Open dimus opened 2 years ago

dimus commented 2 years ago

The words section provides one word per record for everything except plant cultivar identifiers. For example:

Achillea 'Cerise Queen'

The words section provides Achillea as the genus word and Cerise Queen as a single cultivar word. To be consistent, and to match the name of the section, I think we should return Cerise and Queen as two separate cultivar words.

{
  "parsed": true,
  "quality": 1,
  "verbatim": "Achillea 'Cerise Queen'",
  "normalized": "Achillea ‘Cerise Queen’",
  "canonical": {
    "stemmed": "Achillea ‘Cerise Queen’",
    "simple": "Achillea ‘Cerise Queen’",
    "full": "Achillea ‘Cerise Queen’"
  },
  "cardinality": 2,
  "details": {
    "uninomial": {
      "uninomial": "Achillea",
      "cultivar": "‘Cerise Queen’"
    }
  },
  "words": [
    {
      "verbatim": "Achillea",
      "normalized": "Achillea",
      "wordType": "UNINOMIAL",
      "start": 0,
      "end": 8
    },
    {
      "verbatim": "Cerise Queen",
      "normalized": "‘Cerise Queen’",
      "wordType": "CULTIVAR",
      "start": 10,
      "end": 22
    }
  ],
  "id": "d77d065e-8732-5dad-86b0-5b1e62b51525",
  "parserVersion": "v1.6.5"
}
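
A minimal Go sketch of the proposed behaviour, splitting the CULTIVAR record above into one record per word. It is an illustration only: the Word struct is a local stand-in for a record in the words section, and splitCultivar is a hypothetical helper, not part of gnparser. Two assumptions are baked in: offsets are counted in runes, and the normalized form of a single cultivar word carries no quotes.

```go
package main

import (
	"fmt"
	"unicode"
)

// Word mirrors the fields of a record in the "words" section of the
// JSON output above. It is a local stand-in, not gnparser's own type.
type Word struct {
	Verbatim   string
	Normalized string
	WordType   string
	Start      int
	End        int
}

// splitCultivar is a hypothetical helper: it breaks a multi-word
// CULTIVAR record into one record per whitespace-separated word.
// Offsets are counted in runes (an assumption) and stay relative to
// the original name-string.
func splitCultivar(w Word) []Word {
	var res []Word
	runes := []rune(w.Verbatim)
	wordStart := -1 // -1 means "not currently inside a word"
	for i := 0; i <= len(runes); i++ {
		atEnd := i == len(runes)
		if !atEnd && !unicode.IsSpace(runes[i]) {
			if wordStart < 0 {
				wordStart = i
			}
			continue
		}
		// Reached whitespace or the end of input: close the open word.
		if wordStart >= 0 {
			text := string(runes[wordStart:i])
			res = append(res, Word{
				Verbatim:   text,
				Normalized: text, // assumption: no quotes on single words
				WordType:   w.WordType,
				Start:      w.Start + wordStart,
				End:        w.Start + i,
			})
			wordStart = -1
		}
	}
	return res
}

func main() {
	cultivar := Word{
		Verbatim:   "Cerise Queen",
		Normalized: "‘Cerise Queen’",
		WordType:   "CULTIVAR",
		Start:      10,
		End:        22,
	}
	for _, w := range splitCultivar(cultivar) {
		fmt.Printf("%s %s %d %d\n", w.Verbatim, w.WordType, w.Start, w.End)
	}
}
```

For the example name this yields Cerise (start 10, end 16) and Queen (start 17, end 22), both with wordType CULTIVAR. Whether the quotes should stay in the normalized value of each word is an open question for the real implementation.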