gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

repeated "infraspecies" tag in json output #185

Closed abubelinha closed 3 years ago

abubelinha commented 3 years ago

I am trying gnparser for the first time, so I might be misunderstanding the detailed output. I think there is a confusing JSON tag:

C:\gnparser-v1.3.3-win-64\gnparser -f pretty -d "Prosthechea cochleata (L.) W.E.Higgins var. grandiflora (Mutel) Christenson subvar. inventata Abu"

OUTPUT: Look inside the "details" part. There is a first "infraspecies" tag, which looks a bit strange to me. Wouldn't it be better to call it "supraspecies", "above-species" or something similar? Inside it there is another "infraspecies" tag, which makes more sense to me (it contains a list of infraspecific ranks). Of course, I see the problem with renaming tags: many applications that depend on gnparser could stop working properly.

{
  "parsed": true,
  "quality": 1,
  "verbatim": "Prosthechea cochleata (L.) W.E.Higgins var. grandiflora (Mutel) Christenson subvar. inventata Abu",
  "normalized": "Prosthechea cochleata (L.) W. E. Higgins var. grandiflora (Mutel) Christenson subvar. inventata Abu",
  "canonical": {
    "stemmed": "Prosthechea cochleat grandiflor inuentat",
    "simple": "Prosthechea cochleata grandiflora inventata",
    "full": "Prosthechea cochleata var. grandiflora subvar. inventata"
  },
  "cardinality": 4,
  "authorship": {
    "verbatim": "Abu",
    "normalized": "Abu",
    "authors": [
      "Abu"
    ],
    "originalAuth": {
      "authors": [
        "Abu"
      ]
    }
  },
  "details": {
    "infraspecies": {
      "genus": "Prosthechea",
      "species": "cochleata",
      "authorship": {
        "verbatim": "(L.) W.E.Higgins",
        "normalized": "(L.) W. E. Higgins",
        "authors": [
          "L.",
          "W. E. Higgins"
        ],
        "originalAuth": {
          "authors": [
            "L."
          ]
        },
        "combinationAuth": {
          "authors": [
            "W. E. Higgins"
          ]
        }
      },
      "infraspecies": [
        {
          "value": "grandiflora",
          "rank": "var.",
          "authorship": {
            "verbatim": "(Mutel) Christenson",
            "normalized": "(Mutel) Christenson",
            "authors": [
              "Mutel",
              "Christenson"
            ],
            "originalAuth": {
              "authors": [
                "Mutel"
              ]
            },
            "combinationAuth": {
              "authors": [
                "Christenson"
              ]
            }
          }
        },
        {
          "value": "inventata",
          "rank": "subvar.",
          "authorship": {
            "verbatim": "Abu",
            "normalized": "Abu",
            "authors": [
              "Abu"
            ],
            "originalAuth": {
              "authors": [
                "Abu"
              ]
            }
          }
        }
      ]
    }
  },
  "words": [
    {
      "verbatim": "Prosthechea",
      "normalized": "Prosthechea",
      "wordType": "GENUS",
      "start": 0,
      "end": 11
    },
    {
      "verbatim": "cochleata",
      "normalized": "cochleata",
      "wordType": "SPECIES",
      "start": 12,
      "end": 21
    },
    {
      "verbatim": "L.",
      "normalized": "L.",
      "wordType": "AUTHOR_WORD",
      "start": 23,
      "end": 25
    },
    {
      "verbatim": "W.",
      "normalized": "W.",
      "wordType": "AUTHOR_WORD",
      "start": 27,
      "end": 29
    },
    {
      "verbatim": "E.",
      "normalized": "E.",
      "wordType": "AUTHOR_WORD",
      "start": 29,
      "end": 31
    },
    {
      "verbatim": "Higgins",
      "normalized": "Higgins",
      "wordType": "AUTHOR_WORD",
      "start": 31,
      "end": 38
    },
    {
      "verbatim": "var.",
      "normalized": "var.",
      "wordType": "RANK",
      "start": 39,
      "end": 43
    },
    {
      "verbatim": "grandiflora",
      "normalized": "grandiflora",
      "wordType": "INFRASPECIES",
      "start": 44,
      "end": 55
    },
    {
      "verbatim": "Mutel",
      "normalized": "Mutel",
      "wordType": "AUTHOR_WORD",
      "start": 57,
      "end": 62
    },
    {
      "verbatim": "Christenson",
      "normalized": "Christenson",
      "wordType": "AUTHOR_WORD",
      "start": 64,
      "end": 75
    },
    {
      "verbatim": "subvar.",
      "normalized": "subvar.",
      "wordType": "RANK",
      "start": 76,
      "end": 83
    },
    {
      "verbatim": "inventata",
      "normalized": "inventata",
      "wordType": "INFRASPECIES",
      "start": 84,
      "end": 93
    },
    {
      "verbatim": "Abu",
      "normalized": "Abu",
      "wordType": "AUTHOR_WORD",
      "start": 94,
      "end": 97
    }
  ],
  "id": "f83a0c92-1361-51f4-ab85-0e1069856b73",
  "parserVersion": "v1.3.3"
}

dimus commented 3 years ago

The first 'infraspecies' tag gives the "type" of the whole name: 'Aus' would be 'uninomial', 'Aus bus' would be 'species', and 'Aus bus cus' would be 'infraspecies'.

The second tag describes the details of the infraspecies section of the name.
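
To make the two roles of the tag concrete, here is a minimal Python sketch of how a consumer might read them. It assumes that "details" always holds exactly one key naming the type of the name (as in the output above); the helper name `name_type_and_epithets` and the trimmed sample are illustrative and not part of gnparser.

```python
import json

# Trimmed copy of the "details" section from the output above.
sample = json.loads("""
{
  "details": {
    "infraspecies": {
      "genus": "Prosthechea",
      "species": "cochleata",
      "infraspecies": [
        {"value": "grandiflora", "rank": "var."},
        {"value": "inventata", "rank": "subvar."}
      ]
    }
  }
}
""")


def name_type_and_epithets(parsed):
    """Return (name type, [(rank, epithet), ...]) for one parsed record.

    Hypothetical helper: assumes "details" contains a single key.
    """
    details = parsed["details"]
    # Outer key: the type of the whole name
    # ("uninomial", "species", "infraspecies", ...).
    name_type, body = next(iter(details.items()))
    # Inner "infraspecies" list: one entry per infraspecific epithet.
    # It is treated as optional, so other name types yield an empty list.
    inner = body.get("infraspecies", []) if isinstance(body, dict) else []
    epithets = [(item.get("rank"), item["value"]) for item in inner]
    return name_type, epithets


name_type, epithets = name_type_and_epithets(sample)
print(name_type)  # infraspecies
print(epithets)   # [('var.', 'grandiflora'), ('subvar.', 'inventata')]
```

Read this way, the outer key tells you which kind of name was parsed, and the inner list is only consulted when that kind is 'infraspecies'.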

abubelinha commented 3 years ago

Ah OK, that makes sense.

Thanks a lot for the explanation.