GlobalNamesArchitecture / gnparser

Split scientific names to meaningful elements with meta information
https://parser.globalnames.org/
MIT License

Problem with embedded double quote #439

Closed Mesibov closed 6 years ago

Mesibov commented 6 years ago

Yet another edge case...

$ cat test
Calandrinia stagnensis J.M.Black
Calandrinia strophiolata "(F.Muell.) Ewart
Calandrinia tepperiana W.Fitzg.

$ gnparser file -i test
Running with parallelism: 4
{"name_string_id":"338f0792-1bff-5dcf-8a33-2e19c86c33ff","parsed":true,"quality":1,"parser_version":"0.4.3","verbatim":"Calandrinia stagnensis J.M.Black","normalized":"Calandrinia stagnensis J. M. Black","canonical_name":{"value":"Calandrinia stagnensis","value_ranked":"Calandrinia stagnensis"},"hybrid":false,"surrogate":false,"virus":false,"bacteria":false,"details":[{"genus":{"value":"Calandrinia"},"specific_epithet":{"value":"stagnensis","authorship":{"value":"J. M. Black","basionym_authorship":{"authors":["J. M. Black"]}}}}],"positions":[["genus",0,11],["specific_epithet",12,22],["author_word",23,25],["author_word",25,27],["author_word",27,32]]}
{"name_string_id":"be11ab03-8302-5a58-89d5-6eff13540c69","parsed":true,"quality":3,"quality_warnings":[[3,"Unparseable tail"]],"parser_version":"0.4.3","verbatim":"Calandrinia strophiolata \"(F.Muell.) Ewart","normalized":"Calandrinia strophiolata","canonical_name":{"value":"Calandrinia strophiolata","value_ranked":"Calandrinia strophiolata"},"hybrid":false,"surrogate":false,"unparsed_tail":" \"(F.Muell.) Ewart","virus":false,"bacteria":false,"details":[{"genus":{"value":"Calandrinia"},"specific_epithet":{"value":"strophiolata"}}],"positions":[["genus",0,11],["specific_epithet",12,24]]}
{"name_string_id":"a6c17e20-4a96-5305-a385-cea5e52c1148","parsed":true,"quality":1,"parser_version":"0.4.3","verbatim":"Calandrinia tepperiana W.Fitzg.","normalized":"Calandrinia tepperiana W. Fitzg.","canonical_name":{"value":"Calandrinia tepperiana","value_ranked":"Calandrinia tepperiana"},"hybrid":false,"surrogate":false,"virus":false,"bacteria":false,"details":[{"genus":{"value":"Calandrinia"},"specific_epithet":{"value":"tepperiana","authorship":{"value":"W. Fitzg.","basionym_authorship":{"authors":["W. Fitzg."]}}}}],"positions":[["genus",0,11],["specific_epithet",12,22],["author_word",23,25],["author_word",25,31]]}

$ gnparser file -i test | tail -n +2 | while read line; do jshon -e canonical_name -e value <<< "$line" | tr -d '"'; done
Calandrinia stagnensis
json read error: line 1 column 191: '}' expected near '('
Calandrinia tepperiana

alexander-myltsev commented 6 years ago

The problem is not with gnparser but with the JSON parsing in your pipeline. This one should be easier. Please check https://github.com/GlobalNamesArchitecture/gnparser/issues/438
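The reported error position (line 1, column 191) falls on the `(` right after the embedded quote. That fits a likely cause in the shell loop rather than in gnparser: `read` without `-r` treats the backslash as an escape character and drops it, so the `\"` that gnparser emits becomes a bare `"` and the line stops being valid JSON. The following is a minimal sketch of that behaviour, not a confirmed diagnosis from the maintainers. The `json` string is a hand-trimmed, illustrative stand-in for the real output line, and the variable names `plain` and `raw` are hypothetical.

```shell
# Illustrative line shaped like gnparser's output for the problem name,
# trimmed to two fields (an assumption for brevity, not real output).
json='{"verbatim":"Calandrinia strophiolata \"(F.Muell.) Ewart","canonical_name":{"value":"Calandrinia strophiolata"}}'

# Plain `read` consumes the backslash, so \" turns into a bare " in $line.
plain=$(printf '%s\n' "$json" | { read line; printf '%s' "$line"; })

# `read -r` keeps backslashes; `IFS=` also preserves surrounding whitespace.
raw=$(printf '%s\n' "$json" | { IFS= read -r line; printf '%s' "$line"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

If this is indeed the cause, changing `while read line` to `while IFS= read -r line` in the original pipeline should let the second line reach jshon unmodified.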