Closed infinite-dao closed 1 year ago
Thanks @infinite-dao for the observations. First, some background on the command-line behaviour: it does both parsing and "cleaning" for each of the successfully recognized entities. See https://github.com/bionomia/dwc_agent/blob/master/bin/dwcagent. Perhaps these two steps can be separated out for more granularity; the "clean" method is primarily a suite of logic statements that attempts to interpret occasional mishaps in the upstream parse method. As you've noted, the resultant output can be an empty JSON array when you might expect at least something to be returned. Here's one example of the sort of "clean" logic at play: https://github.com/bionomia/dwc_agent/blob/master/lib/dwc_agent/cleaner.rb#L46.
dwcagent "ABR" is a tough one. The dependent Namae gem produces [#<Name given="ABR">], whereas some of the additional regex in the present dwc_agent gem removes it all. The rationale here was the numerous instances of collection codes that wind up in dwc:recordedBy or dwc:identifiedBy.
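To make the trade-off concrete, here is a minimal sketch of the kind of screen that would drop "ABR". The pattern and the method name are assumptions for illustration only; this is not the regex that dwc_agent actually uses.

```ruby
# Hypothetical heuristic, NOT the actual dwc_agent regex: treat a bare run of
# two to five capital letters, with no spaces or punctuation, as a likely
# collection code rather than a person name.
COLLECTION_CODE_LIKE = /\A[A-Z]{2,5}\z/

# Returns true when the trimmed input looks like a collection code.
def collection_code_like?(text)
  COLLECTION_CODE_LIKE.match?(text.strip)
end

puts collection_code_like?("ABR")        # a bare acronym is screened out
puts collection_code_like?("A. Cano,E.") # punctuation and spaces pass through
puts collection_code_like?("Cano")       # mixed case passes through
```

A screen like this cannot tell an acronym-style collection code from a collector who is recorded only by their initials, which is why "ABR" is a tough one.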
dwcagent "A. Cano,E." is poor behaviour, because the Namae gem on its own produces [#<Name family="A. Cano" given="E.">], which is likely what is expected here: a compound family name. So, I'll try to tidy this one and write a test for it.
The Namae parser itself is based on a compiled LALR parser. I can't imagine there's much opportunity there to flag ambiguity in the input, which I'm guessing would be presented as output with options such as "could be this, or could be that", with particular scores of certainty/uncertainty.
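For discussion, one possible shape for a "could be this, or could be that" output is sketched below. Everything in it is hypothetical: neither Namae nor dwc_agent returns this structure, the Reading struct and candidate_readings method are invented for illustration, and the ambiguous_input key is borrowed from the suggestion in the issue. It only handles the single pattern of a comma with no space after it, and it leaves out scores because there is no obvious basis for assigning them.

```ruby
# Hypothetical result type; dwc_agent does not return this structure.
Reading = Struct.new(:family, :given, keyword_init: true)

# Sketch only: flags input as ambiguous when a comma is directly followed by a
# non-space character, then lists the readings a human might consider.
def candidate_readings(input)
  unless input.match?(/,\S/)
    return { input: input, ambiguous_input: false, candidates: [] }
  end

  head, tail = input.split(",", 2).map(&:strip)
  *initials, family = head.split(" ")

  candidates = [
    # One agent with a compound family name (the reading Namae gives).
    [Reading.new(family: head, given: tail)],
    # One agent, with the trailing part moved in front of the initials.
    [Reading.new(family: family, given: ([tail] + initials).join(" "))],
    # Two agents, the second consisting of a bare initial.
    [Reading.new(family: family, given: initials.join(" ")),
     Reading.new(family: nil, given: tail)]
  ]

  { input: input, ambiguous_input: true, candidates: candidates }
end

result = candidate_readings("A. Cano,E.")
puts result[:ambiguous_input]
result[:candidates].each { |agents| puts agents.map(&:to_h).inspect }
```

Returning the candidate list alongside an explicit flag would let a caller distinguish "judged ambiguous" from "nothing recognized", instead of receiving an empty array in both cases.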
Hej-hej,
I use https://libraries.io/rubygems/dwc_agent release 3.0.5.0 and found unexpected parsing, but these might be ambiguous name cases where even a human would not know what to do ;-) … similar perhaps to issue #15 (with no given input name separation at all…)
This case is also ambiguous: is E. meant to be a name, or is it “misplaced”?
E. A. Cano, because I associate the …,E. with the preceding name, as there is no space after the comma; if there were a space, it could be a separate name.
Or does the tool need an additional field ambiguous_input or similar, to signal that the program has already judged the input for a reason, rather than parsing failing silently?
Greetings, Andreas