SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org

[Bug]: Unable to put initials only for identifiedBy when using DwC-A import #4063

Open creplog opened 3 weeks ago

creplog commented 3 weeks ago

Steps to reproduce the bug

1. Have a .csv file containing collection objects, including the field identifiedBy.
2. For a given collection object, populate identifiedBy with only the person's initials (example: P., A.V.). The collection object's determiner put only their initials on the label (screenshot 1).
3. Use the DwC-A import workbench to upload the .csv file.
4. Import fails for the collection object with initials only in identifiedBy. Status: Error last_name can't be blank (screenshot of the workbench with the error included).
5. I tried to edit and rewrite the initials for the collection object within the created import workbench UI; the record still errors.
6. I can create a Person record with only initials (shown in the last screenshot), but when I try to connect it to a record, it will not save (I saved the record after this screenshot, then reloaded the page, and the determiner P., A.V. I added was not present).
...

Screenshot

(six screenshots attached to the original issue)

Expected behavior

No response

Additional Screenshots

No response

Environment

Production

Sandbox Used

No response

Version

v0.44.0

Browser Used

firefox

LocoDelAssembly commented 3 weeks ago

Looks like the name is parsable as-is:

3.3.4 :001 > DwcAgent.parse("P., A.V.")
 => [#<struct Namae::Name family="P.", given="A.V.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>] 
3.3.4 :002 > DwcAgent.parse("D., C.J.")
 => [#<struct Namae::Name family="D.", given="C.J.", suffix=nil, particle=nil, dropping_particle=nil, nick=nil, appellation=nil, title=nil>] 
3.3.4 :003 > 

Maybe recordedBy is the problem? Can you share the entire contents of the offending dataset row?

LocoDelAssembly commented 3 weeks ago

Completely unparsable text results in the field being interpreted as blank. Not sure if we want to change that at the expense of more frequent errored records?

I couldn't reproduce the problem with either P., A.V. or D., C.J.; both import fine for me in my local env.

LocoDelAssembly commented 3 weeks ago

Sorry, I can actually reproduce the problem!

After parsing the name, the importer also cleans it with this third-party code: https://github.com/bionomia/dwc_agent/blob/6c87e49ff877afdf9fddffd21c0794e9acec719c/lib/dwc_agent/cleaner.rb#L27


    # Cleans the passed-in namae object from the parse method and
    # re-organizes it to better match expected Darwin Core output.
    #
    # @param parsed_namae [Namae::Name] a Namae object
    # @return Namae::Name [Object] a new Namae object

I don't feel confident just removing this cleaner (@mjy?). Its minimum requirement for parsed names is that the family name be complete; given names can be just initials.
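
To make the failure mode concrete, here is a minimal, self-contained sketch of how a cleaner that insists on a complete family name could turn a successfully parsed "P., A.V." into an all-blank name, which the importer then rejects with "last_name can't be blank". `ParsedName`, `initials_only?`, and `clean_sketch` are hypothetical stand-ins written for illustration only; they are not the dwc_agent implementation, whose actual rules live in the linked cleaner.rb.

```ruby
# Hypothetical stand-in for Namae::Name, reduced to the two fields that matter here.
ParsedName = Struct.new(:family, :given)

# True when the string consists only of initials such as "P." or "A.V.".
def initials_only?(text)
  text.to_s.match?(/\A(?:\p{L}\.\s*)+\z/)
end

# Assumed rule (illustration only): a family name that is missing or made
# up solely of initials is rejected, so the whole name comes back blank.
def clean_sketch(parsed)
  return ParsedName.new(nil, nil) if parsed.family.nil? || initials_only?(parsed.family)

  parsed
end

cleaned = clean_sketch(ParsedName.new("P.", "A.V."))
puts cleaned.family.inspect # family is nil here, so a last_name presence check would fail
```

Under this assumed rule, "Smith, A.V." would pass through unchanged, since only the family name has to be complete.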

mjy commented 3 weeks ago

@LocoDelAssembly Right, there is much more benefit in keeping dwc_agent in the loop. The real long-term solution is to include a name field for Person and use that when we can't reconstruct a name from the parsing.

dshorthouse commented 2 weeks ago

Should it help, a newer version of dwc_agent has a utility method you might use to check if a parsed string once cleaned produces all nil attributes:

if cleaned_name != DwcAgent.default
  # do something with the cleaned name
else
  # store the unparsed input
end
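
A fuller sketch of that check, combined with the idea raised earlier in the thread of keeping a verbatim name on Person, might look like the following. It is self-contained and does not call dwc_agent: `AgentName`, `DEFAULT_NAME`, and `person_attributes` are hypothetical, `DEFAULT_NAME` merely stands in for the all-nil value that `DwcAgent.default` is described as returning, and the `name` attribute is an assumption, not something Person has today.

```ruby
# Hypothetical stand-in for the struct returned by the parser/cleaner.
AgentName = Struct.new(:family, :given)

# Stand-in for an "all attributes nil" default name (assumption).
DEFAULT_NAME = AgentName.new(nil, nil).freeze

# Build attributes for a person record: use the cleaned parts when the
# cleaner kept something, otherwise fall back to the verbatim input.
def person_attributes(verbatim, cleaned_name)
  if cleaned_name != DEFAULT_NAME
    { last_name: cleaned_name.family, first_name: cleaned_name.given }
  else
    { name: verbatim } # hypothetical verbatim-name field
  end
end

p person_attributes("P., A.V.", AgentName.new(nil, nil))
p person_attributes("Smith, A.V.", AgentName.new("Smith", "A.V."))
```

The point of the design is that an initials-only determiner is preserved as entered instead of being silently dropped or causing the record to error.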