Symbiota / Symbiota-deprecated

This original code fork is considered deprecated and no longer maintained by the community. We recommend that you use one of the several actively developed forks.
GNU General Public License v2.0

extra backslash "\" added in escaped identification remarks #130

Open jhpoelen opened 4 years ago

jhpoelen commented 4 years ago

@neilcobb - please note that the record with occurrence id 26225229 in https://scan-bugs.org:443/portal/content/dwca/MCZ_DwC-A.zip appears to be invalid (the corresponding occurrence page is https://scan-bugs.org:443/portal/collections/individual/index.php?occid=26225229 ):

id,institutionCode,collectionCode,ownerInstitutionCode,collectionID,basisOfRecord,occurrenceID,catalogNumber,otherCatalogNumbers,kingdom,phylum,class,order,family,scientificName,taxonID,scientificNameAuthorship,genus,specificEpithet,taxonRank,infraspecificEpithet,identifiedBy,dateIdentified,identificationReferences,identificationRemarks,taxonRemarks,identificationQualifier,typeStatus,recordedBy,recordNumber,eventDate,year,month,day,startDayOfYear,endDayOfYear,verbatimEventDate,occurrenceRemarks,habitat,fieldNumber,informationWithheld,dataGeneralizations,dynamicProperties,associatedTaxa,reproductiveCondition,establishmentMeans,lifeStage,sex,individualCount,samplingProtocol,samplingEffort,preparations,country,stateProvince,county,municipality,locality,locationRemarks,decimalLatitude,decimalLongitude,geodeticDatum,coordinateUncertaintyInMeters,verbatimCoordinates,georeferencedBy,georeferenceProtocol,georeferenceSources,georeferenceVerificationStatus,georeferenceRemarks,minimumElevationInMeters,maximumElevationInMeters,minimumDepthInMeters,maximumDepthInMeters,verbatimDepth,verbatimElevation,disposition,language,recordEnteredBy,modified,rights,rightsHolder,accessRights,recordId,references
26225229,MCZ,Ent,"Museum of Comparative Zoology, H",029816b2-ba46-4c89-9ebb-c1c630a0ce7e,PreservedSpecimen,MCZ:Ent:36086,36086,"type number=36086",Animalia,Arthropoda,Insecta,Coleoptera,Curculionidae,"Anametis granulatus",,"(Say, 1832)",,,,,"Rachel L. Hawkins","2017-03-30 00:00:00.0",,"Labels: ""granulat-/ tus, S \\"; ""T. Say/ Type""; ""Anametis/ grisea/ H""; ""MCZ SYNTYPE/ 36086/ R. L. Hawkins/ 2017.iii.07""",,,"Syntype of Barynotus granulatus","[no agent data]",,1700-01-01,1700,1,1,1,,"[no verbatim date data]","collection: Thomas Say Collection; life stage: adult",,,,,"{""collection"":""Thomas Say Collection"", ""life stage"":""adult""}",,,,,,1,,,"whole animal (pinned)","United States",Indiana,,,"[no specific locality data]",,,,,,,,,,,,,,,,,,"not applicable",en,,"2017-03-30 10:09:26",http://creativecommons.org/licenses/by-nc/4.0/,"President and Fellows of Harvard College","The publisher and rights holder of this work is Museum of Comparative Zoology, Harvard University. Copyright © 2018 President and Fellows of Harvard College, Some Rights Reserved. This work is licensed under a Creative Commons Attribution Non Commercial (CC-BY-NC) 4.0 License.",urn:uuid:929ab79e-0e24-44e7-92a7-53bb1f826fe5,https://scan-bugs.org:443/portal/collections/individual/index.php?occid=26225229

Does Symbiota do any validation on escaped field values?
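One way to check an export for this kind of problem is to re-read each line with a strict CSV parser. The snippet below is only an illustrative sketch in Python, not Symbiota code. It assumes the archive is meant to use RFC 4180-style quoting (embedded quotes doubled, backslash not treated as an escape character); the helper name `parses_strictly` and the shortened sample strings are hypothetical.

```python
import csv


def parses_strictly(line: str) -> bool:
    """Return True if `line` is a well-formed CSV record under strict parsing."""
    try:
        # strict=True makes the reader raise csv.Error on malformed quoting
        # instead of silently recovering from it.
        list(csv.reader([line], strict=True))
        return True
    except csv.Error:
        return False


# Shortened copies of the identificationRemarks field (hypothetical test inputs).
# As exported: two backslashes followed by a single, undoubled quote.
exported = r'"Labels: ""granulat-/ tus, S \\"; ""T. Say/ Type"""'
# As one would expect it: one backslash followed by a doubled quote.
expected = r'"Labels: ""granulat-/ tus, S \""; ""T. Say/ Type"""'

print(parses_strictly(exported))  # the lone quote closes the field early
print(parses_strictly(expected))
```

With strict parsing, the exported form is rejected because the single quote after the backslashes ends the field and is then followed by `;` rather than a delimiter.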

jhpoelen commented 4 years ago

@neilcobb please note that at https://scan-bugs.org/portal/collections/individual/index.php?occid=26225229 , the "correct" identification remarks are shown:

Labels: "granulat-/ tus, S \"; "T. Say/ Type"; "Anametis/ grisea/ H"; "MCZ SYNTYPE/ 36086/ R. L. Hawkins/ 2017.iii.07"

It appears that Symbiota adds an extra backslash.
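For comparison, here is a small Python sketch of what a conforming writer would produce for the displayed value. It is an illustration under the assumption of RFC 4180-style quoting, not a description of how Symbiota builds its archives; `to_csv_field` and the shortened `displayed` string are hypothetical.

```python
import csv
import io


def to_csv_field(value: str) -> str:
    """Encode a single value as a quoted CSV field (quotes doubled)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([value])
    # Drop the line terminator that writerow() appends.
    return buf.getvalue().rstrip("\r\n")


# Shortened copy of the value shown on the occurrence page.
displayed = r'Labels: "granulat-/ tus, S \"; "T. Say/ Type"'

encoded = to_csv_field(displayed)
print(encoded)

# Reading the encoded field back recovers the displayed value unchanged.
decoded = next(csv.reader([encoded], strict=True))[0]
print(decoded == displayed)
```

In this encoding the backslash is left alone and only the quotes are doubled, so the field contains `\""` rather than the `\\"` seen in the export.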

Originally filed at https://github.com/gbif/dwca-io/issues/48 .

jhpoelen commented 4 years ago

For completeness, the malformed field value containing the offending \\" comes from the identificationRemarks of https://scan-bugs.org/portal/collections/individual/index.php?occid=26225229 :

"Labels: ""granulat-/ tus, S \\"; ""T. Say/ Type""; ""Anametis/ grisea/ H""; ""MCZ SYNTYPE/ 36086/ R. L. Hawkins/ 2017.iii.07"""

jhpoelen commented 4 years ago

Also note that https://www.gbif.org/occurrence/1438640852 does appear to have the correctly escaped occurrence remarks:

Labels: "granulat-/ tus, S \"; "T. Say/ Type"; "Anametis/ grisea/ H"; "MCZ SYNTYPE/ 36086/ R. L. Hawkins/ 2017.iii.07"

neilcobb commented 4 years ago

@jhpoelen @evindunn

I post all SCAN issues on https://github.com/scan-bugs-org/scan/issues

Thanks for pointing this out.

neilcobb commented 4 years ago

@jhpoelen

The MCZ should not have had a DwC-A on SCAN, so I removed it. They are a snapshot collection and serve data via their own IPT. Ironically, I assume you would not have found the bug otherwise?

jhpoelen commented 4 years ago

Great! The MCZ collection (that should not have been there) definitely helped find the suspected Symbiota bug.

jhpoelen commented 4 years ago

@neilcobb as an administrator, you should be able to transfer this issue to https://github.com/scan-bugs-org/scan/issues .

neilcobb commented 4 years ago

@jhpoelen

As far as I know, I am not an admin of this GitHub project. But if this is a Symbiota bug, it may go beyond SCAN.

jhpoelen commented 4 years ago

As far as I can tell, this is a general Symbiota bug.