Open DianRHR opened 1 year ago
That's bad source data.
The meta.xml should look like this, which fixes this issue and also #76:
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy='"' ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxon.txt</location>
    </files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/acceptedNameUsageID"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/higherClassification"/>
    <field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/>
    <field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/>
    <field index="7" term="http://rs.tdwg.org/dwc/terms/class"/>
    <field index="8" term="http://rs.tdwg.org/dwc/terms/order"/>
    <field index="9" term="http://rs.tdwg.org/dwc/terms/family"/>
    <field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/>
    <field index="11" term="http://rs.tdwg.org/dwc/terms/subgenus"/>
    <field index="12" term="http://rs.tdwg.org/dwc/terms/specificEpithet"/>
    <field index="13" term="http://rs.tdwg.org/dwc/terms/infraspecificEpithet"/>
    <field index="14" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
    <field index="15" term="http://rs.tdwg.org/dwc/terms/scientificNameAuthorship"/>
    <field index="16" term="http://rs.tdwg.org/dwc/terms/nomenclaturalCode"/>
    <field index="17" term="http://rs.tdwg.org/dwc/terms/taxonomicStatus"/>
  </core>
</archive>
Entire archive with the fix: dwca-tenebrionidae_north_america-v1.3.1.zip
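For anyone wondering what `fieldsEnclosedBy='"'` changes in practice, here is a rough sketch using Python's `csv` module as a stand-in for a Darwin Core Archive reader. It is not the parser any GBIF or ChecklistBank tool actually uses. The sample row and the `read_rows` helper are hypothetical, and the three-column layout is simplified from the meta.xml above.

```python
import csv
import io

# Hypothetical sample of taxon.txt: a header line plus one record whose
# authorship column is wrapped in double quotes (simplified to 3 columns).
SAMPLE = (
    "taxonID\tscientificName\tscientificNameAuthorship\n"
    '1\tTenebrio molitor\t"Linnaeus, 1758"\n'
)


def read_rows(text, enclosed_by):
    """Parse tab-delimited text and return the data rows.

    enclosed_by=None mimics a meta.xml with no enclosing character,
    so quotation marks are kept as literal data.
    """
    if enclosed_by is None:
        reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=enclosed_by)
    rows = list(reader)
    return rows[1:]  # drop the header, like ignoreHeaderLines="1"


if __name__ == "__main__":
    print(read_rows(SAMPLE, None)[0][2])  # quotes kept as part of the value
    print(read_rows(SAMPLE, '"')[0][2])   # quotes treated as field enclosure
```

With no enclosing character declared, the quotation marks end up inside the authorship value. Declaring `"` as the enclosure lets the reader strip them.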
@DianRHR I'm reopening this issue as the quotation marks reappeared. @mdoering, whom at ZooKeys would you recommend we contact to ask them to update the dataset with the fixed DwC-A? That way we can have a long-term solution.
I would just write to the contact given for the dataset.
A potential source for the xcol has quotation marks enclosing all authors. See https://www.dev.checklistbank.org/dataset/2048/classification?taxonKey=1415
When merging this source into a project, genus and below were merged properly, but the quotation marks persisted. See https://www.dev.checklistbank.org/dataset/266267/classification?taxonKey=6
It may not be the only source with this problem, but it may be an easy issue to clean before merging.
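As a rough idea of what such a pre-merge cleanup could look like, here is a minimal Python sketch that removes one pair of quotation marks wrapping an entire authorship string. The `strip_enclosing_quotes` helper is hypothetical and not part of ChecklistBank. The set of quote characters it handles is an assumption about what sources might contain.

```python
def strip_enclosing_quotes(value):
    """Remove one pair of matching quotation marks that wraps the whole value.

    Quotes that appear only inside the string are left untouched.
    """
    text = value.strip()
    # Assumed opening -> closing quote pairs; extend as real sources require.
    pairs = {'"': '"', "'": "'", "\u201c": "\u201d"}
    if len(text) >= 2 and text[0] in pairs and text[-1] == pairs[text[0]]:
        return text[1:-1].strip()
    return text


if __name__ == "__main__":
    for raw in ['"Linnaeus, 1758"', "Linnaeus, 1758", "\u201cSay, 1824\u201d"]:
        print(repr(raw), "->", repr(strip_enclosing_quotes(raw)))
```

Only a wrapping pair is removed, so an authorship that legitimately contains a quotation mark somewhere in the middle would pass through unchanged.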