CatalogueOfLife / xcol

Working towards the extended Catalogue of Life Checklist

Remove quotation marks and similar marks that enclose authorship #72

Open DianRHR opened 1 year ago

DianRHR commented 1 year ago

A potential source for the xcol has quotation marks enclosing all authors. See https://www.dev.checklistbank.org/dataset/2048/classification?taxonKey=1415


When this source was merged into a project, genus and below were merged properly, but the quotation marks persisted: https://www.dev.checklistbank.org/dataset/266267/classification?taxonKey=6


It may not be the only source with this problem, but it may be an easy issue to clean before merging.
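
As a rough illustration of what such a pre-merge cleanup could look like, here is a minimal Python sketch. The helper name `strip_enclosing_quotes` and the set of quotation marks it handles are assumptions made for this example; it is not part of xcol or ChecklistBank. It only removes marks that wrap the whole authorship string and leaves strings with interior marks of the same kind untouched.

```python
# Hypothetical helper for illustration only; not an xcol or ChecklistBank function.
# Maps each opening quotation mark to its expected closing mark.
QUOTE_PAIRS = {
    '"': '"',
    "'": "'",
    "\u201c": "\u201d",  # curly double quotes
    "\u2018": "\u2019",  # curly single quotes
    "\u00ab": "\u00bb",  # guillemets
}


def strip_enclosing_quotes(authorship: str) -> str:
    """Remove quotation marks that wrap the entire authorship string."""
    text = authorship.strip()
    while len(text) >= 2:
        closing = QUOTE_PAIRS.get(text[0])
        inner = text[1:-1]
        # Stop unless the string is wrapped by a matching pair and the
        # same marks do not also occur inside (e.g. '"A" and "B"').
        if closing is None or text[-1] != closing:
            break
        if text[0] in inner or closing in inner:
            break
        text = inner.strip()
    return text
```

For example, `strip_enclosing_quotes('"(Author, 1900)"')` yields `(Author, 1900)`, while an unquoted authorship string is returned unchanged.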

mdoering commented 1 year ago

That's bad source data.

mdoering commented 1 year ago

The meta.xml should look like this to fix this issue and also #76:

<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy='"' ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxon.txt</location>
    </files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/acceptedNameUsageID"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/higherClassification"/>
    <field index="5" term="http://rs.tdwg.org/dwc/terms/kingdom"/>
    <field index="6" term="http://rs.tdwg.org/dwc/terms/phylum"/>
    <field index="7" term="http://rs.tdwg.org/dwc/terms/class"/>
    <field index="8" term="http://rs.tdwg.org/dwc/terms/order"/>
    <field index="9" term="http://rs.tdwg.org/dwc/terms/family"/>
    <field index="10" term="http://rs.tdwg.org/dwc/terms/genus"/>
    <field index="11" term="http://rs.tdwg.org/dwc/terms/subgenus"/>
    <field index="12" term="http://rs.tdwg.org/dwc/terms/specificEpithet"/>
    <field index="13" term="http://rs.tdwg.org/dwc/terms/infraspecificEpithet"/>
    <field index="14" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
    <field index="15" term="http://rs.tdwg.org/dwc/terms/scientificNameAuthorship"/>
    <field index="16" term="http://rs.tdwg.org/dwc/terms/nomenclaturalCode"/>
    <field index="17" term="http://rs.tdwg.org/dwc/terms/taxonomicStatus"/>
  </core>
</archive>
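
To illustrate why declaring a field enclosure matters, here is a small Python sketch. It uses Python's `csv` module only as an analogy for a Darwin Core Archive reader; the actual ChecklistBank parser may behave differently, and presumably `fieldsEnclosedBy='"'` is the setting relevant to the quotation marks. The sample row and the `read_rows` helper are made up for this example.

```python
import csv
import io

# Made-up sample data; the real taxon.txt has the 18 columns listed in meta.xml.
SAMPLE = (
    "ID\tscientificName\tscientificNameAuthorship\n"
    '1\tAus bus\t"(Author, 1900)"\n'
)


def read_rows(text, enclosed_by=None):
    """Parse tab-separated text, optionally honouring a field enclosure character."""
    if enclosed_by:
        # Analogous to fieldsEnclosedBy='"': the enclosure is stripped on read.
        reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=enclosed_by)
    else:
        # No enclosure declared: quotation marks are kept as literal data.
        reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row, like ignoreHeaderLines="1"
    return list(reader)


without_enclosure = read_rows(SAMPLE)
with_enclosure = read_rows(SAMPLE, enclosed_by='"')
```

In this sketch the authorship value keeps its quotation marks in `without_enclosure` and loses them in `with_enclosure`, which matches the symptom described in this issue.
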
mdoering commented 1 year ago

Entire archive with the fix: dwca-tenebrionidae_north_america-v1.3.1.zip

camiplata commented 8 months ago

@DianRHR I'm reopening this issue as the quotation marks have reappeared. @mdoering whom at ZooKeys would you recommend we contact to ask them to update the dataset with the fixed DwC-A, so we can have a long-term solution?

Screenshot 2024-02-19 at 9:31:33 a.m.

mdoering commented 8 months ago

I would just write to the contact listed for the dataset.