SpeciesFileGroup / taxonworks_doc

TaxonWorks (https://taxonworks.org) documentation.
https://docs.taxonworks.org

DwC-A Import Manual #29

Open mjy opened 3 years ago

mjy commented 3 years ago

This template is an experiment in the works; feel free to modify it or propose changes.

DwC-A Importer

This is version 0.0.1. All changes beyond grammar will result in an increment. Higher level increments reflect larger changes that may reflect new ways of doing things, or differences in user interfaces, etc.

You can ask for help and clarification live on Gitter!

Overview

Describe what's going on here.

Exercise target audience

Describe audience.

Exercise goals

The goals are to describe:

At the end of the exercise you should:

Assumptions

Gotchas

Tips

Related exercises

Exercise

Syntax

Section title 1

Subsection title (1a)

Section title 2

...

Section title 3

...

Section title N - OTU vs. TaxonNames

Within TaxonWorks, informal names are not TaxonNames, because they are not intended to be governed by the rules of nomenclature. They are OTU names. Within the import, data are mapped to an Otu#name vs. a TaxonName#name as follows:

<link to FAQ?>

Wrapping up

Reminder of what was taught/learned.

Addendum

Addendum topic 1

bpescador commented 3 years ago

For import, I suggest three options: Import new records, Update existing records, and Merge imported records. The Update option matches the Cat. Number in the input file to the Cat. Number in the Data File, and updates any other fields specified. The Merge option should ignore duplicated Cat. Numbers and import only records with unique Cat. Numbers.
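
To make the three proposed options concrete, here is a minimal sketch of how they could behave. Everything in it is an assumption for discussion, not the importer's actual code: the function `apply_import`, the mode names, and the choice to raise an error when "new" hits an existing Cat. Number are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def apply_import(existing: Dict[str, Record], incoming: List[Record], mode: str) -> Dict[str, Record]:
    """Return a new catalog, keyed by catalogNumber, after applying one import mode."""
    if mode not in {"new", "update", "merge"}:
        raise ValueError(f"unknown mode: {mode}")
    # Copy so the caller's data is never modified in place.
    result = {key: dict(rec) for key, rec in existing.items()}
    for rec in incoming:
        key = rec["catalogNumber"]
        if mode == "new":
            # Import new records: a clash with an existing number is an error (assumed policy).
            if key in result:
                raise ValueError(f"catalogNumber already exists: {key}")
            result[key] = dict(rec)
        elif mode == "update":
            # Update: only touch records that already exist, overwriting the supplied fields.
            if key in result:
                result[key].update(rec)
        else:
            # Merge: skip duplicated numbers, import only the unseen ones.
            if key not in result:
                result[key] = dict(rec)
    return result
```

With an existing record `A1` and an input file containing `A1` and `B2`, "update" would change `A1` and ignore `B2`, while "merge" would leave `A1` alone and add `B2`.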

debpaul commented 3 years ago

To all: is there a move to encourage the use of identifiers that are GUIDs, at least in addition to the CatNum?

tmcelrath commented 3 years ago

Clarify: eventDate - Best practices for the format of ranged values (e.g. a Malaise trap sample left out for two weeks has a date range of "2020/1/1 / 2020/1/20").

tmcelrath commented 3 years ago

Clarify: does the DwC term "eventRemarks" map to the Collecting Event Notes field in TW?

tmcelrath commented 3 years ago

Best practices for importing geographic area: 1) Use ISO Codes for country? 2) How does DWC importer map to Geographic Areas in TW gazetteer?

tmcelrath commented 3 years ago

How does minimumElevationInMeters map? Does it go to the minimum elevation field?

tmcelrath commented 3 years ago

Best practices: Taxonomic names. Should we import a backbone first using the TaxonNames importer, or upload with the DwC importer? How are non-DwC taxonomic ranks handled (e.g. subfamily)?

redewalt commented 3 years ago

I remember that DWC had eventDateBegin and eventDateEnd.



mjy commented 3 years ago

The Event terms at https://dwc.tdwg.org/terms/#event don't seem to include either of those options; maybe there is an extension?

redewalt commented 3 years ago

ISO country codes, but which: 2-letter or 3-letter? And use the full name of the country as well.



mjy commented 3 years ago

If countryCode (https://dwc.tdwg.org/list/#dwc_countryCode) is provided, it should likely override the country name on import. Alternatively, we could force the code to match the name we have for the country, which might break some cases.

redewalt commented 3 years ago

Sorry, I am always apprehensive about using just a code in case someone puts in the wrong code.



redewalt commented 3 years ago

I cannot see which issue you are talking about.



redewalt commented 3 years ago

In my discussions with GBIF (Donald Hobern), they want as many ranks as we can give them. The minimum, if available, seems to be:

Kingdom, Phylum, Order, Suborder (think odonates and leps), Family, Subfamily, Tribe, Genus, Subgenus, Specific Epithet, Subspecific Epithet



mjy commented 3 years ago

A little clarification. This is import against a particular standard, not best practice. If people give us a single field, with a Country code, that should be a legitimate way of importing data because it still follows the data-exchange standard (it will create a stub specimen, linked to that CollectingEvent with that Country as the GeographicArea), regardless of what the best practice for curators is.

For countries, for example, we are clarifying the behaviour of the import code. For this task it is up to the user to decide what to pass us (a country name, a country code, or both), as long as it's DwC. We need to know what to tell the user when the two don't match or can't be imported for some reason.

I think we also only handle fields that are official DwC at this first stage of the code; otherwise we open an infinite range of edge cases. Fields like Subfamily therefore have to wait until they are ratified, are in draft, or come from a well-documented extension, etc.

Remember that our other imports (and there are various, e.g. BibTeX, basic nomenclature, etc.) don't have to reference DwC. This code matches against existing nomenclature in your project, so you don't need anything but a species name: if you already have the data in the system, it will match all the way up. In other words, its matching capabilities ultimately help you build efficient import practices as well. It also creates names if necessary.

redewalt commented 3 years ago

Found the link on the email to github and signed in.

Okay, I see now what you are getting at with the standards. I took Tommy's question about taxonomic ranks to be more general. The name is sufficient to link to other ranks in TW.

debpaul commented 3 years ago

@tmcelrath you asked

Best practices for importing geographic area:

1. Use ISO Codes for country?

FYI: note that GBIF uses the 2-letter country code (as per ISO 3166-1) of the country in which the occurrence was recorded. (iDigBio IT preferred the 3-letter code; long story.) Perhaps use the 2-letter code, as this will be in line with what GBIF uses. They do map the data if they get 3-letter instead of 2-letter codes.

mjy commented 3 years ago

We have both 2 and 3 letter codes in our Gazetteers to match against.
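
To make the country discussion above concrete, here is a minimal sketch of one possible resolution policy: accept either a 2- or 3-letter code, let countryCode win over the country name, and collect messages for the user when something doesn't match. The `COUNTRIES` list and `resolve_country` are hypothetical stand-ins, not the TaxonWorks gazetteer or importer API.

```python
from typing import List, Optional, Tuple

# Tiny stand-in for a gazetteer: (2-letter code, 3-letter code, name).
COUNTRIES = [
    ("US", "USA", "United States"),
    ("CA", "CAN", "Canada"),
    ("MX", "MEX", "Mexico"),
]


def resolve_country(country_code: Optional[str], country: Optional[str]) -> Tuple[Optional[str], List[str]]:
    """Return (matched country name or None, messages to show the user)."""
    messages: List[str] = []
    by_code = None
    by_name = None
    if country_code:
        code = country_code.strip().upper()
        # Either the 2-letter or the 3-letter code is accepted.
        by_code = next((name for a2, a3, name in COUNTRIES if code in (a2, a3)), None)
        if by_code is None:
            messages.append(f"countryCode '{country_code}' not found")
    if country:
        wanted = country.strip().casefold()
        by_name = next((name for _, _, name in COUNTRIES if name.casefold() == wanted), None)
        if by_name is None:
            messages.append(f"country '{country}' not found")
    if by_code and by_name and by_code != by_name:
        # Assumed policy: the code overrides the name, but the user is told.
        messages.append(f"countryCode resolves to '{by_code}' but country is '{by_name}'; using countryCode")
    return (by_code or by_name), messages
```

Under this policy a wrong code is not silently accepted: if the code and the name disagree, the record still imports, but the mismatch is reported.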

debpaul commented 3 years ago

Hi @redewalt, you wrote:

I remember that DWC had eventDateBegin and eventDateEnd.

See dwc:eventDate for the currently accepted formats, which capture begin and end in one field. Note that the original version of Darwin Core did indeed have the begin-end options you are thinking of (see the now deprecated terms earliestdatecollected and latestdatecollected).
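
As an illustration of begin and end in one field: dwc:eventDate recommends ISO 8601, where an interval is written as start/end, so the two-week trap example above would be "2020-01-01/2020-01-20". A minimal sketch follows; the helper names are hypothetical, and it only handles full YYYY-MM-DD dates, not the abbreviated or time-of-day forms that ISO 8601 also allows.

```python
from datetime import date
from typing import Tuple


def to_event_date(start: date, end: date) -> str:
    """Format a collecting date range as an ISO 8601 interval for eventDate."""
    if end < start:
        raise ValueError("end date is before start date")
    return f"{start.isoformat()}/{end.isoformat()}"


def parse_event_date(value: str) -> Tuple[date, date]:
    """Split an eventDate into (start, end); a single date yields the same day twice."""
    start_text, separator, end_text = value.partition("/")
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip()) if separator else start
    return start, end
```

For example, `to_event_date(date(2020, 1, 1), date(2020, 1, 20))` gives the interval string above, and `parse_event_date` turns it back into the two dates.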

tmcelrath commented 3 years ago

INHS SOP: DWCA Importer: https://docs.google.com/document/d/1u9yXhThghCR6_seq-sLUHQ0-KGuCZFPwAimlceQy4ss/edit#heading=h.oa4f2cp75qq5

debpaul commented 3 years ago

Question/s about the import behavior:

  1. What happens when I step away from an upload (e.g. the browser times out)?
  2. What if I switch to a different tab in my browser, or to a different browser instance, leaving the process running?
  3. What if I close my browser before the upload is finished (i.e. will it continue)?

Answers to these questions need to be part of the Manual documentation.

mjy commented 3 years ago

Other questions to answer: