What to set as the datasetName in DwC-A exports

BiologicalRecordsCentre / iRecord

Repository to store and track enhancements, issues and tasks regarding the iRecord website.

http://irecord.org.uk


What to set as the datasetName in DwC-A exports #324

Closed by johnvanbreda 6 years ago

johnvanbreda commented 6 years ago

Part of #304.

Would this be just the website title (e.g. iRecord) or include the survey dataset (iRecord General records)?

johnvanbreda commented 6 years ago

Reply from Martin:

Firstly we need to ensure that particular datasets can be assigned to the correct Data Partner (BRC or recording schemes, as appropriate). As long as that is done we can probably be more relaxed about the dataset name, which could default to “Records collated via iRecord” or “Records collated via iRecord at BRC”. Unless we feel that the full survey name (e.g. “YNU | Terrestrial”) should be retained in order to provide recognition to partners who use Indicia?

sophiathirza commented 6 years ago

I think that datasetName needs to be the name of the dataset as it is on the Atlas.

kitenetter commented 6 years ago

In the current test export format, datasetName is being populated with the survey name (e.g. “YNU | Terrestrial”). If NBN require this field to contain the dataset name as displayed on the Atlas, then this requires:

ACTION: add another field for "NBN dataset name" in the automated export content type, so that the value can be added there, and the value used as a parameter to the report to populate datasetName.
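A rough sketch of that action, for illustration only: the `ExportConfig` class, the `nbn_dataset_name` field and the `dataset_name` parameter key are all hypothetical names, not the actual Indicia content type or report API. The idea is that the export configuration carries the Atlas name and passes it to the report as a parameter, independently of the survey name.

```python
from dataclasses import dataclass


@dataclass
class ExportConfig:
    """Hypothetical stand-in for the automated export content type."""
    survey_title: str      # iRecord survey name, e.g. "YNU | Terrestrial"
    nbn_dataset_name: str  # dataset name as displayed on the Atlas


def report_params(config: ExportConfig) -> dict:
    """Build report parameters; datasetName comes from the config, not the survey."""
    name = config.nbn_dataset_name.strip()
    if not name:
        # Fail early rather than export records with an empty datasetName.
        raise ValueError("NBN dataset name must be set in the export configuration")
    return {"dataset_name": name}
```

Keeping the survey title on the same object means it is still available if another DwC term is later chosen for it.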

The survey name within iRecord does have value in acknowledging the record's entry point, which may have been one of multiple websites, for example. But I can't find anything in the DwC termlist that is a good match for this.

japonicus commented 6 years ago

The oddly named collectionCode field might be usable for the iRecord survey name unless you're already using that field for something else (may collide with NE project codes). http://rs.tdwg.org/dwc/terms/#collectionCode

The NBN's guidance notes list it as usable for:

The name, acronym or code identifying the collection or data set from which the record was derived.

kitenetter commented 6 years ago

As things stand we are proposing to use collectionCode for the NE project codes.

The difficulty with iRecord's survey name is that it sometimes indicates a separate source website or project, and sometimes just a fairly trivial distinction of data structure within a single website or project. At the moment I feel it would be a nice-to-have feature rather than a critical one.

sophiathirza commented 6 years ago

The Atlas requires the datasetName to be the name of the data set on the Atlas.

We use collectionCode for the SurveyKey from Recorder6 and MarineRecorder.

johnvanbreda commented 6 years ago

Although I agree with Martin's concern about the survey name being somewhat arbitrary, wouldn't the same apply to collectionCode as extracted from Recorder 6? Survey subdivisions are very much arbitrary. Using collectionCode for the iRecord survey dataset name (that we are currently outputting in datasetName) might be more widely useful than using this field specifically for NE Project codes and would be more or less synonymous with the way that Recorder6 and Marine Recorder are doing this. Could the NE project codes be output as a dynamicProperty perhaps?

kitenetter commented 6 years ago

I'm not sure the suggested use of dynamicProperties really fits the NE codes: "A list of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content." The NE codes are semi-structured, but will include an option for there being no code available, and are not really facts or assertions about the record as such.

How about datasetID - are we using that for anything else? http://rs.tdwg.org/dwc/terms/datasetID

The NE project codes seem to me to match this term's definition: "An identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution."

sophiathirza commented 6 years ago

At the moment the datasetID is the original Gateway dataset ID, which we probably don't need anymore. It could work for the NE project codes.

johnvanbreda commented 6 years ago

Done in develop branch:

datasetID - will hold the NE project code. We will need to edit the attribute to link it to the datasetID DwC term after the code is deployed (linking attributes to DwC terms is a new feature).
collectionCode - the iRecord source field (website and survey title).
datasetName - will be populated by a field provided in the export configuration.
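
The three-field mapping above could be sketched roughly as follows. This is an illustration under assumptions, not the code in the develop branch: the function name, its arguments, the " | " separator in collectionCode and the empty string for a missing NE project code are all hypothetical.

```python
from typing import Optional


def dwc_dataset_fields(
    website_title: str,
    survey_title: str,
    ne_project_code: Optional[str],
    export_dataset_name: str,
) -> dict:
    """Return the dataset-related DwC terms for one exported record."""
    return {
        # NE project code; left empty when no code is available (assumption).
        "datasetID": ne_project_code or "",
        # iRecord source: website plus survey title (separator is an assumption).
        "collectionCode": f"{website_title} | {survey_title}",
        # Name supplied in the export configuration, e.g. the name on the Atlas.
        "datasetName": export_dataset_name,
    }
```

For example, `dwc_dataset_fields("iRecord", "iRecord General records", None, "Records collated via iRecord")` leaves datasetID empty and keeps the survey title visible in collectionCode.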