futres / template

user template for FuTRES data import

Where to put stuff #22

Closed meghalithic closed 4 years ago

meghalithic commented 4 years ago

- skeletal element name (eg femur)
- skeletal element portion or alterations or other notes pertinent to the measurement or its absence (eg "proximal half only, greatest length not possible, eroded epiphysis edge, measurement might be inaccurate" - sometimes captured under dwc:measurementRemarks and including things like "antlers removed prior to carcass weight capture" which impacts weight)
- occurrenceID (ie the GBIF or iDigBio provided id - does this go as organismID or individualID or something else?)
- identificationQualifier (eg "cf" or "sp")
- other catalog numbers such as field number (= dwc:recordNumber), or other catalog number (in our case we have some specimens with both EAP# and UF# because portions of the specimen are curated in different areas, or specimens from the same lot are curated in different locations, etc.)
- dwc:preparations (ie info on the completeness of the skeleton for the individual or whether it is a skeleton or a partial skeleton or a formalin preserved carcass - also, this is where skeletal element name typically goes in a dwcA record)
- country (in cases where our data record does not contain "country code" - ie we haven't standardized to ISO - will FuTRES apply the ISO standard and so we should just put our dwc:country information into the country code category?)
- locality (in cases where we have our locality information under dwc:locality and nothing under dwc:verbatimLocality?)
- verbatimCoordinates (in cases where we have our lat/long in non-standard form in dwc:verbatimCoordinates and nothing under dwc:decimalLatitude etc.)
- elevation (in cases where we do not have max and min elevation and have our information in the dwc:elevation term)
- collector name (captured under dwc:recordedBy and meaning the person who collected the specimen rather than the person who measured it)
- disposition (ie specimen location - where it is currently curated if not in the original host institution, or if it has already been destroyed through analysis, etc. - captured under dwc:disposition)
- measurement date (ie the date on which the measurement was taken - different than the event date for most metrics since these occur during cataloging or post preparation)
- reproductive condition (captured under dwc:reproductiveCondition and including "pregnant" which impacts weight)
- samplingProtocol (ie where we list things like "dataset does not include juveniles under 1 year of age" etc. - captured under samplingProtocol)

And as far as broader metadata - where in the template should I put:

- Data Host/Data Source URL (e.g. VertNet if the data is coming from VertNet rather than from an individual or curational facility)
- Data Publisher (esp. when this is not the same as either host or institution)
- InstitutionID (ie the biocol or other URI as opposed to the code which is non-standardized)
- CollectionID (ie the URI vs the code)
- Institution/Collection/Dataset Name (or will FuTRES resolve those from the URIs and codes given?)
- Dataset Contacts (and various other roles such as collection manager, curator, data author)
- License/Rights (e.g. CC0)
- dwc:Types (ie PhysicalObject, StillImage, Text)
- dataGeneralizations/informationWithheld (e.g. when locality has been buffered for protection of endangered species or cultural heritage)
- dynamicProperties (typically where additional methods and data-set wide remarks are put)

meghalithic commented 4 years ago

skeletal element name: keep as it is, Neeka and I will combine that column with the measurement type

skeletal element portion or alterations: great question....perhaps comments?

occurrenceID: keep it - we'll reuse it in the pipeline

identificationQualifier: we have been putting this in with the species name (e.g., "c.f. Binomial"). Leave as is and we'll combine them. Is there a reason they are separated?

other catalog numbers: I need to add field number to the template - thank you for the reminder! Keep all the catalog numbers - it is no problem to have multiple.
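For illustration only, the "we'll combine them" step for identificationQualifier could look roughly like the sketch below. The function name `combine_name` and the prefix placement of the qualifier (following the "c.f. Binomial" example above) are assumptions, not the actual FuTRES pipeline code.

```python
def combine_name(scientific_name, identification_qualifier=None):
    """Merge a DwC scientificName with an optional identificationQualifier.

    Assumption: the qualifier is placed in front of the name, as in the
    "c.f. Binomial" example; the real pipeline may format it differently.
    """
    name = (scientific_name or "").strip()
    qualifier = (identification_qualifier or "").strip()
    # Leave the name untouched when no qualifier was supplied.
    return f"{qualifier} {name}" if qualifier else name


print(combine_name("Odocoileus virginianus", "cf."))
print(combine_name("Odocoileus virginianus"))
```

Keeping the two fields separate in the template and merging them in one place like this would mean data providers do not have to reformat their names themselves.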

dwc:preparations: This is something we should discuss at our weekly meeting.

country: If you give me a list of country codes, it's easy enough for me to match and swap them out for the country names.
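As a rough sketch of that match-and-swap, assuming the codes are ISO 3166-1 alpha-2 and using a deliberately tiny, hypothetical lookup table (a real run would load the full ISO list):

```python
# Hypothetical, incomplete lookup table for illustration only.
ISO_ALPHA2_TO_NAME = {
    "US": "United States of America",
    "MX": "Mexico",
    "KE": "Kenya",
}


def country_name(value):
    """Return the country name for an ISO alpha-2 code.

    Values that are not known codes (for example, a provider who already
    supplied a country name) are returned unchanged apart from trimming.
    """
    if value is None:
        return None
    cleaned = value.strip()
    return ISO_ALPHA2_TO_NAME.get(cleaned.upper(), cleaned)


print(country_name("ke"))
print(country_name("Tanzania"))
```

Passing unknown values through unchanged means providers with only a country name and providers with only a code can use the same column.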

locality: That's fine. It's better to put locality info in verbatim locality than the other way around. Just keep as locality for now.

verbatimCoordinates: can your coordinates not be converted?
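To make the conversion question concrete, here is a minimal sketch of turning a degrees/minutes/seconds string into decimal degrees. It assumes the verbatim value holds one coordinate with up to three numbers and a trailing hemisphere letter (N, S, E or W); real verbatimCoordinates come in many more forms (UTM, township/range, etc.) that this does not handle, and `dms_to_decimal` is a hypothetical helper name.

```python
import re


def dms_to_decimal(text):
    """Convert e.g. 29°39'07"N or "82 19 30 W" to decimal degrees.

    Returns None when the string does not fit the assumed pattern, so the
    original verbatim value can be kept instead of being dropped.
    """
    cleaned = text.strip().upper()
    # Hemisphere letter must be the last character (assumption).
    hemisphere = re.search(r"([NSEW])$", cleaned)
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cleaned)]
    if hemisphere is None or not numbers or len(numbers) > 3:
        return None
    # Pad missing minutes/seconds with zeros.
    degrees, minutes, seconds = (numbers + [0.0, 0.0])[:3]
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere.group(1) in "SW" else value


print(dms_to_decimal("82 19 30 W"))
```

A step like this could run on the FuTRES side, so that providers are not asked to convert their own coordinates before submitting.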

elevation: we need to add this

collector name: this is also something we need to discuss. We could have this in the metadata? Or what bias do you envision by a collector? Or is it for credit to the collector?

disposition: does FuTRES need to have this info?

measurement date: this is also something we need to discuss (similar to collector name).

reproductive condition: we need to add this.

samplingProtocol: This is something we need to still figure out. For now, let's keep it as free text. I'll add a column for it though.

Broader metadata: these are things to discuss with John Deck. Ramona seems to recall a way this was done for the plant trait ontology.

meghalithic commented 4 years ago

> skeletal element name: keep as it is, Neeka and I will combine that column with the measurement type

Super. For our other data providers, is this something we should have available as a term in the template?

> skeletal element portion or alterations: great question....perhaps comments?

Great, is there a particular comments term I should use to label that info? This is something that we put in "preparations", see below.

> occurrenceID: keep it - we'll reuse it in the pipeline

Great, should we include it in the template for other data providers too?

> identificationQualifier: we have been putting this in with the species name (e.g., "c.f. Binomial") leave as is and we'll combine them. Is there a reason they are separated?

Yes, in DwC they are listed separately so that the species name (scientificName) includes only the formal name and the qualifiers go in a different field. So if someone provides their file to you in DwC format from a GBIF file, this will be in a separate field. Rather than asking folks to combine it themselves, it might be useful to have it available in the FuTRES template?

> other catalog numbers: I need to add field number to the template - thank you for the reminder! Keep all the catalog numbers - it is no problem to have multiple.

Perfect! Would a data provider list them separately or concatenated into a single field? Or does it not matter and you can deal with them either way?

> dwc:preparations: This is something we should discuss at our weekly meeting.

Okie dokies - as background, preparations is where skeletal element names and information about them are currently put by many folks, whether paleo (specific elements) or neo ("skull" vs "partial skeleton"). This is where we've elected to put information about the portion of the element that is available (which for FuTRES purposes will tell the later user why there is only one measurement and not the expected matching one) and modifications (which for FuTRES purposes lets the later user exercise some judgement about the accuracy of associated measurements).

> country: If you give me a list of country codes, it's easy enough for me to match and swap them out for the country names.

Country codes are defined by ISO, so I actually just have the country name, BUT most folks will have a standardized country code in their data rather than, or as well as, the country name.

> locality: That's fine. It's better to put locality info in verbatim locality than the other way around. Just keep as locality for now.

As background again to my question, locality is where the paleo folks have been putting their Site Name, and where John W suggests we put ours as well. So if we include locality, we will capture site names from paleo folks and can avoid putting a separate term called "site name" into our template. We would need to clarify for the FuTRES users that locality is where site name should be put. Also, localityID is where site codes go for paleo folks and would be logical for us as well (again avoiding a separate term).

> verbatimCoordinates: can your coordinates not be converted?

Yes they can, but this is pertinent to the question of how much work we request the data provider to do. If someone is told that FuTRES ONLY accepts standardized coordinates and that they need to convert their verbatimCoordinates into a standard format (typically something that has gone through the georeferencing system, though not always), then some folks will either not include the info or not include their data in the database.

> elevation: we need to add this

Great, I'll keep it in my list.

> collector name: this is also something we need to discuss. We could have this in the metadata? Or what bias do you envision by a collector? Or is it for credit to the collector?

Both things are possibly important ("I won't accept any taxonomic designation by collector X", "hey I have two identical looking specimens from the same collector name but with different IDs, maybe it's a duplicate" - and for credit). It is a field that is traditional in all biological databases so it will be available, and it is also something that the workshop participants said they wanted. It can't go into "metadata" (if I understand that as information that pertains to the dataset as a whole rather than each specimen) because each specimen is likely collected by a different collector. It's not essential and is available in the original data source EXCEPT for collections that are not also presented elsewhere. So a judgement call. And I think we need to define "metadata" too ... see below ...

> disposition: does FuTRES need to have this info?

It depends on whether the user is likely to want to find the specimen for which data is provided and would like to know if it still exists (has not been destroyed for analysis) or is no longer in the institution that published the data (on permanent loan, on exhibit, lost). They could, though, contact the data provider and ask - so it's a judgement call whether this is a useful thing for folks to have at their fingertips or whether they will want to make the call themselves to the various collections.

> measurement date: this is also something we need to discuss (similar to collector name).

Yup, though since credit isn't part of the equation, possibly not as important as collector name.

> reproductive condition: we need to add this.

Cool.

> samplingProtocol: This is something we need to still figure out. For now, let's keep it as free text. I'll add a column for it though.

Cool (and by the way, any terms I sent in camelCase are DwC standard terms).

> Broader metadata: these are things to discuss with John Deck. Ramona seems to recall a way this was done for the plant trait ontology.

Excellent! So here is where having a better idea of what you mean by "metadata" would be helpful. If someone is providing a single dataset from a single facility with all identical licenses etc., these terms are metadata (in the sense that they apply to the entire dataset), but I envision that you might get datasets (e.g. horses) where each specimen needs to have this defined, since they come from different collections and have different generalizations etc. BUT I also suspect that you folks have in mind a set of information that is not essential to the way the database functions, but is a "hanger on" and so is considered metadata even if it is different for each specimen.

- Data Host/Data Source URL (e.g. VertNet if the data is coming from VertNet rather than from an individual or curational facility)
- Data Publisher (esp. when this is not the same as either host or institution)
- InstitutionID (ie the biocol or other URI as opposed to the code which is non-standardized)
- CollectionID (ie the URI vs the code)
- Institution/Collection/Dataset Name (or will FuTRES resolve those from the URIs and codes given?)
- Dataset Contacts (and various other roles such as collection manager, curator, data author)
- License/Rights (e.g. CC0)
- dwc:Types (ie PhysicalObject, StillImage, Text)
- dataGeneralizations/informationWithheld (e.g. when locality has been buffered for protection of endangered species or cultural heritage)
- dynamicProperties (typically where additional methods and data-set wide remarks are put)

emerykf commented 4 years ago

The proposed FuTRES term "GeologicalContext" is a class level term so it can't contain data (verified with John W). You'll need to put whatever was supposed to go into that term into one of the property terms.

jdeck88 commented 4 years ago

I'm confused by this issue. Probably too much going on to be a single issue. I like having a separate issue for each term addition. @megbalk can you break this down into relevant separate issues?