ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Help needed in Arctos: Catalog record type for records that exist only as sequence data #8293

Open campmlc opened 5 days ago

campmlc commented 5 days ago

Tell us what you are trying to do

With the increasing use of whole-genome sequencing, we can now extract the identities not only of the sequenced individual but also of its parasites (or host), endosymbionts, pathogens, etc. Arctos can handle this in two ways. We can add multiple identifications to a preserved specimen record to reflect the identification of the "host" as well as all parasites, pathogens, endosymbionts, etc., associated with it. Alternatively, and preferably, we can create related records linked to the original specimen via "parasite of" or "collected with" relationships, one for each sequenced entity discovered through whole-genome sequencing. If these sequences are published through a rigorous process that excludes contamination, we can reasonably assume that each represents an associated taxon and therefore warrants a record of some kind. My question is: if we create a new record with the related identification based on genomic evidence, what is its catalog record type in Arctos, or its "basis of record" for GBIF? PreservedSpecimen? MachineObservation? HumanObservation?

What are relevant pages in Arctos

Provide a link to or a description of the page where you need help.

mkoo commented 2 days ago

Great questions, and this would be a good topic for a WG meeting too.

Using relationships between records makes sense and keeps things discoverable. For a record based on whole-genome sequencing, I'd be inclined to adopt what I've seen used for eDNA samples, i.e., MaterialSample (rather than MachineObservation, to distinguish these records from camera traps). But maybe a new term is needed? (Just first-glance thoughts...)

campmlc commented 2 days ago

Happy to have this be an AWG topic.

dustymc commented 2 days ago

I don't think AWG can help: this isn't our vocabulary, and it's either clear or it's not. (It's not to me...)

https://dwc.tdwg.org/list/#dwc_MaterialSample is, I think, something else.

https://dwc.tdwg.org/terms/#materialsample / https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type#materialentity

(Did something change, or was something entered improperly? Why does materialsample map to materialentity?)

A material entity that represents an entity of interest in whole or in part.

seems reasonable, and so does

https://dwc.tdwg.org/terms/#preservedspecimen / https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type#preservedspecimen

A specimen that has been preserved.

@tucotuco can you steer us towards an answer?

campmlc commented 2 days ago

In this case, and in others for which I have only sequence data, there is no preserved specimen. I could use Material Entity, but the record does not represent any "material". It is merely an observation/deduction, based on the matching sequence data available in GenBank or ENA at this moment in time, given current methods.

Jegelewicz commented 2 days ago

It is merely an observation/deduction, based on the available matching sequence data in GenBank or ENA at this moment in time given current methods.

Isn't that the answer? HumanObservation?

An output of a human observation process. Human observations are unvouchered and are expected to have NO parts.

dustymc commented 2 days ago

there is no preserved specimen

Oh, yeah, I agree with @Jegelewicz and don't see any ambiguity in that situation.

campmlc commented 2 days ago

But it could be MachineObservation, because this identification is based on a computer algorithm that suggests the match.

dustymc commented 2 days ago

could be machine observation

An example containing all of the information would be most useful; it's very difficult to be helpful in the dark.

If there's machine-produced indirect evidence (e.g., a media record) of an Occurrence, then MachineObservation would be correct.

jrpletch commented 2 days ago

I can give some context here. I've been working with WGS data from tapeworms and fleas, trying to extract mitochondrial genomes for phylogenetic analysis. On a whim, I used the host (a vole) as a seed sequence in NOVOPlasty, and it assembled a complete mitochondrial genome that matched a vole in GenBank. In my data, I am interested in pulling out any other species that may be present in the sample (such as bacteria, viruses, other parasites, and host DNA). For the fleas, it would be particularly interesting to see whether the data can provide evidence of feeding on hosts other than the one each flea was collected from. So my question was: if I were to upload host (or other non-target) sequences to GenBank, what would be the best way to link them to Arctos?