Closed campmlc closed 1 year ago
You know what would be nice? To have this ID be magically populated by any other "genome" ID that gets added....OR maybe we just need a flag in the code table "this other ID is a genome" so that anyone could search across all of them?
Sorry to throw a wrench in! None of the above makes this addition a bad idea - just thinking that it could be magical....
I absolutely agree with a "genome" flag . . . possible? Or should we just move forward with this for now to make something that can work.
But we still need a way in the interface to search for "genomic data"
But we still need a way in the interface to search for "genomic data"
I think the ID proposed in the issue would do that IF it is consistently applied (EVERY record with a current GenBank ID ALSO gets one of these), which seems like duplication of effort, AND IF people searching KNOW to search for that particular OtherID, which is highly unlikely. If we can just flag otherIDs in the code table as "genomic", then the work is done for us and IDs only need to be recorded once. Add "only search records with genomic identifiers" (like the require tissues button) and you get what you want.
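For illustration only, here is a minimal Python sketch of the flag idea above. Everything in it is an assumption: the `is_genomic` flag does not exist in the code table today, and `OTHER_ID_TYPES`, `Record`, `has_genomic_id`, and `require_genomic` are hypothetical names, as are the demo GUIDs and placeholder values. It only shows that, once identifier types carry a flag, a "require genomic identifiers" filter needs no extra ID on the record.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the other-ID-type code table. Each identifier
# type carries an assumed boolean "is_genomic" flag.
OTHER_ID_TYPES = {
    "GenBank": True,
    "NCBI BioSample": True,
    "Sequence Read Archive": True,
    "collector number": False,
    "NK": False,
}


@dataclass
class Record:
    guid: str
    # Each entry is an (id_type, id_value) pair; values here are placeholders.
    other_ids: list = field(default_factory=list)


def has_genomic_id(record, id_types=OTHER_ID_TYPES):
    """True if any identifier on the record has a type flagged as genomic."""
    return any(id_types.get(id_type, False) for id_type, _ in record.other_ids)


def require_genomic(records, id_types=OTHER_ID_TYPES):
    """Keep only records with at least one genomic identifier."""
    return [r for r in records if has_genomic_id(r, id_types)]


records = [
    Record("DEMO:1", [("GenBank", "placeholder-1"), ("NK", "placeholder-2")]),
    Record("DEMO:2", [("collector number", "placeholder-3")]),
    Record("DEMO:3", [("Sequence Read Archive", "placeholder-4")]),
]
matching = [r.guid for r in require_genomic(records)]
```

Each identifier is recorded once, under its own descriptive type, and the flag alone decides whether a record counts as having genomic data.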
Can we put "Find all records with tissues", "Find all records with genomic data", "Find all records with sequence data" into some obvious search place, like in the Catalog Record box on search, but visible without "show more options"? Not just a tiny little check box hiding at top of page only for people who know where to look?
The objectives are not clear, or perhaps have shifted. I'm not sure if this is a UI issue or a data issue.
One of the proposed solutions is not consistent with https://github.com/ArctosDB/arctos/issues/3593, while it seems that the data are mostly identical (there's an external resource of a certain type but in no particular place or format indicating a particular type of usage).
https://www.ncbi.nlm.nih.gov/genome/ exists but I have no idea how it ties in here.
I am adamantly opposed to any denormalization. "EVERY record with a current GenBank ID ALSO gets one of these" will simply not happen, cannot be necessary, and inevitably results in users finding only partial datasets.
I agree with having a flag that tags individuals with genetic data. I do not want "genomic id" as an ID since that is so vague. I know it adds more on the id list but I want "Genbank", "NCBI BioSample", "BoLD", "Sequence Read Archive", and all the future ways they identify genetic information on outside databases.
but I want "Genbank",
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#genbank
"NCBI BioSample",
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#biosample
BoLD
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#bold_barcode_id
"Sequence Read Archive"
One possibly stupid idea: group those by adding some common prefix ("GenBank" becomes "genetic junk: GenBank"). We've done something similar with other data (eg https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#cmnh__carnegie_museum_of_natural_history), so that's not an entirely new flavor of weird. The search is (and probably will remain) a select multiple, users can just pick all options they're interested in. (They can do that now, but they're scattered out.)
Alternate maybe equally stupid idea: The code table has a sort order column, it could also group those things - but the not-so-alphabetical sort makes me twitchy.
Oh, I know they already exist! And we use them! What I'm understanding from the discussion is that they want to get rid of those for a "Genomic ID" identifier to make searching for the data easier. I prefer the more descriptive identifiers.
get rid of those for a "Genomic ID" identifier to make searching for the data easier.
Oh - yea, that would make things like creating the reciprocals on genbank somewhere between painful and impossible, I'm not a fan.
Alternate maybe equally stupid idea: The code table has a sort order column, it could also group those things - but the not-so-alphabetical sort makes me twitchy.
Radical idea - add a column to the code table, "Other ID group". I bet that there are other things that could be grouped together for purposes like this. For instance, MSB could group NK with all of their other "MSB" type identifiers.
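As a rough sketch of what an "Other ID group" column could look like, the Python below builds a throwaway in-memory SQLite database. The table and column names (`id_type`, `id_group`, `record_other_id`) are made up for the example and are not the real Arctos schema; the group assignments, GUIDs, and values are placeholders too.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical group column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE id_type (
            id_type  TEXT PRIMARY KEY,
            id_group TEXT            -- the proposed "Other ID group" column
        );
        CREATE TABLE record_other_id (
            guid     TEXT NOT NULL,
            id_type  TEXT NOT NULL REFERENCES id_type(id_type),
            id_value TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO id_type VALUES (?, ?)",
        [
            ("GenBank", "genomic"),
            ("NCBI BioSample", "genomic"),
            ("NK", "MSB"),
            ("MSB:Mamm", "MSB"),
            ("collector number", None),  # ungrouped
        ],
    )
    conn.executemany(
        "INSERT INTO record_other_id VALUES (?, ?, ?)",
        [
            ("DEMO:1", "GenBank", "placeholder-1"),
            ("DEMO:1", "NK", "placeholder-2"),
            ("DEMO:2", "collector number", "placeholder-3"),
            ("DEMO:3", "MSB:Mamm", "placeholder-4"),
        ],
    )
    return conn


def guids_in_group(conn, group):
    """Return GUIDs of records holding any identifier whose type is in the group."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.guid
        FROM record_other_id AS r
        JOIN id_type AS t ON t.id_type = r.id_type
        WHERE t.id_group = ?
        ORDER BY r.guid
        """,
        (group,),
    )
    return [guid for (guid,) in rows]


conn = build_demo_db()
genomic_guids = guids_in_group(conn, "genomic")
msb_guids = guids_in_group(conn, "MSB")
```

In this sketch NK sits in the "MSB" group without being renamed, which is the point of grouping by column rather than by a prefix in the identifier name.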
I agree we need some way to "require genomes" in the same way we "require tissues" or find vouchers.
add a column to the code table, "Other ID group".
I like this idea. That Other Identifier Type list is getting pretty unwieldy (fortunately most of mine are near the top). Group ideas:
- General object IDs (collector number, field number, ear tag...)
- Arctos Institution IDs (internal IDs used by our collections) - would this need a subgroup for each institution?
- Extraneous Institution IDs (IDs used by non-Arctos institutions: rehab centers, government agency IDs...)
- Online data repositories/aggregators (GBIF, Dryad, Genbank...)
Arctos Institution IDs (internal IDs used by our collections) - would this need a subgroup for each institution?
I would skip this and just set up the institutional groups.
Online data repositories/aggregators (GBIF, Dryad, Genbank...)
defeats the purpose of putting all of the "genome" ids together but maybe we need to be able to assign IDs to multiple groups? Are we going overboard there?
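If identifier types could belong to more than one group, the relationship becomes many-to-many. A small Python sketch of that, with entirely hypothetical group names and assignments (`ID_TYPE_GROUPS` and `types_in_group` are invented for the example):

```python
# Hypothetical mapping from identifier type to the set of groups it belongs to.
ID_TYPE_GROUPS = {
    "GenBank": {"genomic", "online repository"},
    "NCBI BioSample": {"genomic", "online repository"},
    "GBIF": {"online repository"},
    "Dryad": {"online repository"},
    "NK": {"MSB"},
}


def types_in_group(group, type_groups=ID_TYPE_GROUPS):
    """Return, in sorted order, every identifier type assigned to the group."""
    return sorted(t for t, groups in type_groups.items() if group in groups)


genomic_types = types_in_group("genomic")
repository_types = types_in_group("online repository")
```

With this shape, putting GenBank under "online repository" does not remove it from "genomic", at the cost of one more thing to maintain per identifier type.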
add a column
Given the uses of this, how's that functionally different than an embedded prefix?
Or are there uses beyond "pick from the list..."?
Are we saying that identifier types are somehow data objects in their own right, or is this some UI-thing, or ????
General object IDs (collector number, field number, ear tag...)
I'd not lump field number in there - it's (usually) for a different kind of thing (lot, sorta-I-think, rather than item).
(internal IDs used by our collections) - would this need a subgroup for each institution?
I'm not seeing clear categories in the data, adding arbitrary classifications seems like it would just add confusion. "This is an MSB number" and it's attached to a DMNS record and users pull their hair out and run away screaming.....
identifier types are somehow data objects in their own right
I believe so, but I could be convinced that I am wrong
I'm not seeing clear categories in the data
Here is one - these are all identifiers ISSUED by Museum of Southwestern Biology. All but one of them group together. I don't think MSB will want to change NK to MSB:NK, but I could be wrong. @campmlc
ID | Definition | URL
---|---|---
NK | "New Mexico Karyotype Number," a frozen tissue collection number for the Museum of Southwestern Biology. |
MSB:Arth | Museum of Southwestern Biology, University of New Mexico, Arthropod Collection catalog number. |
MSB:Bird | Museum of Southwestern Biology, University of New Mexico, Bird Collection catalog number | http://arctos.database.museum/guid/MSB:Bird:
MSB:Fish | Museum of Southwestern Biology, University of New Mexico, Fish Collection catalog number. | http://arctos.database.museum/guid/MSB:Fish:
MSB Fish Lot ID | Museum of Southwestern Biology, University of New Mexico, Fish Collection lot identifier. |
MSB:Herp | Museum of Southwestern Biology, University of New Mexico, Herpetology Collection catalog number | http://arctos.database.museum/guid/MSB:Herp:
MSB:Host | Museum of Southwestern Biology, University of New Mexico, Host Collection catalog number | http://arctos.database.museum/guid/MSB:Host:
MSB:Inv | Museum of Southwestern Biology, University of New Mexico, Invertebrate Collection catalog number | http://arctos.database.museum/guid/MSB:Inv:
MSB:Mamm | Museum of Southwestern Biology, University of New Mexico, Mammal Collection catalog number | http://arctos.database.museum/guid/MSB:Mamm:
MSB: Museum of Southwestern Biology | Museum of Southwestern Biology, Albuquerque, New Mexico. Arctos Agent |
MSBObs:Mamm | Museum of Southwestern Biology, University of New Mexico, Mammal Observation Collection catalog number | http://arctos.database.museum/guid/MSBObs:Mamm:
MSB:Para | Museum of Southwestern Biology, University of New Mexico, Parasite Collection catalog number | http://arctos.database.museum/guid/MSB:Para:
[ Code Table Documentation is https://handbook.arctosdb.org/how_to/How-to-Use-Code-Tables.html ]
Goal Make it possible to find genomes through a search of OtherIDs = Genome ID
Context The genomics research community has no centralized repository for whole genomes, and genome data may currently be entered and accessed through a variety of different portals with differing levels of consistency and permanency in their URLs. These include NCBI Assemblies, BioSamples, and other resources. Quotes from researchers asked about this:
"NCBI is a pain, but if I were to be searching for a reference genome I would search in the assemblies database as these are unique to an individual sample and experiment."
"I'd used NCBI Assembly, NCBI BioSample, and NCBI BioProject as key terms for NCBI-associated genomic data. Honestly I archive my data with NCBI through SRA, but I use ENA to query/search for genomes and they use "Study", "Experiment", "Run", "Submission", "Accession", and "Taxon" IDs to identify genomes. You could integrate those labels as "ENA Study #", "ENA Experiment #" etc. or just link to "Genomic reads" or "Complete or partial genome assemblies". Raw reads are typically more valuable for reproducing or extending genomic research, whereas assembled genomes are used for reference-guided mapping assemblies. NCBI SRA numbers are included in ENA as "Submission" IDs. Here's an example and the reads for that example."
Given the current confusion, Arctos could provide identifiers for each of these links independently, but a researcher would have to know a priori which one to search on, or search an ever longer list of potential URLs. We should certainly add these as OtherIDs (a later issue). But this request is to add an identifier, Genome ID, where any possible link to genomic data could be entered, and which would allow researchers to search on a single identifier to locate any genomic info across a variety of platforms. This would have to be free text, and of course prone to error, which is why adding the other identifiers with real linkable URLs to the record is advisable. This ID is primarily a search tool or flag that such info exists.
Table https://arctos.database.museum/Admin/CodeTableEditor.cfm?action=editCollOIDT&tbl=ctcoll_other_id_type
Value Genome ID
Definition An identifier, preferably a url, which references the external repository for genomic data for this record.
Collection type Mamm, Bird, Herp, Amph, Rept, Fish, Ento, Inv, Para, Env, Herb, Mala, Zoo
Attribute data type free text
Part tissue flag yes
Priority Very High