Closed jhpoelen closed 2 years ago
Arctos captures the links to genbank in associatedSequences.
For clarity: we "capture" the link in OtherIdentifiers (same as relationships and collector numbers and such), we share via associatedSequences.
In this case, a virus (hantavirus) was extracted from the host specimen.
I think that's just a case of failing to catalog the item of scientific interest. The virus should have been cataloged and related to the mammal. That of course doesn't always happen, and my "GenBank numbers are 'self.'" statement in #2121 seems to be wrong in this case.
do you keep track of the kind of association between the host specimen and the sequence
All identifiers carry a value from https://arctos.database.museum/info/ctDocumentation.cfm?table=ctid_references; perhaps we need a way to express this situation, which probably isn't as rare as it really should be.
Yes, it is unfortunate that the virus community is not better at providing cataloged "voucher specimens" that we can link to. It has been very difficult to get most virologists to identify, designate, or archive a host voucher, or even when these exist, to link to them on GenBank. Much of the MSB's effort at tracking viruses extracted from mammal specimens has occurred over the past decade or more, prior to our having a parasite collection, so there are GenBank links for viruses as well as parasites that are attached directly to the mammal host with "self" relationships. Now that we have the capacity to catalog the parasites separately, that should be done and those GenBank sequences moved over to the parasite record, but that is a process that would consume quite a bit of staff time and resources. I'd be happy to try if we can identify those samples, but unfortunately this may require going record by record based on which mammals have virus-associated publications or citations. We can look for "symbiotype" in the citation, but that was not always available for legacy records. We also need a way to designate relationships in citations to alternate taxa, e.g., "symbiotype of ... Taxon A (virus name)".
@dustymc @campmlc thanks for your prompt reply and for sharing background.
Great to hear that genbank numbers can have association types just like specimens do.
I can imagine that going back and identifying the association types for existing genbank numbers with their specimen can be quite laborious. However, through GloBI, I can perhaps provide an exhaustive list of genbank numbers associated with viruses. That said, I realize that it'll take time and effort to cross reference and double check . . . so perhaps something to do when the time is right?
It might be worth mentioning that many researchers are unaware of these rich linkages that you keep. . . I am doing my best to communicate the good work on associations. . . I guess it'll take time for it to take hold.
Hey Jorrit, Vast majority of our host/virus relationships/linkages are for those which we have the symbiotype specimen here at MSB. These were done manually based on our knowledge of the relationships and an effort to get virologists doing descriptions to include host info going forward. The paper attached has the recommendations for this. Best, Jon
Jonathan L. Dunnum Ph.D. Senior Collection Manager Division of Mammals, Museum of Southwestern Biology University of New Mexico Albuquerque, NM 87131 (505) 277-9262 Fax (505) 277-1351
MSB Mammals website: http://www.msb.unm.edu/mammals/index.html Facebook: http://www.facebook.com/MSBDivisionofMammals
Shipping Address: Museum of Southwestern Biology Division of Mammals University of New Mexico CERIA Bldg 83, Room 204 Albuquerque, NM 87131
@jldunnum thanks for sharing. Can you please share a citation string for the paper? Github issues does not keep the attachment alive.
Also, just in case y'all are feeling ambitious, I've attached a (partial) list of virus genbank numbers (virus_genbank_numbers.txt: https://github.com/ArctosDB/arctos/files/6266489/virus_genbank_numbers.txt), extracted from the indexed Grange et al. 2021 dataset using:

elton interactions globalbioticinteractions/grange2021 | grep -P -o "https://[^\t]+nuccore[^\t]+" | sort | uniq > virus_genbank_numbers.txt
The first 10 are:
$ cat virus_genbank_numbers.txt | head
https://www.ncbi.nlm.nih.gov/nuccore/AB010730
https://www.ncbi.nlm.nih.gov/nuccore/AB010731
https://www.ncbi.nlm.nih.gov/nuccore/AB010732
https://www.ncbi.nlm.nih.gov/nuccore/AB010733
https://www.ncbi.nlm.nih.gov/nuccore/AB010734
https://www.ncbi.nlm.nih.gov/nuccore/AB010735
https://www.ncbi.nlm.nih.gov/nuccore/AB010736
https://www.ncbi.nlm.nih.gov/nuccore/AB010737
https://www.ncbi.nlm.nih.gov/nuccore/AB010738
https://www.ncbi.nlm.nih.gov/nuccore/AB010739
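For anyone who would rather not use the shell, the extraction step of the pipeline above (pull out every nuccore URL, de-duplicate, sort) can be sketched in Python. This is only an illustration under assumptions: the function name and the sample rows are made up, and the rows merely stand in for the tab-separated output that `elton interactions` is assumed to produce.

```python
import re

# Same idea as the grep pattern above: a URL that contains "nuccore",
# bounded by tab characters in a tab-separated line.
NUCCORE_URL = re.compile(r"https://[^\t\n]+nuccore[^\t\n]+")


def unique_nuccore_urls(lines):
    """Return the sorted, de-duplicated GenBank nuccore URLs found in TSV lines."""
    found = set()
    for line in lines:
        found.update(NUCCORE_URL.findall(line))
    return sorted(found)


# Made-up sample rows (hypothetical taxa and columns, for illustration only).
sample = [
    "Virus A\thttps://www.ncbi.nlm.nih.gov/nuccore/AB010731\thasHost\tHost X",
    "Virus A\thttps://www.ncbi.nlm.nih.gov/nuccore/AB010730\thasHost\tHost X",
    "Virus A\thttps://www.ncbi.nlm.nih.gov/nuccore/AB010731\thasHost\tHost Y",
    "Virus B\t\thasHost\tHost Z",
]
urls = unique_nuccore_urls(sample)
print("\n".join(urls))
```

Python's `sorted` orders by code point, which may differ from a locale-aware `sort` for mixed-case input; for accession URLs like these the result should be the same.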
@jhpoelen this would be most helpful: "an exhaustive list of genbank numbers associated with viruses"
@campmlc I shared a partial list, other GloBI indexed datasets can be used to complement this list if needed.
perhaps something to do when the time is right
Potentially a fun project for an intern/CS student/etc.
I am doing my best to communicate the good work on associations. . . I guess it'll take time for it to take hold.
It's appreciated! We obviously aren't great at communicating what we do. We've been talking to and working with GenBank since ~2000; I'm (obviously!) not sure how to do better, but I think it'll involve more than just time.
@jldunnum your attachment didn't come through.
Related:
https://github.com/ArctosDB/arctos/issues/2151 https://github.com/ArctosDB/arctos/issues/1257
Dunnum, Jonathan L., Richard Yanagihara, Karl M. Johnson, Blas Armien, Nyamsuren Batsaikhan, Laura Morgan, and Joseph A. Cook. "Biospecimen repositories and integrated databases as critical infrastructure for pathogen discovery and pathobiology research." PLoS Neglected Tropical Diseases 11, no. 1 (2017): e0005133.
Great! I don't suppose it would be possible to identify GenBank accessions that have a non-mammalian organism or taxon name but an MSB:Mamm specimen voucher or LinkOut?
Another issue is that many pathogen/parasite papers that actually did cite a host used our field/tissue number "NK" and not our MSB catalog number.
Great! I don't suppose it would be possible to identify GenBank accessions that have a non-mammalian organism or taxon name but an MSB:Mamm specimen voucher or LinkOut?
Given a little time, this can surely be done, especially because of the excellent informatics resources that GenBank and Arctos provide. Also, datasets already indexed by GloBI provide a starting point. https://github.com/globalbioticinteractions/virus-host-db comes to mind.
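The check asked about above (GenBank accessions whose organism is not a mammal but whose voucher points at an MSB:Mamm specimen) could look roughly like the filter below. It is a sketch in Python that assumes the GenBank records have already been fetched and reduced to a few fields. The `SequenceRecord` class, its field names, and the sample accessions are hypothetical and invented for illustration; they are not real GenBank or Arctos data, and no real API is being called.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SequenceRecord:
    """Hypothetical, pre-parsed summary of one GenBank record."""
    accession: str
    organism: str
    lineage: Tuple[str, ...]   # taxonomic lineage of the sequenced organism
    specimen_voucher: str      # voucher text as given by the submitter


def nonmammal_with_mammal_voucher(
    records: List[SequenceRecord], voucher_prefix: str = "MSB:Mamm:"
) -> List[SequenceRecord]:
    """Keep records vouchered to the mammal collection whose organism is not a mammal."""
    return [
        r for r in records
        if r.specimen_voucher.startswith(voucher_prefix)
        and "Mammalia" not in r.lineage
    ]


# Invented sample records (accessions and vouchers are placeholders).
records = [
    SequenceRecord("XX000001", "Example hantavirus", ("Viruses", "Riboviria"), "MSB:Mamm:000001"),
    SequenceRecord("XX000002", "Sorex sp.", ("Eukaryota", "Mammalia"), "MSB:Mamm:000001"),
    SequenceRecord("XX000003", "Example hantavirus", ("Viruses", "Riboviria"), ""),
]
candidates = nonmammal_with_mammal_voucher(records)
print([r.accession for r in candidates])
```

A prefix check like this would miss hosts cited by an "NK" field/tissue number instead of the catalog number, so it can only produce a starting list.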
MSB catalog number
https://en.wikipedia.org/wiki/Money_services_business - right?!?
We've been avoiding really embracing https://handbook.arctosdb.org/how_to/cite-specimens.html forever. "MSB 210229" (and the infinite variations thereof) could mean just about anything, and digging it out of a publication is never going to be foolproof. "https://arctos.database.museum/guid/MSB:Mamm:210229" and "http://dx.doi.org/10.7299/X7ZK5H0X" are completely unambiguous. Demanding those kinds of identifiers from users would eliminate any confusion going forward, and sort of accidentally save you a whole bunch of work (which might be redirected to dealing with the legacy stuff) in the process.
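To make the contrast concrete: a GUID URL of the form shown above can be split mechanically into its parts, while a string like "MSB 210229" gives a parser nothing reliable to hold on to. The Python sketch below assumes the three-part `institution:collection:number` layout seen in the URLs in this thread; the function name and the returned keys are illustrative choices, not an Arctos API.

```python
import re
from typing import Dict, Optional

# Assumed layout, based on the GUID URLs quoted in this thread:
# https://arctos.database.museum/guid/<institution>:<collection>:<number>
ARCTOS_GUID_URL = re.compile(
    r"^https?://arctos\.database\.museum/guid/"
    r"(?P<institution>[^:/?#]+):(?P<collection>[^:/?#]+):(?P<number>[^:/?#]+)"
)


def parse_arctos_guid(text: str) -> Optional[Dict[str, str]]:
    """Split an Arctos GUID URL into its parts, or return None if it is not one."""
    match = ARCTOS_GUID_URL.match(text.strip())
    return match.groupdict() if match else None


print(parse_arctos_guid("https://arctos.database.museum/guid/MSB:Mamm:210229"))
print(parse_arctos_guid("MSB 210229"))
```

The second call returns `None`: without the full identifier there is no way to tell which institution or collection is meant.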
Given a little time
Yep! Arctos has an API, GenBank has an API, doing more in that intersection is just a matter of time. (I'm not so sure about "little" though...)
We have attempted "Demanding those kinds of identifiers" from GenBank as a required field/controlled vocab, most recently at the ASM meeting the summer before covid, but there still seems to be some reluctance or lack of awareness of the problem, at least from representatives designated to attend that meeting. There is also extreme reluctance to allow the collections that actually hold the specimens to make edits to fields that were incorrectly filled out by researchers submitting sequences.
from GenBank
I can understand their reluctance to that; it's not really their job, and most of the data they see won't ever have that level of information. I obviously don't KNOW anything, but I think this would be between loan-ers and loan-ees (and/or perhaps part of your internal licensing).
Arctos contains a genbank publisher tool (IDK if it's functional, it doesn't get any use so it doesn't get any attention) which completely eliminates any ambiguity there. It even deals with barcodes, so if you have those you can tie sequences to specific parts and not just catalog records.
GenBank is special in regard to identifiers; they are one of two systems (Arctos is the other) in which "MSB:Mamm:210229" is NOT ambiguous, because we worked out the specimen_voucher field and registry with them. 65 of the current 215 collections in Arctos claim to have registered with GenBank - we as a community could certainly do better.
I believe that everything we can currently do with GenBank was worked out with Scott Federhen, and not much has changed since he died. He at least was willing to allow edits by "owning institutions" if the submitter could not be convinced to make updates; I don't know if anyone else might be inclined to allow that or even who you'd ask. (I wonder if an agreement regarding future edits to GenBank might also be part of loan agreements?) Might be worth knocking on the door if you're ever in DC - we could certainly use another interested insider.
Some GenBank fields are moving from optional to highly recommended, and some new fields are coming too, for specifying the connections between a host SEQ and the vouchered specimen and a related viral SEQ and the sample the viral SEQ came from. Stay tuned. Paper in progress. This work was made possible by the #metadataregisteringpractices subgroup of the CETAF-DiSSCO Covid 19 Task Force. Pam Soltis at UF/iDigBio and Jerry Lanfear of ELIXIR can answer questions. See https://twitter.com/mcourtot/status/1376902192410603525 on Twitter and https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification for hints.
Thanks Deb
That's good news!
hey @mkoo - can you share why you are closing this issue?
We're doing a cleanup of old stale issues (more than 90-100 days old). If there are still pending problems, we need a new issue. If there are new export lists to share, a new issue would make sure people see them too. But let me know if I missed a specific to-do item for the Arctos dev list, as I can reopen.
As far as I can tell, these links from Arctos specimens to their sequenced viruses are not yet indexed by GloBI. And GenBank only seems to record the host name, not the voucher. I've created a separate issue at https://github.com/globalbioticinteractions/globalbioticinteractions/issues/755 .
@mkoo Thanks for sharing the reason for closing this issue. I understand it is nice to clean up issues as spring is just around the corner.
I do have to say that some of these issues may be old, but that doesn't mean they are stale in my mind, just something that hasn't been addressed yet. I think it'd be a bummer to have these valuable ideas and observations disappear in a long list of closed issues.
Perhaps you have some ideas on how to keep track of these non-trivial but potentially innovative ideas to better link our data records and infrastructures.
Yes I agree it's a danger to close -- but it's not deleted! I am trying to clean up open issues by transferring issues to either an internal repo for more discussion and action, or reassigning issues to a different milestone, or adding labels to help us harvest open and closed issues for past discussions and ideas when searching on topics. Some get closed and I transfer their resources to working group teams. So a bunch of different tactics. Not sure if there's a universal solution but I am open to ideas!
Some actions are simply nagging a third party-- maybe I need a new label and project for that.....
@mkoo I just opened a new issue in the GloBI issue tracker to keep this thread active, and noticed that GloBI knows about the linked genbank records, but the (valuable) connection to their Arctos specimen is not yet known. https://github.com/globalbioticinteractions/globalbioticinteractions/issues/755#issuecomment-1029509362 .
I agree that the issue is not deleted, but marking it as "closed" does make it appear to have been resolved.
ok I hear you, Jorrit! I'll reopen and change the issue title so it's clearer what's going on.
Reopening issue originally entitled: "[CONTACT] association type of the associated sequences related to host vouchers (e.g., https://arctos.database.museum/guid/MSB:Mamm:210229 https://www.ncbi.nlm.nih.gov/nuccore/EU241637)"
As mentioned earlier, this would be an excellent task for an intern or graduate student, or possibly a fundable grant proposal?
@jhpoelen can you distinguish the direct references to specimen voucher in GenBank from the linkouts? We frequently use the latter to create relationships that the author failed to provide or provided incorrectly.
well, it's not just viruses... there are broken or absent links everywhere in GenBank.. more genbank input is needed (they could fund the intern!)
@campmlc I keep track of the source of the references, so I can distinguish them accordingly.
Question - is there any association rule I can apply to the Arctos -> GenBank relations?
E.g., all MSB specimens with genbank ids are host -> virus relations.
Or, all arctos specimens with genbank ids are host -> virus relations.
Or, all arctos specimens with genbank ids are host-virus relations only if the related genbank record notes the same host name as the arctos specimen classification.
Alternatively, I can mark relations as "ecologically related to".
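The third rule with the fallback could be sketched as follows in Python. This is only one possible reading of the options listed above: the function name is hypothetical, the comparison is a plain case-insensitive string match (real data would likely need taxonomic name resolution), and the two labels are taken from the wording in this thread rather than from any fixed vocabulary.

```python
from typing import Optional


def interaction_type(specimen_taxon: str, genbank_host_name: Optional[str]) -> str:
    """Pick an interaction label for one Arctos specimen -> GenBank record link.

    Returns "hostOf" only when the GenBank record names a host that matches the
    specimen's identification; otherwise falls back to the generic label.
    """
    if genbank_host_name is not None:
        if genbank_host_name.strip().lower() == specimen_taxon.strip().lower():
            return "hostOf"
    return "ecologically related to"


print(interaction_type("Sorex roboratus", "sorex roboratus "))
print(interaction_type("Sorex roboratus", None))
```

The conservative fallback keeps the link without asserting a host relation that the records do not support.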
ps. Money is probably better spent if they fund Arctos / GloBI ; )**
**Disclaimer, I am a contributor to GloBI...
We do have some great contacts now for Elixir (Jerry Lanfear) and GenBank (via Ruth Timme) for working with them to make changes. See Thompson CW, Phelps KL, Allard MW, Cook JA, Dunnum JL, Ferguson AW, Gelang M, Khan FAA, Paul DL, Reeder DM, Simmons NB, Vanhove MPM, Webala PW, Weksler M, Kilpatrick CW. Preserve a Voucher Specimen! The Critical Need for Integrating Natural History Collections in Infectious Disease Studies. mBio. 2021 Jan 12;12(1):e02698-20. doi: 10.1128/mBio.02698-20. PMID: 33436435; PMCID: PMC7844540.
@debpaul Great! What can they contribute to solving this issue?
I also noted your earlier comment from about a year ago
Some GenBank fields are moving from optional to highly recommended, and some new fields are coming too, for specifying the connections between a host SEQ and the vouchered specimen and a related viral SEQ and the sample the viral SEQ came from. Stay tuned. Paper in progress. This work was made possible by the #metadataregisteringpractices subgroup of the CETAF-DiSSCO Covid 19 Task Force. Pam Soltis at UF/iDigBio and Jerry Lanfear of ELIXIR can answer questions. See https://twitter.com/mcourtot/status/1376902192410603525 on Twitter and https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification for hints.
Anything change since then?
Anything change since then?
@jhpoelen two things changed (loosely speaking). One was that more fields moved from optional to recommended; the other was a change in finding the connections to people who are willing to sit at the table to discuss / work on needed changes. Of course there's much work to be done at different levels. But it's good to keep these partners in the loop to make maximum impact and work on inclusion.
@debpaul thanks for elaborating. Sounds like the stars are aligning.
@jhpoelen @jldunnum - following up on this. Regarding below,
Question - is there any association rule I can apply to the Arctos -> GenBank relations.
Possibly the following? All Arctos specimens with GenBank IDs are host-virus relations only if the related GenBank records are viral sequences and the specimen voucher or host field refers to a different taxonomic group. This is awkward - you would need to know that "MSB:Mamm" is a mammal collection, etc.
From the Arctos end, we can find a lot of these via a search on "symbiotype" - but this does not distinguish symbiotype of "what taxon". @dustymc we have previously discussed some way of allowing a taxon name to be entered into the "symbiotype of" field - right now, it just refers to the host, not the parasite or pathogen. Ideas to fix?
In the meantime, I'm going through the symbiotype records with links to GenBank and adding the "host of" references, which should give @jhpoelen something to start with. First example: https://arctos.database.museum/guid/MSB:Mamm:148558
Also @dustymc note that the reciprocal linkouts for the GenBank virus sequences are still not working in this example.
Here is another with relationships added. @jhpoelen can you use these examples to find others? https://arctos.database.museum/guid/MSB:Mamm:148794
I just created this relationship: https://arctos.database.museum/guid/MSB:Mamm:135531 with a taxon name (new species) as an OrganismID. This should actually change to a "host of" relationship to a proposed new field, "TaxonID", which would link to the taxonomy table. The publication of this n.sp. did not give the HWML catalog numbers - I will try to track them down. But using TaxonID would allow linkage between a catalog record and a taxonomic name, which could help solve the problem of the symbiotype relationship mentioned above. Possible?
Ideas to fix?
Catalog the stuff that seems to be important and make the correct assertions.
@campmlc you wrote:
Yes, it is unfortunate that the virus community is not better at providing cataloged "voucher specimens" that we can link to. It has been very difficult to get most virologists to identify, designate, or archive a host voucher, or even when these exist, to link to them on GenBank.
Are you intrigued by the idea of a panel discussion / webinar about the above topic with members of the virus community joining us? We could discuss changes (that occurred as a result of Covid) and changes in standard-of-practice still needed or needing to be adopted -- both by collections and virologists? Pam Soltis and I could possibly arrange such a thing.
@debpaul Yes, that would be fantastic! @jldunnum
Ideas to fix?
Catalog the stuff that seems to be important and make the correct assertions.
@dustymc this would require that we create new virus collections, fungal collections, bacterial collections, etc. for things we do not have vouchers for, in order to say that this "host" record is related to this "pathogen" record. And which institution will manage these? Right now we can do this for our integrated host and parasite collections at the institutional level, if we add in all the taxonomy (big can of worms, there), but what about things in external repositories? At a minimum, we need to be able to say this "host" was tested for this "pathogen" by this method/citation on this date with results positive/negative and quantitative values of results. We could do this with specimen attributes, or possibly part attributes, or maybe a separate "tested for" module, but we would still need the taxonomy linked here if possible. That was my suggestion above.
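The minimum statement described above (host, pathogen, method/citation, date, positive/negative, quantitative value) maps naturally onto a small record type. The Python sketch below shows one hypothetical shape for it; the class, its field names, the helper function, and all sample values are invented for illustration and do not describe any existing Arctos table or attribute.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PathogenTest:
    """Hypothetical "tested for" assertion attached to a host record."""
    host_guid: str                  # identifier of the host catalog record
    pathogen_taxon: str             # name that would ideally link to taxonomy
    method: str
    citation: str
    tested_on: date
    positive: bool
    value: Optional[float] = None   # quantitative result, if any
    unit: Optional[str] = None


def positive_hosts_by_pathogen(tests: Iterable[PathogenTest]) -> Dict[str, List[str]]:
    """Group the host identifiers of positive tests by pathogen name."""
    hosts = defaultdict(list)
    for t in tests:
        if t.positive:
            hosts[t.pathogen_taxon].append(t.host_guid)
    return dict(hosts)


# Placeholder values only; none of this is real specimen or test data.
tests = [
    PathogenTest("EXAMPLE:Mamm:1", "Example virus", "RT-PCR",
                 "Example et al. (placeholder)", date(2021, 4, 6), True, 3.2, "log10 copies/mL"),
    PathogenTest("EXAMPLE:Mamm:2", "Example virus", "RT-PCR",
                 "Example et al. (placeholder)", date(2021, 4, 6), False),
]
summary = positive_hosts_by_pathogen(tests)
print(summary)
```

Keeping negative results as records of the same shape means that "tested, not found" stays distinguishable from "never tested".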
If those things exist then of course they can be linked to.
If they don't but structured data are necessary, a Host collection could be used. That's of course more work for all the reasons you point out, but I don't think there's a lesser cost which leads to those kinds of data.
If structured data aren't critical (or critical enough to inspire someone to manage a Host collection, anyway!), then things like verbatim host ID provide a text-based alternative.
I don't think any amount of shoehorning will much change that, but it might break other things.
Hey y'all - coming to the conversation a bit late, but please note that GloBI is now resolving the ncbi records as reported in the arctos records. This means that GloBI also pulls in the taxonomic information (and more) from the NCBI genbank records and enables taxonomic searches for either host or hostee.
E.g.,
https://arctos.database.museum/guid/MSB:Mamm:148794
has already been indexed by GloBI (see attached screenshots).
For this specific example, you can find specimen to specimen links via "download csv sample" link or
source_taxon_name | source_taxon_path | source_taxon_path_ids | source_specimen_occurrence_id | source_specimen_institution_code | source_specimen_collection_code | source_specimen_catalog_number | source_specimen_life_stage_id | source_specimen_life_stage | source_specimen_physiological_state_id | source_specimen_physiological_state | source_specimen_body_part_id | source_specimen_body_part | source_specimen_sex_id | source_specimen_sex | source_specimen_basis_of_record | interaction_type | target_taxon_name | target_taxon_path | target_taxon_path_ids | target_specimen_occurrence_id | target_specimen_institution_code | target_specimen_collection_code | target_specimen_catalog_number | target_specimen_life_stage_id | target_specimen_life_stage | target_specimen_physiological_state_id | target_specimen_physiological_state | target_specimen_body_part_id | target_specimen_body_part | target_specimen_sex_id | target_specimen_sex | target_specimen_basis_of_record | latitude | longitude | event_date | study_citation | study_url | study_source_citation | study_source_archive_uri |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | hostOf | Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306150 | | | | | | | | | | | | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | hostOf | Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306149 | | | | | | | | | | | | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306148 | | | | | | | | | | | | PreservedSpecimen | hasHost | Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306150 | | | | | | | | | | | | PreservedSpecimen | hasHost | Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | hostOf | Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306148 | | | | | | | | | | | | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
Kenkeme virus | root \| Viruses \| Riboviria \| Orthornavirae \| Negarnaviricota \| Polyploviricotina \| Ellioviricetes \| Bunyavirales \| Hantaviridae \| Mammantavirinae \| Orthohantavirus \| Kenkeme orthohantavirus \| Kenkeme virus | NCBI:1 \| NCBI:10239 \| NCBI:2559587 \| NCBI:2732396 \| NCBI:2497569 \| NCBI:2497571 \| NCBI:2497576 \| NCBI:1980410 \| NCBI:1980413 \| NCBI:2560074 \| NCBI:1980442 \| NCBI:1980474 \| NCBI:765147 | http://www.ncbi.nlm.nih.gov/nuccore/GQ306149 | | | | | | | | | | | | PreservedSpecimen | hasHost | Sorex roboratus | Animalia \| Chordata \| Mammalia \| Soricomorpha \| Soricidae \| Sorex \| Sorex roboratus | EOL:1 \| EOL:694 \| EOL:1642 \| EOL:8711 \| EOL:8714 \| EOL:10807 \| EOL:323674 | http://arctos.database.museum/guid/MSB:Mamm:148794?seid=40658 | MSB | Mamm | MSB:Mamm:148794 | | young | | | | lung | | male | PreservedSpecimen | 62.07003 | 128.93831 | 2006-08-19T00:00:00Z | http://arctos.database.museum/guid/MSB:Mamm:148794 | http://arctos.database.museum/guid/MSB:Mamm:148794 | Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org . | https://github.com/globalbioticinteractions/vertnet/archive/411bd21192e50ddccd51381a731444f74b032ffb.zip
To find all Arctos-GenBank links known to GloBI, you could use something like:
$ curl https://depot.globalbioticinteractions.org/snapshot/target/data/tsv/interactions.tsv.gz \
    | gunzip \
    | grep 'arctos[.]database' \
    | grep nuccore \
    | tee arctos-genbank-links.tsv
which is bash/Linux speak for: get me the latest indexed interactions via GloBI's interactions.tsv, then select only the rows that contain both "arctos.database" and "nuccore", and finally put the results in the file arctos-genbank-links.tsv (while also printing them to the terminal).
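Note that the grep filter matches those terms anywhere in a row, including the citation columns. A stricter variant, sketched below, only checks the two specimen occurrence id columns. It assumes the uncompressed interactions.tsv is tab-separated and starts with a header row using the column names shown in the table above; the function name `filter_arctos_genbank` is made up for this example.

```shell
# Sketch: keep only rows where one specimen occurrence id is an Arctos GUID
# and the other is a GenBank nuccore link.
# Assumes tab-separated input on stdin whose first row is a header containing
# source_specimen_occurrence_id and target_specimen_occurrence_id.
filter_arctos_genbank() {
  awk -F '\t' '
    NR == 1 {
      # Map column names to positions so the filter does not depend on column order.
      for (i = 1; i <= NF; i++) col[$i] = i
      src = col["source_specimen_occurrence_id"]
      tgt = col["target_specimen_occurrence_id"]
      print   # keep the header row
      next
    }
    ($src ~ /arctos[.]database/ && $tgt ~ /nuccore/) ||
    ($tgt ~ /arctos[.]database/ && $src ~ /nuccore/)
  '
}

# Possible usage (needs network access, so it is left commented out):
# curl https://depot.globalbioticinteractions.org/snapshot/target/data/tsv/interactions.tsv.gz \
#   | gunzip | filter_arctos_genbank > arctos-genbank-links.tsv
```

Unlike the grep version, this keeps the header row, which makes the resulting file easier to open in a spreadsheet.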
According to a recent interactions.tsv, this yields 444 interaction claims. See attached zip for csv/tsv versions of these claims.
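In the sample rows above, each specimen-sequence link shows up twice, once as hostOf and once as hasHost, so the number of distinct links may be smaller than the number of claims. A rough sketch for listing distinct pairs of Arctos GUID and GenBank accession follows; the function name `unique_links` is made up for this example, and it assumes the ids look like the URLs in the sample rows.

```shell
# Sketch: print unique "arctos_guid<TAB>genbank_accession" pairs.
# Reads tab-separated rows on stdin (with or without a header row) and scans
# every field, so it also works on the header-less output of the grep pipeline.
unique_links() {
  awk -F '\t' '
    {
      guid = ""; acc = ""
      for (i = 1; i <= NF; i++) {
        if (guid == "" && $i ~ /arctos[.]database[.]museum\/guid\//) guid = $i
        if (acc == "" && $i ~ /\/nuccore\//) acc = $i
      }
      if (guid == "" || acc == "") next
      sub(/.*\/guid\//, "", guid)    # drop the URL prefix
      sub(/[?].*/, "", guid)         # drop a trailing ?seid=... part
      sub(/.*\/nuccore\//, "", acc)  # keep only the accession
      print guid "\t" acc
    }
  ' | sort -u
}

# Possible usage on the file produced earlier (left commented out):
# unique_links < arctos-genbank-links.tsv | wc -l
```

Applied to the six sample rows, this would reduce them to three pairs, one per GenBank accession.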
Curious to hear whether this is at all useful.
Fantastic! Thanks @jhpoelen ! I'll look over this list and see what else we can add.
@jhpoelen may I say how much I love the above "translation"! Thank you!
@debpaul you are welcome! Please do let me know if other things need translating.
Hi!
As I was looking into indexing a recently published host-virus dataset via https://www.pnas.org/content/118/15/e2002324118 and https://github.com/globalbioticinteractions/globalbioticinteractions/issues/644 , I stumbled across https://www.ncbi.nlm.nih.gov/nuccore/EU241637 and its link to https://arctos.database.museum/guid/MSB:Mamm:210229 (see attached screenshot).
Very neat to see how all the links are pointing back and forth across the various systems (e.g., genbank <-> Arctos).
Currently, Arctos captures the links to genbank in associatedSequences. However, from the data provided, it is not clear what was sequenced. In this case, a virus (hantavirus) was extracted from the host specimen.
When dealing with associated sequences, do you keep track of the kind of association between the host specimen and the sequence, like you do with the host-parasite relations?
Ideally, I'd like to extract species interactions records from the associatedSequences, but only if the sequence documents anything other than the host itself.
Thanks for all your hard work in keeping Arctos going!
related to https://github.com/ArctosDB/arctos/issues/2121 .