ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Feature Request - #6464

Closed: jrdemboski closed this issue 1 year ago

jrdemboski commented 1 year ago

Is your feature request related to a problem? Please describe.

More NCBI identifiers needed.

More and more genomic data is being associated with Arctos specimen records. As far as I can tell, we currently have only two descriptors for NCBI data/accession numbers:

1) NCBI Biosample (SAMNXXXXXX), that links out
2) NCBI Sequence Read Archive Run ID, that links out

But we need more identifiers to cover other NCBI numbers such as:

3) NCBI BioProject (PRJNAXXXXX), that could be linked out
4) NCBI SRA, that could be linked out

I may have missed these identifiers somewhere else, but I would expect them to appear alphabetically in the drop-down alongside NCBI Biosample, etc.
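For illustration, the accession prefixes in the examples above already tell you which NCBI database an identifier belongs to. Below is a minimal Python sketch, assuming NCBI's usual prefixes (SAMN for BioSample, PRJNA for BioProject, SRR for SRA runs) and assuming link-out URLs of the form https://www.ncbi.nlm.nih.gov/<database>/<accession>. The `NCBI_PREFIXES` table and the `classify_accession` helper are hypothetical and not part of Arctos.

```python
from typing import Dict, Tuple

# Hypothetical table: NCBI accession prefix -> (label, assumed link-out URL template).
NCBI_PREFIXES: Dict[str, Tuple[str, str]] = {
    "SAMN": ("NCBI BioSample", "https://www.ncbi.nlm.nih.gov/biosample/{}"),
    "PRJNA": ("NCBI BioProject", "https://www.ncbi.nlm.nih.gov/bioproject/{}"),
    "SRR": ("NCBI Sequence Read Archive Run", "https://www.ncbi.nlm.nih.gov/sra/{}"),
}


def classify_accession(accession: str) -> Tuple[str, str]:
    """Return (database label, link-out URL) for an NCBI accession string."""
    acc = accession.strip().upper()
    for prefix, (label, template) in NCBI_PREFIXES.items():
        if acc.startswith(prefix):
            return label, template.format(acc)
    raise ValueError(f"unrecognized NCBI accession prefix: {accession!r}")


if __name__ == "__main__":
    label, url = classify_accession("PRJNA726805")
    print(label, url)
```

Only the NCBI-issued prefixes are covered here; the EBI and DDBJ equivalents (e.g. SAMEA, PRJEB) would need their own entries.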

Describe what you're trying to accomplish

This will allow us to link more NCBI identifiers to specimen records, in keeping with the whole "extended specimen" approach.

Describe the solution you'd like

Add these NCBI identifiers as options for specimen records.

Additional context

https://arctos.database.museum/guid/DMNS:Mamm:18797

This one has an NCBI Biosample number assigned to it, but no BioProject identifier (PRJNA726805): https://www.ncbi.nlm.nih.gov/bioproject/726805

Priority

Seems like an easy ask, so it would be great to have before I forget about specific examples of specimen records missing this information, and more and more of these come in weekly.

dustymc commented 1 year ago

This is easy in the "new model" - just grab the whole identifier (e.g., from your browser's address bar) and use it with an issuing agent as type 'identifier'. There's no need to worry about correctly typing or accurately dissecting identifiers.

I added your example to your record.

This could be made slightly more specific by adding an "NCBI BioProject" agent as a division of NCBI, but I don't think that's really necessary here - the "department" is evident from the identifier.

(And the display is sort of messy because we're caught between systems, it'll get neater and less redundant as soon as I can.)
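To make the point that the "department" is evident from the identifier concrete, here is a small Python sketch that reads the NCBI database name out of a full identifier URL, such as the BioProject link in the original request. It assumes URLs of the form https://www.ncbi.nlm.nih.gov/<database>/<id>; `ncbi_department` and `NCBI_HOSTS` are hypothetical names, not something Arctos provides.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts treated as NCBI for this sketch (assumption).
NCBI_HOSTS = {"www.ncbi.nlm.nih.gov", "ncbi.nlm.nih.gov"}


def ncbi_department(identifier_url: str) -> Optional[str]:
    """Return the NCBI database segment of a full identifier URL.

    For example, ".../bioproject/726805" yields "bioproject". Returns None
    for non-NCBI hosts or URLs with no path.
    """
    parsed = urlparse(identifier_url.strip())
    if parsed.hostname not in NCBI_HOSTS:
        return None
    # The first non-empty path segment names the NCBI database.
    segments = [part for part in parsed.path.split("/") if part]
    return segments[0].lower() if segments else None


if __name__ == "__main__":
    print(ncbi_department("https://www.ncbi.nlm.nih.gov/bioproject/726805"))
```

Nothing in the "new model" requires this at entry time; a helper like this would only matter if someone later wanted to group stored identifiers by database.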

jrdemboski commented 1 year ago

Hi Dusty

Ok, I will keep that in mind. I wasn’t aware that you could do that.

Thanks for adding that to the example record

John

From: dustymc
Date: Tuesday, June 27, 2023 at 11:01 AM
Subject: Re: [ArctosDB/arctos] Feature Request - (Issue #6464)


mkoo commented 1 year ago

@jrdemboski Good topic to bring up! One of the benefits of transitioning to the new model is that not only is it easier to add identifiers, but we can also make them easier to search by adding AKA values to their agent profiles or by creating a separate profile.

It might be a good time to separate out the different NCBI databases now so you can do separate searches for =SRA, etc. Does that sound about right?

jrdemboski commented 1 year ago

Yes, I guess a next step would be some way to, for example, search for just the "NCBI BioProjects" or "NCBI Biosamples" associated with a collection, or for some other NCBI database.

dustymc commented 1 year ago

search for all the "NCBI BioProjects"

There are two ways:

  1. Easy, doesn't require anything but the identifier, but still relies on substrings, so it is perhaps slightly fragile in some unforeseen way, at least over time: just search for a partial identifier, e.g. https://www.ncbi.nlm.nih.gov/bioproject, as in https://arctos.database.museum/search.cfm?oidnum=https%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fbioproject
  2. More "correct," uses data objects instead of substrings BUT also takes a bit more planning (in choosing the correct agents during entry and such): create and use the aforementioned agent. That search (here for NCBI, I did not yet create any 'divisions') would look like this: https://arctos.database.museum/search.cfm?id_issuedby=%3DNCBI
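Both search options come down to a single URL query parameter. The minimal Python sketch below rebuilds the example search URLs from this thread. The `oidnum` and `id_issuedby` parameter names are copied from those URLs; the `arctos_search_url` helper is hypothetical, and the leading `=` in the agent search is simply passed through as it appears in the example, without any claim about how Arctos interprets it.

```python
from urllib.parse import quote, urlencode

# Base search endpoint, taken from the example links in this thread.
SEARCH_BASE = "https://arctos.database.museum/search.cfm"


def arctos_search_url(**params: str) -> str:
    """Build an Arctos search URL, percent-encoding every reserved character.

    quote_via=quote (instead of urlencode's default quote_plus) encodes spaces
    as %20, and safe="" makes ':' and '/' inside values get encoded too.
    """
    return f"{SEARCH_BASE}?{urlencode(params, safe='', quote_via=quote)}"


# Option 1: substring search on the identifier itself.
option1 = arctos_search_url(oidnum="https://www.ncbi.nlm.nih.gov/bioproject")

# Option 2: search by issuing agent; the leading "=" is copied from the thread's example.
option2 = arctos_search_url(id_issuedby="=NCBI")

# Agent search for a named NCBI division, e.g. NCBI BioSample.
biosample = arctos_search_url(id_issuedby="NCBI BioSample")

if __name__ == "__main__":
    for url in (option1, option2, biosample):
        print(url)
```

The substring approach bakes the exact NCBI URL prefix into the query, while the agent approach only depends on the agent's name, which is one way to see why option 2 holds up better if identifier formats drift.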

Let me know if there's interest in (2) and I can create the agents and set the existing data to use them.

jrdemboski commented 1 year ago

Just my one opinion, but I'd go with #2.

dustymc commented 1 year ago

just my one opinion

That's plenty for me!

https://arctos.database.museum/agent/21348953

Issued 4 Identifiers

https://arctos.database.museum/search.cfm?id_issuedby=NCBI%20BioSample

kderieg322079 commented 1 year ago

Did we decide we don't need NCBI BioProject as an agent? If that is the case, should I just select NCBI as the issuing agent? Looks like that is the case on this example: https://arctos.database.museum/guid/DMNS:Mamm:11111. I think it would make sense to have NCBI BioProject as an agent and division of the NCBI agent. Maybe that's already the plan, sorry I'm late to the party...

dustymc commented 1 year ago

decide

The data can do that - if it's one entity, then the documentation needs improving; if it's two, then we need a corresponding Agent.

NCBI BioProject

Same as https://arctos.database.museum/agent/21348953? If so, it could probably use an alias (add it or let any of us know). If not, fire up another Agent (don't forget a relationship to https://arctos.database.museum/agent/21347867) and let me know the details so I can add it to the identifier helper.
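The distinction above (one entity gets an alias; a separate entity gets its own Agent plus a relationship to NCBI) can be modeled roughly as follows. This is a hypothetical, simplified Python sketch, not Arctos's actual agent schema; the `Agent` class, its fields, and the sample alias are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical, simplified stand-in for an Arctos agent record."""

    name: str
    aliases: List[str] = field(default_factory=list)
    # Parent agent for a "division of" relationship, e.g. NCBI BioProject -> NCBI.
    division_of: Optional["Agent"] = None

    def answers_to(self, term: str) -> bool:
        """One entity, several names: match the preferred name or any alias."""
        wanted = term.casefold()
        return wanted == self.name.casefold() or any(
            wanted == alias.casefold() for alias in self.aliases
        )

    def is_within(self, parent: "Agent") -> bool:
        """Two entities: follow division_of links to see if this agent rolls up to parent."""
        current: Optional[Agent] = self
        while current is not None:
            if current is parent:
                return True
            current = current.division_of
        return False


# Sample data; the alias is illustrative only.
ncbi = Agent("NCBI", aliases=["National Center for Biotechnology Information"])
biosample = Agent("NCBI BioSample", division_of=ncbi)
bioproject = Agent("NCBI BioProject", division_of=ncbi)

if __name__ == "__main__":
    print(ncbi.answers_to("national center for biotechnology information"))
    print(bioproject.is_within(ncbi), bioproject.is_within(biosample))
```

Under this model a search for NCBI could, if desired, also pick up identifiers issued by its divisions; whether Arctos's search actually behaves that way is not established in this thread.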

kderieg322079 commented 1 year ago

Nope, not the same as BioSample, because a BioSample ID corresponds to one specimen, whereas a BioProject is a collection of sequences from multiple specimens. I created an agent for BioProject: https://arctos.database.museum/agent/21349072. I think I added all the necessary info, including the definition from the NCBI website. Let me know if it needs tweaking.

dustymc commented 1 year ago

Thanks, looks good. Added to the helper; I think we're done here.