gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

examine and refine how we handle external link-prefixes in GigaDB #824

Open only1chunts opened 2 years ago

only1chunts commented 2 years ago

User story

As a curator
I want to be able to add accession numbers of externally hosted linked data
So that we can link directly to relevant data hosted in external repositories such as INSDC

As a website user
I want accession numbers on a dataset page to link to the appropriate source
So that I can navigate to the link destination based on my preferences

Acceptance criteria

Given I have a BioProject accession number (e.g. PRJNA144099) that I wish to add to a GigaDB dataset
When I add the accession number with the prefix "BioProject:" to the dataset_link table, e.g. "BioProject:PRJNA144099"
Then the link to the relevant URL is included in the GigaDB dataset page, depending on the logged-in user's preference, or the default for users who are not logged in:
- default = https://www.ebi.ac.uk/ena/browser/view/PRJNA144099
- NCBI = https://www.ncbi.nlm.nih.gov/bioproject/PRJNA144099
- EBI = https://www.ebi.ac.uk/ena/browser/view/PRJNA144099
- DDBJ = DDBJ does not display BioProjects submitted to NCBI or EBI, so a DDBJ link is not valid for all BioProject accessions
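A minimal sketch of the acceptance criterion above, in Python purely for illustration. The function name `bioproject_url`, the `{acc}` placeholder and the dictionary layout are all hypothetical; only the NCBI and EBI URL patterns come from this ticket. Because no DDBJ URL is given here, the sketch assumes a DDBJ preference falls back to the default, which may not be the behaviour we finally want.

```python
from typing import Optional

# URL patterns taken from the acceptance criteria; "{acc}" is a hypothetical placeholder.
BIOPROJECT_URLS = {
    "NCBI": "https://www.ncbi.nlm.nih.gov/bioproject/{acc}",
    "EBI": "https://www.ebi.ac.uk/ena/browser/view/{acc}",
}
DEFAULT_SOURCE = "EBI"  # the ticket's default URL is the ENA browser


def bioproject_url(link: str, preference: Optional[str] = None) -> Optional[str]:
    """Return the URL for a 'BioProject:<accession>' link, or None if it is not one."""
    prefix, sep, accession = link.partition(":")
    if not sep or prefix.strip().lower() != "bioproject" or not accession.strip():
        return None
    # Assumption: any preference without a known template (e.g. DDBJ) uses the default.
    template = BIOPROJECT_URLS.get(
        preference or DEFAULT_SOURCE, BIOPROJECT_URLS[DEFAULT_SOURCE]
    )
    return template.format(acc=accession.strip())


print(bioproject_url("BioProject:PRJNA144099", "NCBI"))
```

Keeping the per-mirror templates in one mapping means a new mirror or a changed URL is a data change rather than a code change, which fits the need to add prefixes quickly.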

Additional Info

INSDC archives such as SRA, BioProject, BioSample and GenBank are mirrored in 3 different repositories around the world: NCBI in the USA, EBI in Europe and DDBJ in Asia. People have their own preferences about which of these repositories they use, and we currently attempt to let registered users choose which one they are sent to. This does cause complications in the link-prefix table, and that is why the entire method currently being used probably needs an overhaul!

NB: BioProject accessions follow regex patterns that depend on their origin, see list here

We will need the ability to add new link prefixes quickly and easily, hence the current admin page for link prefixes.

The current list of link prefixes needs tidying up! Frankly, it's so bad I don't even know how it's still working! Things to correct:
- EBI/NCBI/DDBJ has been added to various entries that are not even mirrored in those 3 institutes!
- At least 2 prefixes are present for ontologies (DOID, MEDDRA); no idea why or if they are used for anything, and I can't see any reason why they should be included here.
- yahoo? They don't provide accessions?!
- http? Why is that there?
- An old entry for EGA with an outdated URL remains, even though there is a new one also!
- PXD = ProteomeXchange
- ERA has changed name to ENA
- PROJECT should be BioProject

We should include RRIDs

There is also a ticket, #279, suggesting we add a column holding a regular expression for each prefix's accessions, which is a good idea
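As a rough illustration of how such a regex column could be used, here is a small Python sketch. The mapping `LINK_PREFIX_PATTERNS` and the function `is_valid_accession` are hypothetical names, and the BioProject pattern is an assumption based on the usual shape of those accessions (e.g. PRJNA144099), not something agreed in this ticket.

```python
import re

# Hypothetical stand-in for the link-prefix table with an added regex column.
LINK_PREFIX_PATTERNS = {
    # Assumed pattern: "PRJ", an origin letter, one more letter, then digits.
    "BioProject": re.compile(r"PRJ[DEN][A-Z]\d+"),
}


def is_valid_accession(prefix: str, accession: str) -> bool:
    """Check an accession against the regex stored for its prefix."""
    pattern = LINK_PREFIX_PATTERNS.get(prefix)
    if pattern is None:
        # Unknown prefixes are rejected in this sketch.
        return False
    # fullmatch ensures the whole accession matches, not just a leading part.
    return pattern.fullmatch(accession) is not None


print(is_valid_accession("BioProject", "PRJNA144099"))
```

A check like this could run when a curator saves a link, so a mistyped accession is caught before it reaches the dataset page.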

In addition, it would perhaps be useful to include a short description in each row, so that help icons on the website can assist users in choosing the correct prefix (in the future).

Also we need to consider the implications of any changes made to the link prefix table on the display of datasets.

We may want to add mandatory checks in the admin interface before changes are actioned, e.g. two curators sign off on changes, or URLs are tested and confirmed, or something else?!
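One possible shape for the "URLs are tested" option is an offline sanity check on a URL template before it is saved. The Python sketch below is only an illustration: `template_problems` and the `{acc}` placeholder are hypothetical, and it checks that the URL is well formed, not that the remote page exists.

```python
from typing import List
from urllib.parse import urlsplit

PLACEHOLDER = "{acc}"  # hypothetical marker for where the accession goes


def template_problems(template: str, sample_accession: str) -> List[str]:
    """Return a list of problems found in a URL template; empty means it looks fine."""
    problems = []
    if PLACEHOLDER not in template:
        problems.append("template has no accession placeholder")
    # Expand the template with a sample accession and inspect the resulting URL.
    parts = urlsplit(template.replace(PLACEHOLDER, sample_accession))
    if parts.scheme not in ("http", "https"):
        problems.append("URL scheme is not http or https")
    if not parts.netloc:
        problems.append("URL has no host")
    return problems


print(template_problems("https://www.ncbi.nlm.nih.gov/bioproject/{acc}", "PRJNA144099"))
```

A live request against the expanded URL could be layered on top of this, but that depends on the external sites being reachable at the time of the edit.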

Here I list the accession number providers that we either already have or know we should be ready to accept:

More info

http://gigadb.gigasciencejournal.com:9170/adminLinkPrefix/update/id/23

An entry with no source should be valid

http://gigadb.gigasciencejournal.com:9170/adminLink/admin

At the moment, the prefix and accession number are stored in the same column; they should be separated
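Separating the two could start from a simple split of the existing combined values. This Python sketch assumes the stored form is "prefix:accession" as in the "BioProject:PRJNA144099" example; `ParsedLink` and `split_link` are hypothetical names.

```python
from typing import NamedTuple


class ParsedLink(NamedTuple):
    prefix: str
    accession: str


def split_link(link: str) -> ParsedLink:
    """Split a combined 'prefix:accession' value into its two parts."""
    # partition splits on the first colon only, so any later colons stay in the accession.
    prefix, sep, accession = link.partition(":")
    if not sep or not prefix.strip() or not accession.strip():
        raise ValueError(f"not a prefix:accession value: {link!r}")
    return ParsedLink(prefix.strip(), accession.strip())


print(split_link("BioProject:PRJNA144099"))
```

Values that fail to split would need to be reviewed by hand before any migration to two columns.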

Product Backlog Item Ready Checklist

Product Backlog Item Done Checklist

only1chunts commented 1 year ago

It might be worth checking how the CURIEs idea in #424 might be implemented, as it's a synonymous system, and the Names to Things application might be used here or the Bioregistry might be used there

only1chunts commented 1 year ago

@cthoyt is keen to encourage us to use the Bioregistry for this task. They have various tools that may make it easier for us to implement, including valid regex for various things that we use. It is worth having a discussion with them before starting work on it.

cthoyt commented 1 year ago

Yes, I'm also happy to make any improvements to the existing software/data to support your use case. We're also thinking about reimplementations in other languages, if a combination of Python packages and web API endpoints isn't sufficient.