Open only1chunts opened 2 years ago
It might be worth checking how the CURIEs idea in #424 might be implemented, as it's a synonymous system: the Names to Things application might be used here, or the Bioregistry might be used there.
@cthoyt is keen to encourage us to use the Bioregistry for this task, and they have various tools that may make it easier for us to implement, including valid regex for the various identifiers that we use. It is worth having a discussion with them before starting work on it.
Yes, I'm also happy to make any improvements to the existing software/data to support your use case. We're also thinking about reimplementations in other languages, if a combination of Python packages and web API endpoints isn't sufficient.
User story
Acceptance criteria
Additional Info
INSDC archives such as SRA, BioProject, BioSample and GenBank are mirrored in three different repositories around the world: NCBI in the USA, EBI in Europe and DDBJ in Asia. People have their own preferences about which of these repositories to use, and we currently attempt to let registered users choose which one they are sent to. This causes complications in the link-prefix table, which is why the entire method currently in use probably needs an overhaul!
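To make the mirror problem concrete, here is a minimal sketch of how a per-source lookup could work. Everything in it is an assumption for discussion: the `MIRROR_URLS` structure, the `DEFAULT_SOURCE` value and the `resolve_link` name are hypothetical, and the URL templates are illustrative and would need checking against each archive.

```python
# Illustrative URL templates only; confirm each one against the archive before use.
MIRROR_URLS = {
    "BioProject": {
        "NCBI": "https://www.ncbi.nlm.nih.gov/bioproject/{acc}",
        "EBI": "https://www.ebi.ac.uk/ena/browser/view/{acc}",
        "DDBJ": "https://ddbj.nig.ac.jp/resource/bioproject/{acc}",
    },
    # A resource that is not mirrored has one URL, stored under the key None.
    "PXD": {
        None: "https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID={acc}",
    },
}

DEFAULT_SOURCE = "EBI"  # hypothetical site-wide default


def resolve_link(prefix, accession, preferred_source=None):
    """Build the outgoing URL, honouring the user's mirror preference if one applies."""
    templates = MIRROR_URLS[prefix]
    if None in templates:
        # Not mirrored: the user's preference is irrelevant.
        template = templates[None]
    else:
        # Fall back to the default when the preference is missing or unknown.
        template = templates.get(preferred_source) or templates[DEFAULT_SOURCE]
    return template.format(acc=accession)
```

Keeping non-mirrored resources under a single key would stop EBI/NCBI/DDBJ being attached to entries that are not mirrored at those institutes.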
NB: for BioProjects there are regexes for the accessions based on their origin; see list here
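As a rough sketch of what origin-based matching could look like: this assumes the common PRJNA / PRJEB / PRJDB accession forms for NCBI, EBI and DDBJ respectively. Other forms may exist, so the list referred to above should be treated as authoritative. The `bioproject_origin` name is hypothetical.

```python
import re

# Assumption: the two letters after "PRJ" indicate where the project was registered.
BIOPROJECT_RE = re.compile(r"^PRJ(NA|EB|DB)\d+$")
ORIGIN = {"NA": "NCBI", "EB": "EBI", "DB": "DDBJ"}


def bioproject_origin(accession):
    """Return the originating archive, or None if the accession is not recognised."""
    match = BIOPROJECT_RE.match(accession)
    return ORIGIN[match.group(1)] if match else None
```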
We will need the ability to add new link prefixes quickly and easily, hence the current admin page for link prefixes.
The current list of link prefixes needs tidying up! Frankly, it's so bad I don't even know how it's still working! Things to correct:
EBI/NCBI/DDBJ has been added to various entries that are not even mirrored in those three institutes!
At least two prefixes are present for ontologies (DOID, MEDDRA); no idea why or whether they are used for anything, and I can't see any reason why they should be included here.
yahoo? They don't provide accessions!
http? Why is that there?
An old entry for EGA with an outdated URL remains, even though there is also a new one!
PXD = ProteomeXchange
ERA has changed name to ENA
PROJECT should be BioProject
We should include RRIDs
There is also a ticket #279 suggesting we add a column for regular expression value of accessions, which is a good idea
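A possible shape for a link-prefix row with the regex column from #279, sketched in Python purely for discussion. The `LinkPrefix` class and its field names are hypothetical, and the pattern in the usage note is illustrative rather than a vetted one (the Bioregistry may already provide a suitable pattern).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPrefix:
    prefix: str
    url_template: str
    pattern: str           # proposed regex column
    description: str = ""  # short help text for the website

    def is_valid(self, accession):
        """True if the whole accession matches this row's pattern."""
        return re.fullmatch(self.pattern, accession) is not None
```

For example, a row such as `LinkPrefix("PXD", "https://example.org/{acc}", r"PXD\d{6}", "ProteomeXchange dataset")` would accept `PXD000001` and reject `SRP000001`.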
In addition, it would perhaps be useful to include a short description of each row to enable help icons on website to assist users in choosing the correct prefix (in the future).
Also we need to consider the implications of any changes made to the link prefix table on the display of datasets.
We may want to add mandatory checks in the admin interface before changes are actioned, e.g. two curators sign off on changes, or URLs are tested and confirmed, or something else?
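One way such checks could be expressed, as a sketch only: the function name, the `{acc}` placeholder convention and the two-approver rule are all assumptions here. It makes no network request; a live test of the URL could be added as a further step.

```python
from urllib.parse import urlsplit

REQUIRED_APPROVALS = 2  # hypothetical policy: two different curators


def check_prefix_change(url_template, sample_accession, approvers):
    """Return a list of problems; an empty list means the change may be actioned."""
    problems = []
    if "{acc}" not in url_template:
        problems.append("URL template has no {acc} placeholder")
    # Substitute a sample accession and check the result is an absolute http(s) URL.
    parts = urlsplit(url_template.replace("{acc}", sample_accession))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("URL is not an absolute http(s) address")
    # Count distinct curators so one person cannot sign off twice.
    if len(set(approvers)) < REQUIRED_APPROVALS:
        problems.append("needs sign-off from two different curators")
    return problems
```

A check like this would flag entries such as a bare `http` or `yahoo` before they reach the table.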
Here I list the accession number providers that we either already have or know we should be ready to accept:
More info
http://gigadb.gigasciencejournal.com:9170/adminLinkPrefix/update/id/23
no source should be legal
http://gigadb.gigasciencejournal.com:9170/adminLink/admin
At the moment, prefix and accession number are in the same column; they should be separated.
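A minimal sketch of the split, assuming the combined value is stored CURIE-style as `PREFIX:ACCESSION` with a colon separator (that storage format is an assumption and should be checked against the actual data). The `split_link` name is hypothetical.

```python
def split_link(value):
    """Split 'PREFIX:ACCESSION' into (prefix, accession) at the first colon."""
    prefix, sep, accession = value.partition(":")
    prefix, accession = prefix.strip(), accession.strip()
    if not sep or not prefix or not accession:
        raise ValueError(f"cannot split {value!r} into prefix and accession")
    return prefix, accession
```

Splitting at the first colon only means an accession that itself contains a colon is kept whole; rows that fail to split can be listed for manual curation.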
Product Backlog Item Ready Checklist
Product Backlog Item Done Checklist