lingyun1010 closed 1 month ago
As agreed on Slack, the accession-to-link resolution should be made independent of experiment type and rely only on the accession style itself. Here are the accession-to-resource mappings:

ArrayExpress accessions
E-MTAB<> -> ArrayExpress
E-ERAD<> -> ArrayExpress
E-GEUV<> -> ArrayExpress

ProteomeXchange accessions - can be viewed in PRIDE (and elsewhere)
PXD<> -> PRIDE

GEO accessions
GSE<> -> GEO
GDS<> -> GEO

INSDC consortium project accessions - can be viewed in ENA (and elsewhere)
ERP<> -> ENA
SRP<> -> ENA
DRP<> -> ENA

BioProject INSDC consortium accessions - can be viewed in ENA (and elsewhere)
PRJEB<> -> ENA
PRJNA<> -> ENA
PRJDB<> -> ENA

EGA accessions
EGAS<> -> EGA
EGAD<> -> EGA
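A minimal sketch of how the prefix-based lookup listed above could work, independent of experiment type. The names `PREFIX_TO_RESOURCE` and `resource_for` are hypothetical and not taken from the Atlas codebase, the ProteomeXchange prefix is written as `PXD` (the form ProteomeXchange uses), and the check that the rest of the accession is numeric is an assumption about what `<>` stands for:

```python
from typing import Optional

# Accession prefix -> resource, copied from the mapping list in this issue.
PREFIX_TO_RESOURCE = {
    "E-MTAB": "ArrayExpress",
    "E-ERAD": "ArrayExpress",
    "E-GEUV": "ArrayExpress",
    "PXD": "PRIDE",
    "GSE": "GEO",
    "GDS": "GEO",
    "ERP": "ENA",
    "SRP": "ENA",
    "DRP": "ENA",
    "PRJEB": "ENA",
    "PRJNA": "ENA",
    "PRJDB": "ENA",
    "EGAS": "EGA",
    "EGAD": "EGA",
}


def resource_for(accession: str) -> Optional[str]:
    """Return the resource name for an accession, or None if the style is unknown."""
    acc = accession.strip().upper()
    for prefix, resource in PREFIX_TO_RESOURCE.items():
        if acc.startswith(prefix):
            # Assumption: what follows the prefix is numeric, optionally
            # separated by a hyphen (as in E-MTAB-1913).
            rest = acc[len(prefix):].lstrip("-")
            if rest.isdigit():
                return resource
    return None


print(resource_for("ERP003983"))    # an INSDC project accession
print(resource_for("E-MTAB-1913"))  # an ArrayExpress accession
```

Accessions with an unlisted style (for example E-PROT-39) return `None` here, so the caller decides whether to show nothing or fall back to something else.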
Some E-HCAD experiments (so these would be in SCEA only, not bulk) may have a 'bundle ID' in the secondary accession field in idf but I am not sure if that could be used to search and point to a project in the HCA Data portal
I've added EGA accession mapping to the list above. Following discussions on Slack and during sprint mtg I suggest to dump the existing display hierarchy as it could accidentally remove valid multiple entries (e.g. for some CURD datasets where more than 1 experiment has been combined into one) and instead display all sources by default. The logic to check for truly synonymous entries may be quite complicated and not worth the effort right now I believe. If we discover cases where displaying all creates problems for users we can reevaluate.
Hi @sfexova, I have implemented the EGA, ENA and GEO resource links, but ArrayExpress is a bit different. For example, for experiment E-MTAB-1913, the idf file contains only one secondaryAccession, ERP003983, which points to ENA; there are no secondary accessions pointing to ArrayExpress apart from the experiment accession itself.
So does that mean that for ArrayExpress we should look at the experiment accession, the secondary accession, or both?
ah, good point!! yes, for experiments from ArrayExpress it needs to be a bit different - for experiments with the ArrayExpress accession E-MTAB-XX we should look at the experiment accession only and ignore the [secondary accession] pointing to ENA because there we know they are synonymous
Okay, thanks for the clarification, and how about the others?
E-ERAD<> -> ArrayExpress
E-GEUV<> -> ArrayExpress
Are these the experiment accession or the [secondary accession]? Thanks.
yes, same rules for E-ERAD and E-GEUV as for the E-MTAB AE accessions: for these, ignore the [secondary accession] and use the experiment accession to link to ArrayExpress. The mapping rules above were all meant for the [secondary accession], i.e. for cases when these different accession codes appear in the [secondary accession] field in the idf.
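The two rules discussed in this thread could be combined roughly as follows. This is only a sketch under assumptions: `external_resources` and both constants are hypothetical names, secondary accessions are matched by prefix only, and for ArrayExpress experiments every secondary accession is skipped (the thread only states this explicitly for the synonymous ENA one). For all other experiments, every recognised secondary accession is shown, with no display hierarchy.

```python
from typing import List, Tuple

# Experiment accession styles that link to ArrayExpress directly.
AE_EXPERIMENT_PREFIXES = ("E-MTAB-", "E-ERAD-", "E-GEUV-")

# Prefix -> resource for values found in the idf secondary accession field.
SECONDARY_PREFIX_TO_RESOURCE = {
    "E-MTAB": "ArrayExpress", "E-ERAD": "ArrayExpress", "E-GEUV": "ArrayExpress",
    "PXD": "PRIDE",
    "GSE": "GEO", "GDS": "GEO",
    "ERP": "ENA", "SRP": "ENA", "DRP": "ENA",
    "PRJEB": "ENA", "PRJNA": "ENA", "PRJDB": "ENA",
    "EGAS": "EGA", "EGAD": "EGA",
}


def external_resources(
    experiment_accession: str, secondary_accessions: List[str]
) -> List[Tuple[str, str]]:
    """Return (resource, accession) pairs to display for one experiment."""
    # ArrayExpress experiments: link via the experiment accession only and
    # ignore the secondary accessions, which are known to be synonymous.
    if experiment_accession.startswith(AE_EXPERIMENT_PREFIXES):
        return [("ArrayExpress", experiment_accession)]

    # Everything else: display all recognised secondary accessions.
    links = []
    for acc in secondary_accessions:
        for prefix, resource in SECONDARY_PREFIX_TO_RESOURCE.items():
            if acc.startswith(prefix):
                links.append((resource, acc))
                break
    return links


print(external_resources("E-MTAB-1913", ["ERP003983"]))
```

For E-MTAB-1913 this yields a single ArrayExpress entry keyed on the experiment accession, and the ERP003983 secondary accession is not shown.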
We have some conflicting mappings on the bulk Supplementary Information page, involving experiment type and ArrayExpress.
In the case of E-PROT-39, its experiment type is RNASEQ_MRNA_DIFFERENTIAL, so the external resources are grouped under ENA, and it also contains an ArrayExpress link, which is also invalid. https://www.ebi.ac.uk/gxa/experiments/E-PROT-39/Supplementary%20Information