COG-UK / dipi-group

Data integrity and pipeline integration working group

ENA has "United Kingdom:null" as Country for at least 84 seqs #208

Closed: AngieHinrichs closed this issue 2 years ago

AngieHinrichs commented 2 years ago

Some records in ENA have "United Kingdom:null" as Country, for example https://www.ebi.ac.uk/ena/browser/view/OW504710 . Many also have 2020 as the collection date even though COG-UK has much more recent (and detailed) collection dates. For example, OW504710 and its BioSample https://www.ebi.ac.uk/ena/browser/view/SAMEA13979005 have 2020 as the collection date, but the corresponding COG-UK accession is NORT-YNB9B4G, and COG metadata has 2022-02-16 as its collection date.

Here's a file mapping COG-UK accession to ENA accession for the 84 sequences that are in the current UCSC/UShER tree: nullCogToEna.txt
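For anyone wanting to screen their own records for the same two symptoms, here is a minimal sketch. It assumes the country value is a plain "Country:region" string and the collection date is a plain string; the function names `has_null_region` and `is_year_only` are hypothetical helpers and not part of any ENA or COG-UK tooling.

```python
import re


def has_null_region(country: str) -> bool:
    """True if a 'Country:region' value carries a literal 'null' region."""
    # partition() returns an empty separator when there is no ':' at all.
    _, sep, region = country.partition(":")
    return bool(sep) and region.strip().lower() == "null"


def is_year_only(collection_date: str) -> bool:
    """True if the date is just a four-digit year with no month or day."""
    return re.fullmatch(r"\d{4}", collection_date.strip()) is not None
```

With the values quoted above, `has_null_region("United Kingdom:null")` and `is_year_only("2020")` both flag the record, while the COG-UK date `2022-02-16` does not count as year-only.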

AngieHinrichs commented 2 years ago

Updated file, 21500 of them now (of which 21470 are in today's cog_metadata.csv): nullCogToEna.2022-05-10.txt.gz
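A rough sketch of how a count like "how many mapped IDs are in cog_metadata.csv" could be reproduced. Everything about the file layouts here is an assumption: that the mapping file holds two whitespace-separated columns (COG-UK ID, then ENA accession), and that the metadata CSV has a `sequence_name` column with the COG-UK ID as the middle part of a slash-separated name. `load_mapping` and `ids_in_metadata` are hypothetical helper names.

```python
import csv


def load_mapping(lines):
    """Parse 'COG-UK-ID <whitespace> ENA-accession' lines into a dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            mapping[fields[0]] = fields[1]
    return mapping


def ids_in_metadata(mapping, metadata_file, column="sequence_name"):
    """Return the set of mapped COG-UK IDs that appear in the metadata CSV."""
    present = set()
    for row in csv.DictReader(metadata_file):
        value = row.get(column) or ""
        parts = value.split("/")
        # Assumed layout: <nation>/<COG-UK ID>/<year>; otherwise use the value as-is.
        cog_id = parts[1] if len(parts) >= 3 else parts[0]
        if cog_id in mapping:
            present.add(cog_id)
    return present
```

Both functions take any iterable of lines or open text file, so a gzipped mapping can be read with `gzip.open(path, "rt")` from the standard library; `len(ids_in_metadata(...))` then gives the overlap count.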

BioWilko commented 2 years ago

I have now fixed this for data going forward and have contacted ENA about dealing with the metadata in their system.

AngieHinrichs commented 2 years ago

Great, thanks @BioWilko!

AngieHinrichs commented 2 years ago

https://www.ebi.ac.uk/ena/browser/view/OW504710 still has "United Kingdom:null" - maybe ENA need another prod?

BioWilko commented 2 years ago

Hi Angie

If you look at the actual sample metadata for that record, it has indeed been updated but, having spoken to ENA, this is a "flat file" which they need to regenerate manually. I've sent them a list of potentially affected accessions so they can do so.

Sam