beoutbreakprepared / nCoV2019

Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China
MIT License

"sequence_available" Column in Latest Data Doesn't Have Accession Numbers #65

Open tyden46 opened 4 years ago

tyden46 commented 4 years ago

The accompanying paper states that a column titled sequence_available contains the following: "If there was a genomic sequence available the accession number is inserted here."

However, in the current version of the table (https://github.com/beoutbreakprepared/nCoV2019/tree/master/latest_data), the "sequence_available" column contains links to news articles, government tracking pages, tweets, etc., and does not contain any GISAID or GenBank accession IDs.
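A quick audit can show how many sequence_available cells hold URLs versus something that looks like an accession ID. The sketch below is only an illustration: the regular expressions are rough assumptions (GISAID isolate IDs of the form "EPI_ISL_" plus digits; GenBank nucleotide accessions of one or two letters, 5 to 8 digits and an optional ".version"), not an official specification, and the function names `classify` and `audit` are hypothetical.

```python
import re

# Rough patterns (assumptions, not an official spec):
# GISAID isolate IDs such as "EPI_ISL_402124";
# GenBank nucleotide accessions such as "MN908947" or "MN908947.3".
GISAID_RE = re.compile(r"\bEPI_ISL_\d+\b")
GENBANK_RE = re.compile(r"\b[A-Z]{1,2}\d{5,8}(?:\.\d+)?\b")
URL_RE = re.compile(r"^\s*https?://", re.IGNORECASE)


def classify(value):
    """Roughly classify one sequence_available cell."""
    text = (value or "").strip()
    if not text:
        return "empty"
    if URL_RE.match(text):
        return "url"  # most likely a source that landed in the wrong column
    if GISAID_RE.search(text) or GENBANK_RE.search(text):
        return "accession"
    return "other"  # free text, e.g. "yes, BetaCoV/Mexico/CDMX/InDRE_01/2020"


def audit(values):
    """Count how many cells fall into each category."""
    counts = {"empty": 0, "url": 0, "accession": 0, "other": 0}
    for v in values:
        counts[classify(v)] += 1
    return counts
```

Running `audit` over the column (for example, values read with `csv.DictReader`) gives a rough picture of how much of it is misplaced source links.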

attwad commented 4 years ago

Indeed, in the input sheet the sequence_available column is next to the source column, and it seems like a number of contributors entered the source in the wrong column :-( I see only a few relevant sequence_available entries, for example "yes, BetaCoV/Mexico/CDMX/InDRE_01/2020".

@tbrewer-healthmap can we ask contributors to clean up the sheets? Most of it is just moving entries from sequence_available to source.
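The cleanup described above (moving entries from sequence_available to source) could be scripted roughly as follows. This is a hypothetical sketch, assuming each row is a dict with the column names "source" and "sequence_available" taken from this thread, and assuming that any cell starting with http:// or https:// is a misplaced source. It appends to an existing source instead of overwriting it, and leaves non-URL entries such as "yes, BetaCoV/..." untouched.

```python
import re

URL_RE = re.compile(r"^\s*https?://", re.IGNORECASE)


def move_misplaced_sources(rows):
    """Move URL-like sequence_available values into the source column.

    rows: list of dicts with the keys "source" and "sequence_available"
    (e.g. as produced by csv.DictReader). Rows are modified in place.
    Returns the number of rows that were changed.
    """
    moved = 0
    for row in rows:
        seq = (row.get("sequence_available") or "").strip()
        if not URL_RE.match(seq):
            continue  # keep genuine sequence information as-is
        src = (row.get("source") or "").strip()
        if not src:
            row["source"] = seq
        elif seq not in src:
            # Append rather than overwrite so an existing source is not lost.
            row["source"] = f"{src}, {seq}"
        # Otherwise the link is already recorded in source; just drop the duplicate.
        row["sequence_available"] = ""
        moved += 1
    return moved
```

Rows flagged this way would still be worth a manual check, since a URL in sequence_available could occasionally point to a sequence record rather than a news source.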

BernardoGG commented 4 years ago

It might be worth adding that, given the nature of the data collection process (i.e. data input from official and unofficial epidemiological reports), the information in the sequence_available column is limited. I wouldn't recommend using our repository as an exhaustive source of metadata on the GISAID sequences for the time being, but we should discuss this further @tbrewer-healthmap @Mougk.

Thanks for drawing our attention to this, @tyden46.

tbrewer-healthmap commented 4 years ago

I'll see what I can do on my end.

tyden46 commented 4 years ago

Thanks for your help, @attwad, @BernardoGG and @tbrewer-healthmap.