Open tyden46 opened 4 years ago
Indeed, in the input the sequence_available column is next to the source column, and it seems like a bunch of contributors reported the source in the wrong column :-( I see just a few relevant sequence_available entries, for example "yes, BetaCoV/Mexico/CDMX/InDRE_01/2020".
@tbrewer-healthmap can we ask contributors to clean up the sheets? Most of it is just moving stuff from sequence_available to source.
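For what it's worth, the "move stuff from sequence_available to source" step could probably be scripted rather than done by hand. Below is a minimal sketch, not something that exists in the repository. It assumes a misplaced source can be recognized because it contains a URL, and that rows are plain dicts keyed by column name. The helper names (`looks_like_source`, `move_misplaced_source`) and the rule for merging with an existing source value are hypothetical.

```python
import re

# Assumption: a misplaced source is recognizable because it contains a URL.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_source(value):
    """Return True if the cell contains a URL rather than a sequence name or ID."""
    return bool(URL_PATTERN.search(value or ""))


def move_misplaced_source(row):
    """Return a copy of `row` with a URL in sequence_available moved to source."""
    cleaned = dict(row)
    seq = (cleaned.get("sequence_available") or "").strip()
    if not looks_like_source(seq):
        # Leave entries such as "yes, BetaCoV/..." untouched.
        return cleaned
    src = (cleaned.get("source") or "").strip()
    # Hypothetical merge rule: keep any existing source and append the moved value.
    if not src:
        cleaned["source"] = seq
    elif seq not in src:
        cleaned["source"] = f"{src}, {seq}"
    cleaned["sequence_available"] = ""
    return cleaned
```

Entries that mix a URL with a real sequence name would still need a manual look, since this sketch moves the whole cell.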
It might be worth adding that given the nature of the data collection process (i.e. data input from official and unofficial epidemiological reports), the information in the sequence_available column is limited. I wouldn't recommend using our repository as an exhaustive source for metadata on the GISAID sequences for the time being, but we should discuss this further @tbrewer-healthmap @Mougk.
Thanks for drawing our attention to this @tyden46.
I'll see what I can do on my end.
Thanks for your help @attwad @BernardoGG and @tbrewer-healthmap
In the accompanying paper it is indicated that a column titled sequence_available contains the following: "If there was a genomic sequence available the accession number is inserted here." However, in the current iteration of the table available here: https://github.com/beoutbreakprepared/nCoV2019/tree/master/latest_data the "sequence_available" column contains links to various news articles, government tracking pages, tweets etc. and doesn't have any GISAID or GenBank accession IDs.