PATRIC3 / patric3_website

Legacy PATRIC Website (JBoss Portal Version)
MIT License

Sequences tab has repeat of accession but not contig ids. #2372

Open ARWattam opened 4 years ago

ARWattam commented 4 years ago

Please go to https://www.patricbrc.org/view/Genome/587.17#view_tab=sequences and add the Sequence ID column to the table. You will then see that the Accession ID and the Sequence ID are exactly the same. One of these should be the contig ID.

[Screenshot: Screen Shot 2020-07-17 at 1 14 10 PM]

ARWattam commented 4 years ago

Moreover, when you look at the features tab, you can see the contig IDs under accession:

Screen Shot 2020-07-17 at 1 36 49 PM

But when you look at the genome browser, you only see the GenBank accession:

Screen Shot 2020-07-17 at 1 38 05 PM

I am not sure what is going on here.

mshukla1 commented 4 years ago

I think the problem is with the data and not the UI.

When we migrated to Solr in 2014-2015, we did a batch transfer of all features from the relational database to Solr.

I think there may be a bug in that process that resulted in the accession being stored as the sequence ID in the genome sequence core, and the reverse in the feature core.
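As a rough illustration of how the affected sequence records might be spotted, the sketch below flags any record whose accession and sequence ID are identical, which is the symptom reported on the Sequences tab. The field names `accession` and `sequence_id`, the helper name, and the sample values are assumptions for illustration, not the actual schema or data.

```python
from typing import Dict, List


def find_suspect_sequences(docs: List[Dict[str, str]]) -> List[str]:
    """Return the sequence_id of every record whose accession equals its sequence_id.

    Hypothetical helper: field names are assumed, not taken from the real core.
    """
    suspects = []
    for doc in docs:
        accession = doc.get("accession")
        sequence_id = doc.get("sequence_id")
        # Identical values suggest the accession was written into both fields.
        if accession is not None and accession == sequence_id:
            suspects.append(sequence_id)
    return suspects


# Placeholder records; only the genome ID 587.17 comes from the report above.
sample = [
    {"genome_id": "587.17", "accession": "ACC_1", "sequence_id": "ACC_1"},
    {"genome_id": "587.17", "accession": "ACC_2", "sequence_id": "contig_2"},
]
print(find_suspect_sequences(sample))
```

A check like this only catches the duplicated-value case in the sequence records; the swapped values in the feature core would need a comparison against the original source data.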

The problem exists in a small fraction of the genomes (<20k) present at the time.

I will run batch updates to correct the data as part of the next data release.
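One possible shape for such a batch update, sketched under assumptions: Solr supports atomic updates, where a document carrying only its unique key and a `{"set": value}` modifier overwrites a single field. The helper below builds that JSON payload from a mapping of document IDs to corrected values. The unique key name `id`, the field name `sequence_id`, and the mapping itself are hypothetical, and the real correction procedure may differ.

```python
import json
from typing import Dict, List


def build_atomic_updates(
    corrections: Dict[str, str],
    field: str = "sequence_id",  # assumed field name
    key_field: str = "id",       # assumed unique key of the core
) -> List[dict]:
    """Build Solr atomic-update documents that set `field` to the corrected value."""
    return [
        {key_field: doc_id, field: {"set": value}}
        for doc_id, value in sorted(corrections.items())
    ]


# Placeholder mapping: document ID -> corrected contig ID.
corrections = {"doc-1": "contig_1", "doc-2": "contig_2"}
payload = json.dumps(build_atomic_updates(corrections), indent=2)
print(payload)
```

The resulting JSON array would be posted to the core's update handler, followed by a commit. Atomic updates generally require the other fields of the document to be stored (or to have docValues) so Solr can rebuild the full document.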