pwwang closed this issue 6 years ago.
@pwwang the "genomic_pos_hg19" field is based on the last Ensembl release on GRCh37 (hg19). Most likely that release contains a pseudogene that is also mapped to Entrez Gene 7336, which is why you see two positions for gene 7336.
Since Ensembl switched to GRCh38, the "genomic_pos_hg19" values have been kept as they were, while the "genomic_pos" field (based on GRCh38) is updated with every release.
We will run a round of sanity checks on the "genomic_pos_hg19" field and remove as many incorrect positions as we can. For example, if that additional Ensembl gene is no longer mapped to gene 7336 in the current Ensembl release, we will remove its position from "genomic_pos_hg19" as well.
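A minimal sketch of that sanity check, assuming each entry in "genomic_pos_hg19" carries an "ensemblgene" key and that the set of Ensembl gene IDs currently mapped to the Entrez gene is already known. The function name `prune_hg19_positions` is hypothetical, not part of the MyGene.info codebase.

```python
from typing import Any, Dict, Set

def prune_hg19_positions(gene_doc: Dict[str, Any], current_ensembl_ids: Set[str]) -> Dict[str, Any]:
    """Hypothetical helper: drop hg19 positions whose Ensembl gene is no longer
    mapped to this Entrez gene in the current Ensembl release."""
    pos = gene_doc.get("genomic_pos_hg19")
    if pos is None:
        return gene_doc
    # The field is a single dict when there is one position, a list otherwise.
    entries = pos if isinstance(pos, list) else [pos]
    # Assumption: entries without an "ensemblgene" key cannot be judged, so keep them.
    kept = [
        p for p in entries
        if p.get("ensemblgene") is None or p["ensemblgene"] in current_ensembl_ids
    ]
    result = dict(gene_doc)  # shallow copy; the input document is not modified
    if not kept:
        result.pop("genomic_pos_hg19")
    elif len(kept) == 1:
        result["genomic_pos_hg19"] = kept[0]  # collapse back to a single dict
    else:
        result["genomic_pos_hg19"] = kept
    return result
```

Collapsing a one-element list back to a dict keeps the same shape the field already uses for single-position genes.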
@newgene Thanks for the reply. That sounds good. Probably switching to GRCh38 would help.
Columns: 1st, gene ID; 2nd, chromosomes found in genomic_pos_hg19, separated by "|" (only when the field is a list); 3rd, chromosomes found in genomic_pos (hg38), if any.
From the list @sirloon posted above, we can filter down to 714 rows that could be "fixed" using their hg38 positions, i.e. rows where the hg38 position contains a single "chr" value and that chr appears only once in the hg19 positions:
poshg19hg38_fixable_by_hg38.txt
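A minimal sketch of that filter, assuming the three columns described above are tab-separated (the separator between columns is not stated in the thread) and that the hg38 column also uses "|" when it lists several chromosomes. All function names here are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

def parse_row(line: str) -> Tuple[str, List[str], List[str]]:
    """Split one row into (gene_id, hg19 chromosomes, hg38 chromosomes)."""
    cols = line.rstrip("\n").split("\t")  # assumption: tab between columns
    gene_id = cols[0]
    hg19 = cols[1].split("|") if len(cols) > 1 and cols[1] else []
    hg38 = cols[2].split("|") if len(cols) > 2 and cols[2] else []
    return gene_id, hg19, hg38

def fixable_chr(hg19: List[str], hg38: List[str]) -> Optional[str]:
    """Return the chr to keep, or None if the row is not fixable by hg38."""
    distinct = set(hg38)
    if len(distinct) != 1:  # hg38 must point to exactly one chr
        return None
    chrom = next(iter(distinct))
    # That chr must appear exactly once among the hg19 positions.
    return chrom if hg19.count(chrom) == 1 else None

def filter_fixable(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (gene_id, chr) pairs for rows that can be fixed from hg38."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        gene_id, hg19, hg38 = parse_row(line)
        chrom = fixable_chr(hg19, hg38)
        if chrom is not None:
            result.append((gene_id, chrom))
    return result
```

For a fixable row, the hg19 entry on the returned chr would be the one to keep; rows with an empty hg38 column or several hg38 chromosomes are left out.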
However, I would like to hold off on this fix for now. Ideally, we should get the hg19 genomic position data from NCBI, which would fix this issue at the data source.
I'm closing this issue for now, with a reference to the new issue I just created, #50.
http://mygene.info/v3/gene/7336
The correct coordinate should be the second one. Not sure where the first one comes from.