MEladawi closed this issue 1 year ago.
Also, the 7666 genes are returned as 6100 with unique = T and keepNA = T. Why is that?
Hi, thanks for your feedback. This happens because we imported protein ID data from UniProt, where gene symbols are ambiguous: for example, the symbol BRE alone maps to two UniProt IDs, L8E9D4 and Q96P08. However, there was indeed a bug in how one-to-many mappings were handled. The previous logic was: if the same symbol appears more than once, keep the record with the same symbol.
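To make the one-to-many problem concrete, here is a minimal base R sketch of the kind of rule described above: when one input symbol matches several records, prefer a record whose symbol equals the input, and otherwise fall back to the first candidate. The function name `resolve_one_to_many`, the column names and the toy accessions are all hypothetical; this is an illustration, not the package's actual code.

```r
# Hypothetical toy mapping table: one input symbol can match several records
map <- data.frame(
  input_id = c("GENE1", "GENE1", "GENE2", "GENE2", "GENE3"),
  symbol   = c("OTHER", "GENE1", "GENE2X", "GENE2Y", "GENE3"),
  uniprot  = c("acc_1", "acc_2", "acc_3", "acc_4", "acc_5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per input_id, preferring a record whose
# symbol equals the input; if none matches exactly, keep the first candidate
resolve_one_to_many <- function(map) {
  pieces <- lapply(split(map, map$input_id), function(rows) {
    exact <- rows[rows$symbol == rows$input_id, , drop = FALSE]
    if (nrow(exact) > 0) exact[1, , drop = FALSE] else rows[1, , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- resolve_one_to_many(map)
print(res)
```

With this toy table, `res$uniprot` is `c("acc_2", "acc_3", "acc_5")`: GENE1 keeps its exact-symbol record even though it is listed second, while GENE2 has no exact match and falls back to its first candidate.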
The bug is fixed in version 1.2.4; please try again.
```r
transId(c('BEX1', 'BRE', 'BTG2', 'C14orf169'), 'sym', unique = TRUE, keepNA = TRUE)
```
For your second question, could you please provide the example data (the 7666 genes you mentioned) for testing? If we use all human symbols from the HGNC website, the results are the same:
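```r
# Download the complete HGNC gene set and collect the unique approved symbols
hgnc <- vroom::vroom('https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt')
all_sym <- unique(hgnc$symbol)
length(all_sym)
# Query gene info, one row per input, keeping inputs that have no match (NA)
x <- genInfo(all_sym, unique = TRUE, keepNA = TRUE)
# Every returned input_id should be in the query, and every queried symbol
# should appear in the output
table(x$input_id %in% all_sym)
table(all_sym %in% x$input_id)
```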
Thanks, all fixed!
For #2, that was due to duplicates in my list.
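When a query returns fewer rows than were submitted, a quick first check is whether the input list itself contains repeats. Assuming unique = TRUE collapses repeated inputs to one row each, the number of distinct entries is the figure to compare against. A minimal base R sketch with a hypothetical gene vector:

```r
# Hypothetical input list containing repeated symbols
genes <- c("TP53", "BRCA1", "TP53", "EGFR", "BRCA1")

n_input  <- length(genes)                     # total entries, including repeats
n_unique <- length(unique(genes))             # distinct symbols
dup_syms <- unique(genes[duplicated(genes)])  # symbols that occur more than once

cat(n_input, "entries,", n_unique, "unique; duplicated:", dup_syms, "\n")
```

Here 5 entries reduce to 3 distinct symbols, with TP53 and BRCA1 listed as the repeated ones.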
Hello,
some genes are not updated to their new symbols (in this case, the new symbol is BABAM2):
Also, the information for this gene is incomplete:
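For the outdated-symbol case, one common approach is to map each previous symbol to its current approved symbol using a table such as the HGNC complete set, which, as far as we know, lists previous symbols in a `prev_symbol` column separated by `|`. The sketch below uses a small hand-made stand-in for that table; `update_symbols`, GENEX, OLD1 and OLD2 are hypothetical, and this is not how the package itself is implemented.

```r
# Toy stand-in for an HGNC-style table: approved symbol plus "|"-separated
# previous symbols (GENEX, OLD1 and OLD2 are hypothetical)
hgnc <- data.frame(
  symbol      = c("BABAM2", "TP53", "GENEX"),
  prev_symbol = c("BRE", NA, "OLD1|OLD2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace previous symbols with their approved symbol,
# leaving current symbols and unknown inputs unchanged
update_symbols <- function(genes, table) {
  has_prev <- !is.na(table$prev_symbol)
  prev <- strsplit(table$prev_symbol[has_prev], "|", fixed = TRUE)
  # Named lookup vector: names are previous symbols, values are approved symbols
  lookup <- setNames(rep(table$symbol[has_prev], lengths(prev)), unlist(prev))
  out <- genes
  hit <- !(genes %in% table$symbol) & genes %in% names(lookup)
  out[hit] <- lookup[genes[hit]]
  unname(out)
}

updated <- update_symbols(c("BRE", "TP53", "OLD2", "UNKNOWN"), hgnc)
print(updated)
```

This returns `c("BABAM2", "TP53", "GENEX", "UNKNOWN")`. Inputs that are already approved symbols are never remapped. If one previous symbol belonged to more than one approved gene, the named lookup would silently take the first match, so real data would need such cases flagged rather than resolved automatically.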