giannimonaco / ABIS


Incomplete mapping of transcript IDs to the gene names within ABIS-seq matrix #23

Open dumpster-fire28 opened 2 months ago

dumpster-fire28 commented 2 months ago

Hi there, thanks so much for generating and sharing this tool! I have been using ABIS on published RNAseq datasets, and the transcript IDs in the TPM tables are the Ensembl Transcript stable ID version. I have been using BioMart to map these transcript IDs to gene names to generate the ABIS input table. With this method, ~195 genes from the ABIS-Seq matrix (based on SuppTable5 from your paper) are not represented in the input table. Some of these gene names are listed below:

AP001171.1 AP001434.2 AP003774.6 C12orf74 C9orf47 CD8BP CH17-296N19.1 CH17-373J23.1 CTA-286B10.7 CTB-113D17.1 CTB-114C7.4 CTB-50L17.14 CTB-61M7.1 CTC-205M6.1 FAM153C FAM198B FLJ27354 RP11-290F5.1 RP11-291B21.2 RP11-295P22.2 RP11-297B17.3 RP11-305L7.1

Are you also using the Ensembl Transcript stable ID version for your transcript identification? What method are you using to convert transcript IDs to gene names? If we can't represent all the genes from ABIS-Seq matrix within the gene list within the input table, will this compromise the cell deconvolution results? Thank you!

giannimonaco commented 2 months ago

Hi! Thank you for trying ABIS! For the alignment of the reads and for the conversion from transcript IDs to gene symbols, I used the files from GENCODE v26 (https://www.gencodegenes.org/human/release_26.html). You can find the gene annotations in the GTF files. If you want to make sure that all the genes are included, you should use the same annotation file. To summarize transcripts to genes from the kallisto alignment, I used the summarizeToGene function from the tximport package.
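For readers working outside R, the transcript-to-gene step described above can be sketched in Python. This is only an illustration, not tximport's implementation: it assumes a GENCODE-style GTF whose `transcript` records carry `transcript_id` and `gene_name` attributes, and it simply sums transcript TPM per gene. The function names and the toy records are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
_ATTR = re.compile(r'(\S+) "([^"]*)"')

def transcript_to_gene(gtf_lines):
    """Build a transcript_id -> gene_name map from GTF 'transcript' records."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_name" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_name"]
    return mapping

def sum_to_gene(transcript_tpm, mapping):
    """Sum transcript-level TPM per gene; also return transcripts with no mapping."""
    gene_tpm = defaultdict(float)
    unmapped = []
    for tx, tpm in transcript_tpm.items():
        gene = mapping.get(tx)
        if gene is None:
            unmapped.append(tx)
        else:
            gene_tpm[gene] += tpm
    return dict(gene_tpm), unmapped

# Toy records (made-up IDs, not real annotation entries)
toy_gtf = [
    '##description: toy example',
    'chr1\tTOY\tgene\t1\t100\t.\t+\t.\tgene_id "G1.1"; gene_name "GENE_A";',
    'chr1\tTOY\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1.1"; transcript_id "T1.1"; gene_name "GENE_A";',
    'chr1\tTOY\ttranscript\t1\t90\t.\t+\t.\tgene_id "G1.1"; transcript_id "T2.1"; gene_name "GENE_A";',
    'chr2\tTOY\ttranscript\t5\t50\t.\t-\t.\tgene_id "G2.3"; transcript_id "T3.2"; gene_name "GENE_B";',
]
tx2gene = transcript_to_gene(toy_gtf)
gene_tpm, unmapped = sum_to_gene({"T1.1": 2.0, "T2.1": 3.0, "T3.2": 1.5, "T9.1": 4.0}, tx2gene)
```

Note that the lookup is exact, so the transcript IDs in the TPM table need to use the same versioned form as the annotation file.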

ABIS contains a set of genes that were poorly annotated, including long non-coding RNAs such as CH17-373J23.1 (I think this one, for example, no longer exists in the newer annotations). I think losing some of those genes should not affect the results much, especially for major cell types. Deconvolution is an estimation anyway; it is not perfect even if you have all the genes. But if you want to double-check whether your results differ, you should try using GENCODE v26 for your data processing. At some point I might update the tool with a newer version of the annotations, but I haven't done it yet.
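To see how many signature genes are actually lost, a quick coverage check can help. The sketch below is a hypothetical helper, not part of ABIS: it assumes you have the signature gene names and the gene names of your input table as plain lists, and the example input is made up.

```python
def signature_coverage(signature_genes, input_genes):
    """Return (fraction of signature genes present in the input, sorted missing genes)."""
    signature = set(signature_genes)
    present = signature & set(input_genes)
    missing = sorted(signature - present)
    fraction = len(present) / len(signature) if signature else 0.0
    return fraction, missing

# Illustrative call: a few gene names from this thread against a made-up input table
fraction, missing = signature_coverage(
    ["CD8BP", "FAM153C", "CH17-373J23.1", "C9orf47"],
    ["CD8BP", "FAM153C", "ACTB"],
)
```

Running this before and after switching annotation files gives a direct view of which signature genes each mapping recovers.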

dumpster-fire28 commented 2 months ago

Hi! Thanks for your reply! True, cell deconvolution is just an estimate, but I wanted to make sure I have the best gene mapping possible! Following your advice, I tried the files from GENCODE, but there were still ~200 genes (different from before) that were not mapped by this annotation. However, mapping the transcript IDs via GENCODE first and then via Ensembl/BioMart reduced the number of unmappable gene names from the ABIS_seq gene list to fewer than 40 (not too bad!). Thank you very much for your help!!
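In case it is useful to others, the two-stage mapping can be sketched roughly as follows. This is a simplified illustration and not the exact procedure used: it assumes both annotation sources have already been loaded into plain dictionaries from transcript ID to gene name, and all names and IDs here are hypothetical.

```python
def map_with_fallback(transcript_ids, primary, fallback):
    """Look up each transcript in the primary map first, then in the fallback map."""
    mapped, unmapped = {}, []
    for tx in transcript_ids:
        gene = primary.get(tx) or fallback.get(tx)
        if gene:
            mapped[tx] = gene
        else:
            unmapped.append(tx)
    return mapped, unmapped

# Toy dictionaries standing in for a GENCODE-derived map and a BioMart export
gencode_map = {"T1.1": "GENE_A"}
biomart_map = {"T1.1": "GENE_A_ALT", "T2.1": "GENE_B"}
mapped, unmapped = map_with_fallback(["T1.1", "T2.1", "T3.1"], gencode_map, biomart_map)
```

The primary source wins whenever both have an entry, so the fallback only fills the gaps left by the first annotation.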

giannimonaco commented 2 months ago

That's great! Sounds good to me!
