gandallab / devBrain_xQTL


gene info missing #3

Closed: XiaKwan closed this issue 3 hours ago

XiaKwan commented 3 days ago

Dear @cjops @mgandal @cyap7 @danielduyvo @leamhernandez:

Thanks for your helpful data!

It seems that the gene information for each line in the Galaxy file sQTL.mixed.40hcp.group.perm.genes.txt.gz (https://usegalaxy.org/api/datasets/f9cad7b01a472135215917481f463620/display?to_ext=tabular) is missing. How can I find it? Or could this file be updated?

Thanks a lot!

cindywen96 commented 1 day ago

The first column, gene_id, contains the gene ENSG ID and gene name for each intron. For example, 1:16310:16607:clu_7638_NA:ENSG00000227232.5_3:WASH7P.
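A minimal Python sketch of splitting such a gene_id value into its parts. It assumes, based only on the single example above, that every value has exactly six colon-separated fields: chromosome, intron start, intron end, cluster, ENSG ID, and gene name. The names PermIntron and parse_perm_gene_id are hypothetical.

```python
from typing import NamedTuple


class PermIntron(NamedTuple):
    chrom: str
    start: int
    end: int
    cluster: str
    gene_id: str
    gene_name: str


def parse_perm_gene_id(field: str) -> PermIntron:
    """Split a gene_id value from the permutation file into its parts.

    Assumption: exactly six colon-separated fields,
    chrom:start:end:cluster:ENSG_id:gene_name
    """
    parts = field.split(":")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 colon-separated fields, got {len(parts)}: {field!r}"
        )
    chrom, start, end, cluster, gene_id, gene_name = parts
    return PermIntron(chrom, int(start), int(end), cluster, gene_id, gene_name)
```

The ENSG field is kept verbatim (including any suffix such as `_3`), since the thread does not say what the suffix means.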

XiaKwan commented 16 hours ago

Sorry about that, my mistake. The file I was confused about is sQTL.mixed.nominal.40hcp.allpairs.txt.gz. Its first column looks like 10:100000316:100003848:clu_20610_NA, without the ENSG IDs and gene names you mentioned.

cindywen96 commented 3 hours ago

Intron annotation files are available in Synapse: https://doi.org/10.7303/syn50897018.5. leafcutter_clusters_to_genes.txt maps intron clusters to genes, while all.introns.tested.tsv maps introns to genes directly. The two files should be mostly consistent in their intron-gene mapping.
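A minimal Python sketch of annotating nominal-file intron IDs through the cluster-to-gene file. It assumes leafcutter_clusters_to_genes.txt is a delimited text file with a header row, one column of cluster IDs and one column of comma-separated gene names. The column names, the delimiter, and whether the file's cluster IDs carry a chromosome prefix (e.g. `10:clu_20610_NA` rather than `clu_20610_NA`) are all assumptions to check against the actual file; the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional


def cluster_key(intron_id: str, with_chrom: bool = False) -> str:
    """Derive a cluster lookup key from an intron ID such as
    10:100000316:100003848:clu_20610_NA.

    with_chrom=True returns "10:clu_20610_NA"; use it only if the mapping
    file's cluster IDs include the chromosome (an assumption to verify).
    """
    parts = intron_id.split(":")
    if len(parts) < 4:
        raise ValueError(f"unexpected intron ID: {intron_id!r}")
    chrom, cluster = parts[0], parts[3]
    return f"{chrom}:{cluster}" if with_chrom else cluster


def load_mapping(
    lines: Iterable[str],
    key_col: str,
    value_col: str,
    delimiter: str = "\t",
) -> Dict[str, List[str]]:
    """Read a delimited table with a header row into {key: [gene, ...]}.

    The value column is split on commas, assuming one key may map to
    several genes; repeated keys have their genes accumulated.
    """
    mapping: Dict[str, List[str]] = {}
    for row in csv.DictReader(lines, delimiter=delimiter):
        raw = row.get(value_col) or ""
        genes = [g.strip() for g in raw.split(",") if g.strip()]
        mapping.setdefault(row[key_col], []).extend(genes)
    return mapping


def genes_for_introns(
    intron_ids: Iterable[str],
    cluster_map: Dict[str, List[str]],
    with_chrom: bool = False,
) -> Dict[str, Optional[List[str]]]:
    """Look up genes for each nominal-file intron ID via its cluster.

    Introns whose cluster is absent from the map get None.
    """
    return {
        iid: cluster_map.get(cluster_key(iid, with_chrom))
        for iid in intron_ids
    }
```

Since `load_mapping` accepts any iterable of text lines, an open file handle (or `gzip.open(path, "rt")` for a compressed file) can be passed directly. The same loader could be pointed at all.introns.tested.tsv with its intron and gene columns, looking up the full intron ID instead of `cluster_key(...)`, provided that file uses the same chrom:start:end:cluster ID format (again an assumption).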