CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

geneset.gtf location #613

Closed wwinnerhoo closed 9 months ago

wwinnerhoo commented 1 year ago

Hi,

I am stuck on the "--gene-tag=XT" parameter of "umi_tools count" for our single-cell data, which is based on CEL-seq2. I tried the test dataset "hgmm_100_fastqs.tar", but the "geneset.gtf" file used in the featureCounts command is missing. Could you please tell me where I can download this file? Alternatively, how did you obtain the "hg38_noalt_junc85_99.dir" folder? I would like to reproduce your results as closely as possible.

Thanks for your help!

Best, Hoo

IanSudbery commented 1 year ago

The geneset we used in this example was based on: https://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz

except that the chromosomes have been renamed to their UCSC/Gencode equivalents.

For all intents and purposes, this should be functionally equivalent to GENCODE version 24.
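The chromosome renaming described above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used to build geneset.gtf. The function names and the output path are hypothetical, and the mapping is an assumption that covers only the primary chromosomes (1-22, X, Y, MT); unplaced and unlocalized scaffolds are passed through unchanged, because their UCSC names cannot be derived by simply adding a prefix.

```python
import gzip

# Assumption: only primary chromosomes are renamed; scaffold names are left as-is.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name):
    """Map an Ensembl sequence name to its UCSC-style equivalent."""
    if name in PRIMARY:
        return "chr" + name
    if name == "MT":
        return "chrM"
    return name  # scaffolds and anything unrecognised pass through unchanged


def rename_gtf_line(line):
    """Rename the first (seqname) column of one GTF line; keep header lines."""
    if line.startswith("#"):
        return line
    seqname, sep, rest = line.partition("\t")
    return ensembl_to_ucsc(seqname) + sep + rest


def rename_gtf(in_path, out_path):
    """Read a gzipped Ensembl GTF and write a plain-text GTF with renamed chromosomes."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_gtf_line(line))
```

For example, `rename_gtf("Homo_sapiens.GRCh38.85.gtf.gz", "geneset.gtf")` would write the renamed annotation, assuming the Ensembl file has been downloaded to the working directory.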

wwinnerhoo commented 1 year ago

Hi IanSudbery,

Thanks for your quick reply! Could you also point me to the reference genome in FASTA format (or a download link) that I should use to build the index of the human genome?

Best, Hoo

IanSudbery commented 1 year ago

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.fa.gz with any _alt and _hap contigs removed.
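Removing the _alt and _hap contigs could look something like the sketch below. It is only an illustration under stated assumptions, not the procedure that produced the original index: the function name is hypothetical, and it assumes that alternate contigs have names ending in "_alt" and haplotype contigs have "_hap" in their names.

```python
def drop_alt_hap(lines):
    """Yield FASTA lines, skipping records whose name marks an alt or hap contig.

    Assumption: alt contigs end in "_alt" and hap contigs contain "_hap".
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = not (name.endswith("_alt") or "_hap" in name)
        if keep:
            yield line
```

The generator works on any iterable of lines, so it can be fed from `gzip.open("hg38.analysisSet.fa.gz", "rt")` and its output written to a new FASTA file with `writelines`.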

wwinnerhoo commented 1 year ago

Great! 🥳