xiyasong closed this issue 3 years ago
Hi, this depends on whether your RNA-seq data was generated with a stranded or unstranded protocol. The "collapsed_only" version should be used for stranded data only (see https://github.com/broadinstitute/gtex-pipeline/blob/master/gene_model/collapse_annotation.py for how it is generated). In general, I recommend using the same annotation and version for all analyses. The ERCC sequences are spike-in controls; if this annotation is used with data generated without these controls, their counts should be zero and won't affect the results.
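If you want to confirm the point about ERCC on your own data, here is a minimal sketch of a check that all spike-in rows in a gene-level count table are zero. It assumes the spike-in IDs start with "ERCC-" (which also keeps them apart from human genes such as ERCC1), and the function name and the toy table are hypothetical, not part of the pipeline.

```python
def ercc_counts_all_zero(counts, prefix="ERCC-"):
    """Return True if every spike-in row contains only zero counts.

    counts: dict mapping gene ID -> list of read counts, one per sample.
    prefix: assumed ID prefix of the spike-in controls.
    """
    spike_rows = [row for gene_id, row in counts.items()
                  if gene_id.startswith(prefix)]
    return all(c == 0 for row in spike_rows for c in row)


# Hypothetical toy table: gene ID -> read counts for two samples.
toy_counts = {
    "ENSG00000000003.14": [120, 98],
    "ERCC-00002": [0, 0],
    "ERCC-00003": [0, 0],
}

print(ercc_counts_all_zero(toy_counts))
```

For a real count matrix you would first load the file into a mapping of this shape; any nonzero spike-in row would suggest the samples did contain the controls.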
Hi! Thank you for your explanation! So if I understand correctly, I should use gencode.vXX.GRCh38.genes.gtf for unstranded RNA-seq data, right?
Yes, that's correct.
Thank you very much!!
Hi! I am a little confused about which genes_gtf file should be used for gene-level quantification with RNA-SeQC and for eQTL analysis with FastQTL. Is genes.gtf different from genes.collapsed_only.gtf?
Is it correct to use gencode.v26.GRCh38.genes.collapsed_only.gtf when running RNA-SeQC and gencode.v26.GRCh38.genes.gtf when running the eQTL pipeline? (I want to use the same GENCODE v26 annotation that GTEx V8 used.) It seems I should use collapse_only mode when running RNA-SeQC, as described in TOPMed_RNAseq_pipeline.md, but at the end of that file, under "Appendix: wrapper scripts from the GTEx pipeline", it says: "genes_gtf: path to the collapsed, gene-level GTF (gencode.v30.GRCh38.ERCC.genes.gtf as described above)".
Also, does including ERCC in the annotation affect the results? Thank you for your help!!