broadinstitute / gtex-pipeline

GTEx & TOPMed data production and analysis pipelines
BSD 3-Clause "New" or "Revised" License

RNA-SeQC v10 returning all zero counts #73

Closed: raplayer closed this issue 2 years ago

raplayer commented 2 years ago

I'm trying to run this pipeline (v10) on a QuantSeq library (stranded, fr), but the *.gene_reads.gct output file contains only zeros in column 3 (counts).
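
One quick way to confirm the symptom is to sum the count column directly. The following is a minimal sketch and not part of the pipeline: `gct_count_sum` is a hypothetical helper, and it assumes the GCT v1.2 layout (two metadata lines, one header row, then tab-separated Name, Description, and count columns) and an uncompressed file.

```shell
# Hypothetical helper: summarize column 3 (counts) of a GCT file.
# Assumes GCT v1.2: line 1 is the version, line 2 the dimensions,
# line 3 the column header, and data rows start on line 4.
gct_count_sum() {
    awk -F'\t' '
        NR > 3 {
            total += $3              # accumulate counts from column 3
            if ($3 > 0) nonzero++    # count rows with at least one read
        }
        END {
            printf "%d genes with nonzero counts, total %d\n", nonzero, total
        }
    ' "$1"
}
```

It could be called as `gct_count_sum "${sample_id}.gene_reads.gct"`; a result of zero genes and a total of zero would match the behavior described here.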

I'm building a STAR index using GENCODE v30 along with the indicated genome reference (GRCh38):

docker run --rm -v $path_to_references:/data -t broadinstitute/gtex_rnaseq:V10 \
    /bin/bash -c "STAR \
        --runMode genomeGenerate \
        --genomeDir /data/star_index_oh75 \
        --genomeFastaFiles /data/Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC.fasta \
        --sjdbGTFfile /data/gencode.v30.annotation.gtf \
        --sjdbOverhang 75 \
        --runThreadN 12"

I then run the STAR alignment and mark duplicates with Picard (I checked the .md.bam file, and it does contain many high-quality alignments). Finally, I run RNA-SeQC with the gencode.v30.*.collapsed_only.gtf like so:

docker run --rm -v $path_to_data:/data -t broadinstitute/gtex_rnaseq:V10 \
    /bin/bash -c "/src/run_rnaseqc.py \
    /data/gencode.v30.GRCh38.ERCC.genes.collapsed_only.gtf \
    /data/${sample_id}.Aligned.sortedByCoord.out.md.bam \
    ${sample_id} \
    --output_dir /data \
    --stranded fr"

The only error I get traces back to gzip, so I don't think that's the issue:

subprocess.CalledProcessError: Command 'gzip APL000019405_S15_L001_R1_001.exon_reads.gct APL000019405_S15_L001_R1_001.gene_tpm.gct APL000019405_S15_L001_R1_001.gene_reads.gct' returned non-zero exit status 1.

Any suggestions, or is anything obviously out of whack with the above?

Thanks!

francois-a commented 2 years ago

Please report RNA-SeQC–related issues at https://github.com/getzlab/rnaseqc. This may be related to https://github.com/getzlab/rnaseqc/issues/37.