getzlab / rnaseqc

Fast, efficient RNA-Seq metrics for quality control and process optimization

RNASeqc workflows erroring out (when a sample has no reads mapped?) #83

Open matren395 opened 1 year ago


Hello,

I am running the RNASeqc calling pipeline (workflow configuration: broadinstitute_gtex/rnaseqc2_v1-0_BETA_cfg_XRDtyM5Po5Y) on GTEx v8 tissue samples, primarily to get gene counts for two kinds of GTF input files: one for Ensembl 89 protein-coding genes, and others for intergenic sequences (non-coding ORFs as well as non-ORFs) as controls. Running the rnaseqc2_v1-0_BETA_cfg workflow with the Ensembl 89 protein-coding GTF gives me no problems. With my intergenic GTFs, however, there is a weird failure mode: about 3% of the GTEx samples fail, apparently while computing the metrics file, with the following error:

Total runtime: 299; Total CPU Time: 269
Average Reads/Sec: 302612
Estimating library complexity...
Generating report
Invalid range
Cannot compute median of an empty list

The run then ends without generating the gene_reads.gct.gz file. I would have guessed a format issue, but this only happens for some GTEx samples and, more weirdly, only with some GTFs; other GTFs that I produced and formatted the same way do not throw this error.

My hypothesis is that it fails when a sample has no reads mapped to any feature in the GTF. I reached out to Francois Aguet, who had the same guess and suggested that I submit an issue here.
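To make the hypothesis concrete, here is a minimal Python sketch of the failure mode I have in mind. It is not RNASeqc's actual code, and the names `safe_median` and `summarize_sample` and the gene-count dictionary are hypothetical. It only shows that a median over the expressed features of a sample fails when no feature has any reads, and how a guard could avoid that.

```python
import math
import statistics


def safe_median(values):
    """Return the median of values, or NaN when there are no values."""
    values = list(values)
    if not values:
        # Guard: an empty list has no median, so report NaN instead of failing.
        return math.nan
    return statistics.median(values)


def summarize_sample(gene_counts):
    """Summarize a hypothetical mapping of feature ID -> read count."""
    expressed = [count for count in gene_counts.values() if count > 0]
    return {
        "genes_detected": len(expressed),
        "median_count": safe_median(expressed),
    }


if __name__ == "__main__":
    # Hypothetical sample with no reads on any annotated feature.
    empty_sample = {"orf_1": 0, "orf_2": 0}

    # Unguarded: statistics.median raises StatisticsError on empty input.
    try:
        statistics.median([c for c in empty_sample.values() if c > 0])
    except statistics.StatisticsError as err:
        print("unguarded median failed:", err)

    # Guarded: the summary is still produced, with NaN as the median.
    print("guarded summary:", summarize_sample(empty_sample))
```

If this is indeed the cause, a guard of this kind in the report step would presumably let the run finish and still write the count files for samples with no reads on the annotated features.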

Let me know what a fix or workaround would look like, or if you can figure out what is happening and what could be done about it. Also let me know if there is anything more I could or should provide, or any other way I can help!