gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

DESeq dataset error thrown with aberrantExpression module #468

Closed by gaynora7 11 months ago

gaynora7 commented 1 year ago

Hello,

I was attempting to run snakemake aberrantSplicing --cores 7 and I received the error (full error log attached):

Warning messages:
1: In DESeqDataSet(se, design = ~1, ...) :
  all genes have equal values for all samples. will not be able to perform differential analysis
2: In OutriderDataSet(counts) :
  No sampleID was specified. We will generate a generic one.
Error in colSums(cutoffPassedMatrix) :
  'x' must be an array of at least two dimensions
Calls: filterExpression ... filterExp -> computeExpressedGenes -> data.table -> colSums
Execution halted
[Mon May 15 17:30:52 2023]
Error in rule AberrantExpression_pipeline_Counting_filterCounts_R:
  jobid: 3
  input: /home/DROP/udx_1015/Output/processed_data/aberrant_expression/v104/outrider/outrider/total_counts.Rds, /home/DROP/udx_1015/Output/processed_data/preprocess/v104/txdb.db, Scripts/AberrantExpression/pipeline/Counting/filterCounts.R
  output: /home/DROP/udx_1015/Output/processed_results/aberrant_expression/v104/outrider/outrider/ods_unfitted.Rds
  log: /home/DROP/udx_1015/.drop/tmp/AE/v104/outrider/filter.Rds (check log file(s) for error details)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

I am not quite sure what to do-- any suggestions? Also just a note- I am using drop 1.3.3 so I am unsure why it says "Update drop version for /home/DROP/udx_1015 to version 1.3.3" at the beginning of the error!

My sample annotation file and config.yaml are attached.

Thank you!

error.drop.5.15.23.txt test_sampleannotation.xlsx config.yaml.txt

vyepez88 commented 1 year ago

Hi, it could be that the count matrix is empty. Can you open this file: /vcu_gpfs2/home/gaynora/DROP/udx_1015/Output/processed_data/aberrant_expression/v104/outrider/outrider/total_counts.Rds What does it look like? Does it have full rows or columns with all 0s?
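Editor's sketch: the all-zero check asked for above can also be done outside R once the count matrix has been exported to plain text. The tab-separated layout (a header row, gene IDs in the first column, one count column per sample) and the name `all_zero_samples` are assumptions for illustration; none of this is part of DROP.

```python
import csv
import io


def all_zero_samples(tsv_text):
    """Return the sample names whose counts are 0 for every gene.

    Assumes a header row, gene IDs in the first column and one
    non-negative integer count column per sample (hypothetical export).
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = [0] * len(samples)
    for row in reader:
        if not row:
            continue  # skip blank lines
        for i, value in enumerate(row[1:]):
            totals[i] += int(value)
    # Counts are non-negative, so a zero total means every entry was zero.
    return [name for name, total in zip(samples, totals) if total == 0]


# Made-up data: one lab sample with reads, one sample with none.
example = "gene\tlab_01\tSRR_A\nENSG1\t12\t0\nENSG2\t7\t0\n"
print(all_zero_samples(example))
```

Samples reported here were counted but received no reads, which points to a counting parameter (strand, paired-end, genome build) rather than a missing file.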

gaynora7 commented 1 year ago

Hi Vicente,

Thanks for your quick response. When I opened "total_counts.Rds", it appeared to contain columns with all 0s.

Also I think for some reason this issue has to do with the SRA BAM files ("SRR" files in sample annotation) I included in this run. When I ran snakemake aberrantExpression only with BAM files sequenced through our lab (so no SRA BAMs)-- the pipeline ran successfully.

Unsure why the SRA BAM files seem to be throwing an error :( Maybe some kind of incompatibility from SRA Run Selector?

The SRA BAM files seem to be fine to me-- they are sorted, indexed, and I ran them through samtools quickcheck - it all passed.

Thank you so much for your help and consideration!

vyepez88 commented 1 year ago

can you verify that the genome build, paired-end and strand configurations of the SRA samples are indeed the ones you indicated in the config and sample annotation?

gaynora7 commented 1 year ago

Yes! I just checked with salmon quant to verify everything was correct-- and paired-end and strandedness were entered accurately in the config and sample annotation files.

vyepez88 commented 1 year ago

Weird. Can you try counting one of your SRR samples following the steps of this script? https://github.com/gagneurlab/drop/blob/master/drop/modules/aberrant-expression-pipeline/Counting/countReads.R

The count_ranges file is located under root/processed_data/aberrant_expression/{annotation}/count_ranges.Rds. The different parameters come from either the sample annotation or the config file.

gaynora7 commented 1 year ago

Hi Vicente,

Thanks again for getting back to me. Just wanted to update-- I am getting this error when I try to run ONLY the SRA files with snakemake aberrantExpression:

FileNotFoundError in file /home/DROP/test/Snakefile, line 12:
File mapping is empty. Please check that all files in your sample annotation exist.
  File "/home/DROP/test/Snakefile", line 12, in <module>
  File "/home/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/DropConfig.py", line 50, in __init__
  File "/home/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 29, in __init__
  File "/home/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 108, in createSampleFileMapping

I have checked my sample annotation file-- all paths to the SRA BAM files are correct. They are also intact-- they all passed samtools quickcheck.
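Editor's sketch: since the error asks to check that all files in the sample annotation exist, here is one way to test the paths exactly as they are written in the file. It assumes a tab-separated annotation with an `RNA_BAM_FILE` column (named later in this thread); the `RNA_ID` column and the function name are illustrative, and this is not how DROP itself builds its file mapping.

```python
import csv
import io
import os
import tempfile


def missing_bam_paths(annotation_text, column="RNA_BAM_FILE"):
    """List (row number, path) pairs whose path is not an existing file."""
    reader = csv.DictReader(io.StringIO(annotation_text), delimiter="\t")
    missing = []
    for row_number, row in enumerate(reader, start=2):  # header is row 1
        path = row.get(column) or ""
        # The path is tested as written, so a stray space makes it "missing".
        if not os.path.isfile(path):
            missing.append((row_number, path))
    return missing


# Made-up annotation: one file that exists, one that does not.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "lab_01.bam")
    open(real, "w").close()
    absent = os.path.join(tmp, "SRR_A.bam")
    text = f"RNA_ID\tRNA_BAM_FILE\nlab_01\t{real}\nSRR_A\t{absent}\n"
    report = missing_bam_paths(text)

print(report)
```

An empty report means every path resolves from the machine and user running the check, so the cause has to be looked for elsewhere in the annotation or config.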

vyepez88 commented 1 year ago

can you execute: snakemake --cores 1 sampleAnnotation ?

so if you combine SRA with your samples, it does work?

gaynora7 commented 1 year ago

Hi Vicente,

When I ran snakemake --cores 1 sampleAnnotation I got this error:

FileNotFoundError in file /vcu_gpfs2/home/gaynora/DROP/test_NRB/Snakefile, line 12:
File mapping is empty. Please check that all files in your sample annotation exist.
  File "/vcu_gpfs2/home/gaynora/DROP/test_NRB/Snakefile", line 12, in <module>
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/DropConfig.py", line 50, in __init__
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 29, in __init__
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 108, in createSampleFileMapping

And this was when I was running ONLY the SRA samples. I've attached the updated sample annotation file with only SRA samples.

It seems like there's an issue where the SRA BAMs are not making it into file_mapping.csv. The BAMs from my lab are automatically added there when they are run with snakemake, but the SRA BAMs are not.

I tried to manually add the SRA BAM paths to file_mapping.csv, but when I reran snakemake --cores 1 sampleAnnotation I still got the same error.

Maybe it's an issue with the sample nomenclature that is messing up recognition of the file?

SRA_sampleannotation.txt

Thanks again so much!

vyepez88 commented 1 year ago

There seems to be a space between the values of RNA_BAM_FILE and DROP_GROUP in your sample annotation. Can you please remove it and try again?
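Editor's sketch: a stray space of this kind is easy to miss by eye, so a small scan can list every padded value. It assumes a tab-separated annotation; `RNA_BAM_FILE` and `DROP_GROUP` come from the comment above, while `RNA_ID` and the function name are illustrative and not part of DROP.

```python
import csv
import io


def fields_with_stray_whitespace(annotation_text):
    """Return (row number, column, raw value) for whitespace-padded values."""
    reader = csv.DictReader(io.StringIO(annotation_text), delimiter="\t")
    hits = []
    for row_number, row in enumerate(reader, start=2):  # header is row 1
        for column, value in row.items():
            # Short or overlong rows yield None or list values; skip those.
            if isinstance(value, str) and value != value.strip():
                hits.append((row_number, column, value))
    return hits


# Made-up row with a trailing space after the BAM path.
example = (
    "RNA_ID\tRNA_BAM_FILE\tDROP_GROUP\n"
    "SRR_A\t/data/SRR_A.bam \toutrider\n"
)
for hit in fields_with_stray_whitespace(example):
    print(hit)
```

A path with a trailing space names a file that does not exist, which is one way a correct-looking annotation can still fail an existence check.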

gaynora7 commented 1 year ago

Hi Vicente,

I tried both of the above things with the SRA samples (reducing space between columns in sample annotation file, and adding sex column) and both still gave me this error when I ran snakemake --cores 1 sampleAnnotation:

FileNotFoundError in file /vcu_gpfs2/home/gaynora/DROP/test_NRB/Snakefile, line 12:
File mapping is empty. Please check that all files in your sample annotation exist.
  File "/vcu_gpfs2/home/gaynora/DROP/test_NRB/Snakefile", line 12, in <module>
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/DropConfig.py", line 50, in __init__
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 29, in __init__
  File "/vcu_gpfs2/home/gaynora/mambaforge/envs/drop_env/lib/python3.11/site-packages/drop/config/SampleAnnotation.py", line 108, in createSampleFileMapping

Something about the SRA BAM files is precluding DROP's ability to recognize them. I am going to try to run OUTRIDER separately from the pipeline, hopefully it will work!

vyepez88 commented 1 year ago

that's weird. Where exactly did you download those BAM files from? I could try downloading them and testing DROP on my side.

gaynora7 commented 1 year ago

Sure! They are from here: https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=1&WebEnv=MCID_646f8aeb476027005f43ae6f&o=acc_s%3Aa

I used sra-tools prefetch followed by the fasterq-dump command to extract the fastqs. Then I aligned them to Hg38 via STAR 2.7.9, in the exact same way I aligned the samples sequenced from my lab (which worked successfully in DROP pipeline).

Thanks so much for your time and energy, much appreciated!

Also just wanted to note: I tried using DROP with other BAMs I got from different SRA accessions than the one I linked above, and they all failed. So it most likely is an incompatibility with SRA, and not this individual publication!

gaynora7 commented 1 year ago

UPDATE:

I was able to get snakemake aberrantExpression to work with my SRA samples. Unfortunately I misinterpreted salmon quant results from my BAM files-- and I entered the wrong STRAND notation in the sample annotation file. I then looked at more examples of salmon quant output and realized my mistake. Whoops- but glad it works now! I also updated DROP to the dev branch and that helped a ton too.
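Editor's sketch: because the root cause was a misread library type, a small lookup may help others translate a salmon result into a STRAND entry. The mapping is an assumption, not taken from DROP's code: it supposes that STRAND accepts HTSeq-style values (yes, no, reverse) and covers only the common salmon library-type codes. Please verify it against the DROP and salmon documentation before relying on it.

```python
# Assumed mapping from salmon library-type codes to DROP STRAND values.
# Illustrative only; confirm against both tools' documentation.
SALMON_TO_DROP_STRAND = {
    "U": "no",         # single-end, unstranded
    "IU": "no",        # paired-end, unstranded
    "SF": "yes",       # single-end, read from the forward strand
    "ISF": "yes",      # paired-end, read 1 from the forward strand
    "SR": "reverse",   # single-end, read from the reverse strand
    "ISR": "reverse",  # paired-end, read 1 from the reverse strand
}


def drop_strand_from_salmon(lib_type):
    """Translate a salmon library type into the assumed STRAND value."""
    code = lib_type.strip().upper()
    if code not in SALMON_TO_DROP_STRAND:
        # Refuse to guess for rarer orientations (e.g. outward or matching).
        raise ValueError(f"unhandled salmon library type: {lib_type!r}")
    return SALMON_TO_DROP_STRAND[code]


for code in ("IU", "ISF", "ISR"):
    print(code, "->", drop_strand_from_salmon(code))
```

A wrong STRAND value is consistent with the symptoms earlier in this thread: reads counted against the wrong strand can leave a sample with zero counts even though its BAM file is intact.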

vyepez88 commented 1 year ago

great to know! can we close the issue or is there something pending?

gaynora7 commented 1 year ago

We can close it, thank you!