BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA
Other
208 stars 71 forks source link

diffSplice 'returned non-zero exit status' #218

Closed MustafaElshani closed 1 year ago

MustafaElshani commented 2 years ago

This was run using the latest docker run, the error appeared when I used the --test

DRIMSeq testing for each AS event type
Traceback (most recent call last):
  File "/usr/local/bin/flair", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/flair/flair.py", line 1180, in main
    status = diffSplice()
  File "/usr/local/lib/python3.10/dist-packages/flair/flair.py", line 1086, in diffSplice
    subprocess.check_call(ds_command + ['--matrix', args.o+'.es.events.quant.tsv', '--prefix', args.o+'.es'], stderr=ds_stderr)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/usr/local/lib/python3.10/dist-packages/flair/runDS.py', '--threads', '1', '--drim1', '6', '--drim2', '3', '--drim3', '15', '--drim4', '5', '--matrix', 'analysis/flair/flair_diffSplice.es.events.quant.tsv', '--prefix', 'analysis/flair/flair_diffSplice.es']' returned non-zero exit status 1

Probably an R package issue, any help appreciated

Mustafa

Jeltje commented 2 years ago

Please have a look at the last line of test.diffsplice.stderr.txt in your output directory. It may be that there are not enough genes left after filtering, in which case DRIMSeq cannot run.

We're working on improving the error messaging.

If this answers your question, please close this ticket. If not, please paste the output of test.diffsplice.stderr.txt in a comment. Thanks!

MustafaElshani commented 2 years ago

Hi Jeltje

Indeed that seems to be the the message !No genes left after filtering!

What does this mean?

R[write to console]: Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr,  :
  !No genes left after filtering!

R[write to console]: In addition:
R[write to console]: There were 34 warnings (use warnings() to see them)
R[write to console]:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flair/runDS.py", line 216, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/flair/runDS.py", line 185, in main
    R('filtered <- dmFilter(data, min_samps_gene_expr = drim1, min_samps_feature_expr = drim2, min_gene_expr = drim3, min_feature_expr = drim4)')
  File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/__init__.py", line 459, in __call__
    res = self.eval(p)
  File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/functions.py", line 203, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/functions.py", line 126, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rpy2/rinterface.py", line 815, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr,  :
  !No genes left after filtering!
Jeltje commented 2 years ago

Flair divides your input counts file into different kinds of alternative splicing events, abbreviated alt3, alt5, es (=exon skipping) and as (=other alternative splice). These files are labeled <something>.events.quant.tsv.

DRIMSeq then filters these to make sure the genes are expressed in at least 6 samples (this is the --drim1 parameter in flair diffSplice and min_samps_gene_expr in your error message Error in dmDS_filter You can change --drim1, but it is already as low as it can reasonably be: In any differential expression analysis you should have at least two groups that have at least three samples each. So that's six samples. It is also possible that one of your quant files is empty (you can check if they have just a header line)

I am working on the error messaging for flair diffExp. If you want, you can drop in my latest changes to your install:

wget https://raw.githubusercontent.com/BrooksLabUCSC/flair/master/src/flair/runDS.py
wget https://raw.githubusercontent.com/BrooksLabUCSC/flair/master/src/flair/flair.py
mv runDS.py flair.py /usr/local/lib/python3.10/dist-packages/flair/

and run again to get more detailed info.

MustafaElshani commented 2 years ago

diffSplice has now run fully without any errors the following output generated

-rw-r--r-- 1 root root 1.9M Sep 17 04:39 .alt3.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 5.4M Sep 16 19:13 .alt3.events.quant.tsv
-rw-r--r-- 1 root root 2.1M Sep 17 04:22 .alt5.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 5.2M Sep 16 19:13 .alt5.events.quant.tsv
-rw-r--r-- 1 root root  14M Sep 17 04:02 .es.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root  68M Sep 16 19:40 .es.events.quant.tsv
-rw-r--r-- 1 root root 7.5M Sep 17 07:16 .ir.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root  28M Sep 16 19:39 .ir.events.quant.tsv
-rw-r--r-- 1 root root 4.1K Sep 17 07:16 .stderr.txt

However all files remain hidden

In addition, *drimseq_results.tsv files do not seem to contain gene_id/gene_names instead they have coordinates in column 2 where it's labelled gene_id. Are there meant to be gene_id in there?

Jeltje commented 1 year ago

Can you please give us the exact command you are running? I can't reproduce your problem.

And yes, you are testing specific splices instead of transcripts; the coordinates tell you which. The *quant.tsv files give information on wich transcripts contain the splice.

MustafaElshani commented 1 year ago

Hi Jeltje

I have run the following command docker run -vpwd:/mydata -w /mydata brookslab/flair:latest flair diffSplice --test -i analysis/flair/flair_collapse/OCIAML3_collapsed.isoforms.bed -q analysis/flair/flair_quantify/counts_matrix.tsv -o analysis/flair/flair_diffSplice/

so the file remain hidden .*

head .alt3.Control_v_Treated_drimseq_results.tsv

    feature_id  gene_id sampleR1_Control_batch1 sampleR2_Control_batch1 sampleR3_Control_batch1 sampleR4_Treated_batch1 sampleR5_Treated_batch1 sampleR6_Treated_batch1 lr  df  pvalue  adj_pvalue
1   exclusion_chr10:100261978   chr10:100261978-100261044_chr10:100261978-100261028 0.09138836636671532 0.09138836636671532 0.09138836636671532 0.13740030161765793 0.13740030161765793 0.13740030161765793 1.5286755647020982  1.0 0.21631132773601214 0.8631461249329633
2   exclusion_chr10:100989453   chr10:100989453-100989643_chr10:100989453-100989705 0.152504350974627   0.152504350974627   0.152504350974627   0.18984802049987926 0.18984802049987926 0.18984802049987926 0.8459713032202671  1.0 0.3576945405229909  0.9352895759446236
3   exclusion_chr10:101035667   chr10:101035667-101035943_chr10:101035667-101036002 0.9243412590019962  0.9243412590019962  0.9243412590019962  0.905841318091491   0.905841318091491   0.905841318091491   0.35061746540168315 1.0 0.5537638058552774  0.9761679361036135
4   exclusion_chr10:101036561   chr10:101036561-101036666_chr10:101036561-101036722 0.9391810050648027  0.9391810050648027  0.9391810050648027  0.9398552913943664  0.9398552913943664  0.9398552913943664  0.0012280749679121072   1.0 0.9720447347536467  0.9981867626785553
Jeltje commented 1 year ago

We just did a new release. Can you do a docker pull brookslab/flair:1.7.0 and try again?

MustafaElshani commented 1 year ago

Dear Jeltje

The command run successfully with all files as expected

-rw-r--r-- 1 root root 7.5M Oct 29 13:48 diffsplice.alt3.events.quant.tsv
-rw-r--r-- 1 root root 7.1M Oct 29 13:48 diffsplice.alt5.events.quant.tsv
-rw-r--r-- 1 root root  68M Oct 29 14:13 diffsplice.es.events.quant.tsv
-rw-r--r-- 1 root root  40M Oct 29 14:13 diffsplice.ir.events.quant.tsv
-rw-r--r-- 1 root root  15K Oct 29 16:54 drimseq_alt3_Control_v_Treated.tsv
-rw-r--r-- 1 root root  21K Oct 29 16:48 drimseq_alt5_Control_v_Treated.tsv
-rw-r--r-- 1 root root 118K Oct 29 16:41 drimseq_es_Control_v_Treated.tsv
-rw-r--r-- 1 root root  65K Oct 29 17:47 drimseq_ir_Control_v_Treated.tsv
drwx------ 2 root root 4.0K Oct 29 17:47 workdir

However I'm still confused on the results as the second column suggest there should be gene_idhowever I think its the coordinates head drimseq_alt3_Control_v_Treated.tsv

feature_id  gene_id sampleR1_Control_batch1 sampleR2_Control_batch1 sampleR3_Control_batch1 sampleR4_Treated_batch1 sampleR5_Treated_batch1 sampleR6_Treated_batch1 lr  adj_pvalue
exclusion_chr3:170868297    chr3:170868297-170868201_chr3:170868297-170868134   0.817   0.817   0.817   0.605   0.605   0.605   239.3   1.04e-50
exclusion_chr4:118278792    chr4:118278792-118278943_chr4:118278792-118279106   0.085   0.085   0.085   0.346   0.346   0.346   239.29  1.04e-50
inclusion_chr3:170868297    chr3:170868297-170868201_chr3:170868297-170868134   0.183   0.183   0.183   0.395   0.395   0.395   239.3   1.04e-50
inclusion_chr4:118278792    chr4:118278792-118278943_chr4:118278792-118279106   0.915   0.915   0.915   0.654   0.654   0.654   239.29  1.04e-50

Is there a way to convert the coordinates to gene or transcript Id or add the isoform_ids ?

Jeltje commented 1 year ago

The diffsplice.<alt3|alt5|es|ir>.events.quant.tsv files show which transcripts and genes contain the splice. I agree that ideally the gene should be listed in the drimseq output file but that's quite a bit of work. I will open a feature request for that.