Closed MustafaElshani closed 1 year ago
Please have a look at the last line of test.diffsplice.stderr.txt
in your output directory.
It may be that there are not enough genes left after filtering, in which case DRIMSeq cannot run.
We're working on improving the error messaging.
If this answers your question, please close this ticket. If not, please paste the output of test.diffsplice.stderr.txt
in a comment. Thanks!
Hi Jeltje
Indeed that seems to be the the message !No genes left after filtering!
What does this mean?
R[write to console]: Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr, :
!No genes left after filtering!
R[write to console]: In addition:
R[write to console]: There were 34 warnings (use warnings() to see them)
R[write to console]:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/flair/runDS.py", line 216, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/flair/runDS.py", line 185, in main
R('filtered <- dmFilter(data, min_samps_gene_expr = drim1, min_samps_feature_expr = drim2, min_gene_expr = drim3, min_feature_expr = drim4)')
File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/__init__.py", line 459, in __call__
res = self.eval(p)
File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/functions.py", line 203, in __call__
return (super(SignatureTranslatedFunction, self)
File "/usr/local/lib/python3.10/dist-packages/rpy2/robjects/functions.py", line 126, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
File "/usr/local/lib/python3.10/dist-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
cdata = function(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/rpy2/rinterface.py", line 815, in __call__
raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr, :
!No genes left after filtering!
Flair divides your input counts file into different kinds of alternative splicing events, abbreviated alt3, alt5, es (=exon skipping) and as (=other alternative splice). These files are labeled <something>.events.quant.tsv
.
DRIMSeq then filters these to make sure the genes are expressed in at least 6 samples (this is the --drim1
parameter in flair diffSplice and min_samps_gene_expr
in your error message Error in dmDS_filter
You can change --drim1
, but it is already as low as it can reasonably be: In any differential expression analysis you should have at least two groups that have at least three samples each. So that's six samples.
It is also possible that one of your quant
files is empty (you can check if they have just a header line)
I am working on the error messaging for flair diffExp
. If you want, you can drop in my latest changes to your install:
wget https://raw.githubusercontent.com/BrooksLabUCSC/flair/master/src/flair/runDS.py
wget https://raw.githubusercontent.com/BrooksLabUCSC/flair/master/src/flair/flair.py
mv runDS.py flair.py /usr/local/lib/python3.10/dist-packages/flair/
and run again to get more detailed info.
diffSplice has now run fully without any errors the following output generated
-rw-r--r-- 1 root root 1.9M Sep 17 04:39 .alt3.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 5.4M Sep 16 19:13 .alt3.events.quant.tsv
-rw-r--r-- 1 root root 2.1M Sep 17 04:22 .alt5.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 5.2M Sep 16 19:13 .alt5.events.quant.tsv
-rw-r--r-- 1 root root 14M Sep 17 04:02 .es.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 68M Sep 16 19:40 .es.events.quant.tsv
-rw-r--r-- 1 root root 7.5M Sep 17 07:16 .ir.Control_v_Treated_drimseq_results.tsv
-rw-r--r-- 1 root root 28M Sep 16 19:39 .ir.events.quant.tsv
-rw-r--r-- 1 root root 4.1K Sep 17 07:16 .stderr.txt
However all files remain hidden
In addition, *drimseq_results.tsv
files do not seem to contain gene_id/gene_names instead they have coordinates in column 2 where it's labelled gene_id
. Are there meant to be gene_id in there?
Can you please give us the exact command you are running? I can't reproduce your problem.
And yes, you are testing specific splices instead of transcripts; the coordinates tell you which. The *quant.tsv files give information on wich transcripts contain the splice.
Hi Jeltje
I have run the following command
docker run -v
pwd:/mydata -w /mydata brookslab/flair:latest flair diffSplice --test -i analysis/flair/flair_collapse/OCIAML3_collapsed.isoforms.bed -q analysis/flair/flair_quantify/counts_matrix.tsv -o analysis/flair/flair_diffSplice/
so the file remain hidden .*
head .alt3.Control_v_Treated_drimseq_results.tsv
feature_id gene_id sampleR1_Control_batch1 sampleR2_Control_batch1 sampleR3_Control_batch1 sampleR4_Treated_batch1 sampleR5_Treated_batch1 sampleR6_Treated_batch1 lr df pvalue adj_pvalue
1 exclusion_chr10:100261978 chr10:100261978-100261044_chr10:100261978-100261028 0.09138836636671532 0.09138836636671532 0.09138836636671532 0.13740030161765793 0.13740030161765793 0.13740030161765793 1.5286755647020982 1.0 0.21631132773601214 0.8631461249329633
2 exclusion_chr10:100989453 chr10:100989453-100989643_chr10:100989453-100989705 0.152504350974627 0.152504350974627 0.152504350974627 0.18984802049987926 0.18984802049987926 0.18984802049987926 0.8459713032202671 1.0 0.3576945405229909 0.9352895759446236
3 exclusion_chr10:101035667 chr10:101035667-101035943_chr10:101035667-101036002 0.9243412590019962 0.9243412590019962 0.9243412590019962 0.905841318091491 0.905841318091491 0.905841318091491 0.35061746540168315 1.0 0.5537638058552774 0.9761679361036135
4 exclusion_chr10:101036561 chr10:101036561-101036666_chr10:101036561-101036722 0.9391810050648027 0.9391810050648027 0.9391810050648027 0.9398552913943664 0.9398552913943664 0.9398552913943664 0.0012280749679121072 1.0 0.9720447347536467 0.9981867626785553
We just did a new release. Can you do a docker pull brookslab/flair:1.7.0
and try again?
Dear Jeltje
The command run successfully with all files as expected
-rw-r--r-- 1 root root 7.5M Oct 29 13:48 diffsplice.alt3.events.quant.tsv
-rw-r--r-- 1 root root 7.1M Oct 29 13:48 diffsplice.alt5.events.quant.tsv
-rw-r--r-- 1 root root 68M Oct 29 14:13 diffsplice.es.events.quant.tsv
-rw-r--r-- 1 root root 40M Oct 29 14:13 diffsplice.ir.events.quant.tsv
-rw-r--r-- 1 root root 15K Oct 29 16:54 drimseq_alt3_Control_v_Treated.tsv
-rw-r--r-- 1 root root 21K Oct 29 16:48 drimseq_alt5_Control_v_Treated.tsv
-rw-r--r-- 1 root root 118K Oct 29 16:41 drimseq_es_Control_v_Treated.tsv
-rw-r--r-- 1 root root 65K Oct 29 17:47 drimseq_ir_Control_v_Treated.tsv
drwx------ 2 root root 4.0K Oct 29 17:47 workdir
However I'm still confused on the results as the second column suggest there should be gene_id
however I think its the coordinates
head drimseq_alt3_Control_v_Treated.tsv
feature_id gene_id sampleR1_Control_batch1 sampleR2_Control_batch1 sampleR3_Control_batch1 sampleR4_Treated_batch1 sampleR5_Treated_batch1 sampleR6_Treated_batch1 lr adj_pvalue
exclusion_chr3:170868297 chr3:170868297-170868201_chr3:170868297-170868134 0.817 0.817 0.817 0.605 0.605 0.605 239.3 1.04e-50
exclusion_chr4:118278792 chr4:118278792-118278943_chr4:118278792-118279106 0.085 0.085 0.085 0.346 0.346 0.346 239.29 1.04e-50
inclusion_chr3:170868297 chr3:170868297-170868201_chr3:170868297-170868134 0.183 0.183 0.183 0.395 0.395 0.395 239.3 1.04e-50
inclusion_chr4:118278792 chr4:118278792-118278943_chr4:118278792-118279106 0.915 0.915 0.915 0.654 0.654 0.654 239.29 1.04e-50
Is there a way to convert the coordinates to gene or transcript Id or add the isoform_ids ?
The diffsplice.<alt3|alt5|es|ir>.events.quant.tsv files show which transcripts and genes contain the splice. I agree that ideally the gene should be listed in the drimseq output file but that's quite a bit of work. I will open a feature request for that.
This was run using the latest docker run, the error appeared when I used the
--test
Probably an R package issue, any help appreciated
Mustafa