comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Some questions #103

Open stephanflemming opened 3 years ago

stephanflemming commented 3 years ago

Hi,

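I am working on a Galaxy (https://usegalaxy.eu/) wrapper to include SUPPA in our ToolShed (https://toolshed.g2.bx.psu.edu/). The PR can be found here: https://github.com/galaxyproject/tools-iuc/pull/2850. More information about Galaxy and the ToolShed can be found here: https://galaxyproject.org/toolshed/.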

I have some questions and a couple of remarks (from a programmer's point of view; I have never used SUPPA for an analysis).

1) The wiki has an example for clusterEvents with the parameter --separation 0.11

python ~/SUPPA/suppa.py clusterEvents --dpsi ~/additional_files/Busskamp_multicondition.SE.dpsi --psivec ~/additional_files/Busskamp_multicondition.SE.psivec --sig-threshold 0.05 --eps 0.2 --separation 0.11 -dt 0.2 --min-pts 10 --groups 1-3,4-6,7-9,10-12 -c OPTICS -o ~/Busskamp_OPTICS

But the docs mention that this option is required for the "OPTICS method". I am confused, because DBSCAN is the default value of -c. So why is -s applied?

2) README: there is a typo ("... Only used forlocal AS ...") and a missing word ("... which contains the for each event the transcripts ...").

3) The wiki (https://github.com/comprna/SUPPA/wiki/SUPPA2-tutorial#event-calculation) explains a step that merges IOE files. I added that to the wrapper. Is it recommended to merge GTF files too?

4) The header of the result of joinFiles contains the name of the input datasets. Galaxy does not work with filenames, but uses meta descriptions instead. Therefore the header looks like dataset_10_SRR1513332 dataset_10_SRR1513333 dataset_10_SRR1513334. May I ask for a parameter to add (optional) descriptions/names for each input file that will be merged? :-)

5) May I ask for a parameter --version that prints the current version of SUPPA? :-)

6) Some parameters have integer default values, but their descriptions sound more like floats to me. Am I correct? (I have to use the correct type in the wrapper.)
diffSplice: -l, -nan
clusterEvents: -s
generateEvents: -t (what does 10nt mean?)

7) The wiki (https://github.com/comprna/SUPPA#input-files-1) states: "An ioi/ioe file and a "transcript expression file" are required as input." Shouldn't it be "gtf/ioe"?

Thanks in advance, Stephan

EduEyras commented 3 years ago

Hi Stephan,

Thanks a lot for adding SUPPA to the Galaxy ToolShed and for your comments and suggestions. I'll go through them with JL and JC, who developed SUPPA and the wiki page. I hope we can respond soon with fixes and details.

cheers

Eduardo

On Thu, 22 Oct 2020 at 12:05, Stephan Flemming notifications@github.com wrote:


-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

stephanflemming commented 3 years ago

Hi Eduardo,

Do you need more information from my side? I want to close the PR and add the wrapper. I just need the answers to questions 1, 3 and 6 :-)

Thanks in advance, Stephan

EduEyras commented 3 years ago

Hi Stephan,

Apologies for the long delay in replying.

We are really glad that you want to include a SUPPA wrapper in Galaxy.

Regarding your questions:

1.

The wiki has an example for clusterEvents with the parameter --separation 0.11

python ~/SUPPA/suppa.py clusterEvents --dpsi ~/additional_files/Busskamp_multicondition.SE.dpsi --psivec ~/additional_files/Busskamp_multicondition.SE.psivec --sig-threshold 0.05 --eps 0.2 --separation 0.11 -dt 0.2 --min-pts 10 --groups 1-3,4-6,7-9,10-12 -c OPTICS -o ~/Busskamp_OPTICS

But the docs mention that this option is required for the "OPTICS method". I am confused, because DBSCAN is the default value of -c. So why is -s applied?

The -s option applies when -c OPTICS is selected, as it is in the example: the command passes -c OPTICS explicitly, which overrides the DBSCAN default. This is how the README describes it. I am not sure if this clarifies it.

I am cc'ing Juanlu, who developed the clustering part and can explain this better.

2.

README: there is a typo ("... Only used forlocal AS ...") and a missing word ("... which contains the for each event the transcripts ...").

Thanks!

3.

The wiki https://github.com/comprna/SUPPA/wiki/SUPPA2-tutorial#event-calculation explains a step that merges IOE files. I added that to the wrapper. Is it recommended to merge GTF files too?

Yes. The GTF files of the events are for visualisation, so one could just as well merge them and visualise them all together.
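For the IOE merge itself, a minimal Python sketch could look like the following. It assumes every input file starts with exactly one header line and that the headers are identical; `merge_with_single_header` is a hypothetical helper, not part of SUPPA. Whether the event GTF files carry a header or track line is not stated in this thread, so that should be checked before reusing the sketch for them.

```python
from pathlib import Path
from typing import Iterable


def merge_with_single_header(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate text files, writing the shared header line only once.

    Hypothetical helper (not part of SUPPA). Assumes each input starts with
    one header line and that all headers are identical. Returns the number
    of data lines written.
    """
    header = None
    written = 0
    with output.open("w") as out:
        for path in inputs:
            with path.open() as handle:
                first = handle.readline().rstrip("\n")
                if not first:
                    continue  # skip files that have no header line
                if header is None:
                    # The first header seen becomes the header of the output.
                    header = first
                    out.write(header + "\n")
                elif first != header:
                    raise ValueError(f"{path} has a different header: {first!r}")
                # Copy the remaining (data) lines unchanged.
                for line in handle:
                    out.write(line.rstrip("\n") + "\n")
                    written += 1
    return written
```

Called with the per-event-type .ioe files and an output path, this would produce one combined file with a single header line.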

4.

The header of the result of joinFiles contains the name of the input datasets. Galaxy does not work with filenames, but uses meta descriptions instead. Therefore the header looks like dataset_10_SRR1513332 dataset_10_SRR1513333 dataset_10_SRR1513334. May I ask for a parameter to add (optional) descriptions/names for each input file that will be merged? :-)

Not sure I understand this one. Do you mean being able to modify the headers of the file from the command line? This would be for the first step of the PSI calculation, I guess?
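One possible reading of the request, sketched as a post-processing step outside SUPPA: replace the header of the joined file with user-supplied labels. The sketch assumes the first line is a tab-separated header that holds only the sample names, one per input file; `relabel_header` is a hypothetical name and not an existing SUPPA function or option.

```python
from typing import List, Sequence


def relabel_header(lines: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Return a copy of `lines` with the header replaced by tab-joined labels.

    Hypothetical helper: assumes lines[0] is a tab-separated header with one
    column per sample, so the number of labels must match its column count.
    """
    if not lines:
        raise ValueError("input has no header line")
    n_columns = len(lines[0].rstrip("\n").split("\t"))
    if n_columns != len(labels):
        raise ValueError(f"expected {n_columns} labels, got {len(labels)}")
    # Only the header changes; data lines are passed through untouched.
    return ["\t".join(labels) + "\n", *lines[1:]]
```

For example, a header such as dataset_10_SRR1513332, dataset_10_SRR1513333 could be replaced with the descriptions that Galaxy stores for those datasets.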

5.

May I ask for a parameter --version that prints the current version of SUPPA? :-)

Yes, we will add that.
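With Python's standard argparse module, such a flag is usually a single line. The following is only a sketch on a stand-in parser: the program name and the version string are placeholders, and how SUPPA will actually implement this is not stated in this thread.

```python
import argparse

# Placeholder version string; not taken from the SUPPA code base.
EXAMPLE_VERSION = "0.0.0"


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser that only knows about --version."""
    parser = argparse.ArgumentParser(prog="example-tool")
    # action="version" prints the version string and exits with status 0.
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {EXAMPLE_VERSION}",
    )
    return parser
```

A wrapper can then run the tool with --version and record the printed string.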

6.

Some parameters have integer default values, but their descriptions sound more like floats to me. Am I correct? (I have to use the correct type in the wrapper.)
diffSplice: -l, -nan
clusterEvents: -s

You are right, these are actually floats.

generateEvents: -t (what does 10nt mean?)

In this case -t is an integer, because it is a distance measured in nucleotides (so 10nt means 10 nucleotides). This value is the margin allowed when matching two exon-intron boundaries from different transcripts, i.e. a fuzzy match: the maximum variability allowed is -t, and it can only vary in integer steps.
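The fuzzy match described here can be sketched in a few lines of Python. This is an illustration of the idea only, not the SUPPA implementation: `boundaries_match` is a hypothetical name, the default of 10 is taken from the "10nt" mentioned in the question, and treating the threshold as inclusive is an assumption.

```python
def boundaries_match(pos_a: int, pos_b: int, threshold: int = 10) -> bool:
    """Return True if two boundary coordinates lie within `threshold` nt.

    Illustrative sketch: coordinates and threshold are whole nucleotides,
    which is why an integer type is the natural choice for the parameter.
    The comparison is assumed to be inclusive.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return abs(pos_a - pos_b) <= threshold
```

With threshold 0 only identical coordinates match; with threshold 10, boundaries at positions 1000 and 1010 would be treated as the same boundary, while 1000 and 1011 would not.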

7.

The wiki https://github.com/comprna/SUPPA#input-files-1 states: "An ioi/ioe file and a "transcript expression file" are required as input." Shouldn't it be "gtf/ioe"?

To calculate the PSI per local event, the input is an ioe file and a transcript expression file.

To calculate the PSI per isoform, the input is indeed a gtf file and a transcript expression file. This is actually something to be fixed: we could first generate the ioi file, and then use the ioi file and the expression file to calculate the PSI per isoform. This would make the two cases more symmetric.

I hope this helps.

Please let us know if there is anything else to clarify.

Thanks again,

Eduardo
