Xinglab / IRIS

IRIS: Isoform peptides from RNA splicing for Immunotherapy target Screening

Run the complete pipeline #17

Status: Open (serpei opened this issue 1 year ago)

serpei commented 1 year ago

Hi, thank you for your super-interesting tool.

I have some questions about snakemake_config.yaml (I plan to run the whole pipeline together using snakemake):

  1. Does this file have to stay in the IRIS folder? If so, does that mean I cannot run IRIS on multiple batches of patients in parallel?
  2. If I want to run all steps to define the degree of association, do I only have to set run_all_modules to "true", insert a list of normals in tissue_matched_normal_reference_group_names, and insert a list of tumors in tumor_reference_group_names?
  3. What are the differences between tissue_matched_normal_reference_group_names and normal_reference_group_names?
  4. Can users add their own normal controls? If so, how?
  5. What does "blocklist" mean?
  6. What do the comparison_mode values "group" and "individual" mean?
  7. For which kinds of analyses do you suggest a parametric or a nonparametric stat_test_type?
  8. Does sample_fastqs have a maximum?
  9. Are novel events considered automatically?
  10. For the splice_event_type parameter, does a user have to run each event type separately?

Sorry to bother you, and thank you! Serena

EricKutschera commented 1 year ago
  1. The snakemake_config.yaml is intended to stay in the IRIS/ folder. If you want to run IRIS multiple times in parallel, you could create a separate install of IRIS in a different folder. I think another option is to put a new copy of the IRIS code in a new folder and then set these config paths to point to the main installed version of IRIS: https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L105. If you have multiple runs at the same time, they should have different run_name config values.

  2. To run all steps, set run_all_modules: true. If you want to use tissue_matched_normal or tumor references, you should fill out all the config values for that reference group: https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L61. At least one of tissue_matched_normal or normal is required.

  3. There are separate output files for the comparison against the tissue_matched_normal (tier 1) and the comparison against all 3 of tissue_matched_normal, tumor, and normal (tier 3): https://github.com/Xinglab/IRIS/tree/v2.0.1#example-output

  4. When you run the pipeline it will add a new directory to IRIS_data/db using the run_name from the config. After running the pipeline you can then use the results in a future run as one of the reference_group_names

  5. From https://github.com/Xinglab/IRIS/blob/v2.0.1/example/parameter_file_description.txt#L40

    Removes the AS events that are error-prone due to artifacts

Here's the example file https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/data/blocklist.brain_2020.txt
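Answers 1, 2, and 4 can be rolled into a quick pre-flight check before launching a full run. This is a minimal, hypothetical sketch, not something IRIS ships: it assumes snakemake_config.yaml has already been loaded into a dict (for example with PyYAML's yaml.safe_load), and it assumes each *_reference_group_names value is either a list of names or a comma-separated string. The function names are made up for illustration.

```python
# Hypothetical pre-flight check for an IRIS snakemake_config.yaml before a full run.
# Assumptions (not from IRIS itself): the YAML is already loaded into a dict, and each
# *_reference_group_names value is a list of names or a comma-separated string.

REFERENCE_KEYS = (
    "tissue_matched_normal_reference_group_names",
    "tumor_reference_group_names",
    "normal_reference_group_names",
)


def group_names(value):
    """Normalize None, "a,b", or ["a", "b"] into a list of stripped group names."""
    if not value:
        return []
    if isinstance(value, str):
        value = value.split(",")
    return [name.strip() for name in value if name.strip()]


def check_full_run_config(config, other_run_names=(), available_groups=None):
    """Return a list of problems; an empty list means none were detected."""
    problems = []

    # Answer 1: runs executing at the same time need different run_name values.
    run_name = config.get("run_name")
    if not run_name:
        problems.append("run_name is empty")
    elif run_name in other_run_names:
        problems.append(f"run_name {run_name!r} is already used by another run")

    # Answer 2: a full run needs run_all_modules: true ...
    if config.get("run_all_modules") is not True:
        problems.append("run_all_modules is not true")
    # ... and at least one of the tissue-matched normal or normal reference groups.
    if not (group_names(config.get(REFERENCE_KEYS[0]))
            or group_names(config.get(REFERENCE_KEYS[2]))):
        problems.append("no tissue_matched_normal or normal reference group is set")

    # Answer 4: a finished run adds a directory named after its run_name to
    # IRIS_data/db, so each referenced group should be one of the available names.
    if available_groups is not None:
        for key in REFERENCE_KEYS:
            for name in group_names(config.get(key)):
                if name not in available_groups:
                    problems.append(f"unknown reference group {name!r}")
    return problems
```

To fill available_groups, one option is {p.name for p in pathlib.Path("IRIS_data/db").iterdir() if p.is_dir()}, assuming every reference group you plan to use lives as a directory there.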

  6. and 7. From https://github.com/Xinglab/IRIS/blob/v2.0.1/example/parameter_file_description.txt#L36

    Comparison mode & statistical test type: 'group' mode (number of input samples >=2) and 'individual mode' (number of input sample =1) are provided. 'group' mode is default and recommended; for PSI-based tests, 'parametric' and 'nonparametric' tests are supported. 'parametric' is default

  8. No maximum.

  9. The snakemake workflow does not use the novelSS parameter.

  10. The snakemake workflow only supports one event type at a time. If you want output for each event type, you need to run it multiple times.
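Answers 1 and 10 together suggest a simple pattern for covering every event type: one run per splice_event_type, each with its own run_name. The sketch below generates those per-run config texts with the standard library only. It assumes run_name and splice_event_type appear as top-level "key: value" lines in snakemake_config.yaml, and the event-type codes (SE, RI, A3SS, A5SS) are assumed rMATS-style abbreviations, so check the comments in the config for the values IRIS actually accepts. The helper names are hypothetical.

```python
import re

# Assumed event-type codes; verify against the comments in snakemake_config.yaml.
EVENT_TYPES = ("SE", "RI", "A3SS", "A5SS")


def set_top_level_key(config_text, key, value):
    """Replace the value on a top-level `key: ...` line of YAML text."""
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A callable replacement avoids re treating backslashes in `value` specially.
    new_text, count = pattern.subn(lambda _m: f"{key}: '{value}'", config_text, count=1)
    if count == 0:
        raise KeyError(f"top-level key {key!r} not found")
    return new_text


def per_event_configs(config_text, base_run_name, event_types=EVENT_TYPES):
    """Yield (event_type, config_text) pairs, one run per splice event type."""
    for event_type in event_types:
        text = set_top_level_key(config_text, "splice_event_type", event_type)
        # A distinct run_name keeps outputs and IRIS_data/db entries from colliding.
        text = set_top_level_key(text, "run_name", f"{base_run_name}_{event_type}")
        yield event_type, text
```

Because the config is meant to stay in the IRIS/ folder, running these at the same time would mean writing each text into the snakemake_config.yaml of its own IRIS copy; running them one after another from a single install avoids that.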

serpei commented 1 year ago

Thank you so much for your explanations! Since I would like to consider all event types and also novelSS, I think I have to use the individual functions to build a pipeline. Do you agree? If so, do the individual functions support multithreading, and can I set the number of cores used by each of them? Thank you again, Serena

EricKutschera commented 1 year ago

Building your own pipeline from the individual functions is reasonable. Depending on how much you want to change, you could try modifying the provided snakemake workflow instead of building a new pipeline from scratch

The single functions use multithreading, but they don't take the number of cores as a parameter. For example: https://github.com/Xinglab/IRIS/blob/v2.0.1/snakemake_config.yaml#L7 https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/IRIS_process_rnaseq.py#L13 https://github.com/Xinglab/IRIS/blob/v2.0.1/IRIS/IRIS_process_rnaseq.py#L27

You can edit the files to change the number of threads

serpei commented 10 months ago

Excuse me for the late response. Thank you again!