aehrc / isling

A tool for detection of viral integrations

how do you run isling?? #8

Closed JHBI115 closed 1 year ago

JHBI115 commented 1 year ago

Hi, sorry, but I'm not familiar with snakemake (though I'm using a server that has it installed) or docker, and I don't really understand from the documentation how to run isling. Is intvi_pipeline supposed to be my local directory, or does it come with the isling package?

Could you please clarify ?

thank you so much!

szsctt commented 1 year ago

Thanks for your interest in isling.

Have you been able to run isling with the test data?

In addition to snakemake, you will need either conda, mamba, or singularity installed to supply isling's other dependencies.

To run with the test dataset, all you need to do is clone the repo and run snakemake.

If you have conda:

```
git clone https://github.com/aehrc/isling.git && cd isling
snakemake --configfile test/config/test.yml --cores 1 --use-conda --conda-frontend conda
```

If you have mamba:

```
git clone https://github.com/aehrc/isling.git && cd isling
snakemake --configfile test/config/test.yml --cores 1 --use-conda
```

If you have singularity:

```
git clone https://github.com/aehrc/isling.git && cd isling
snakemake --configfile test/config/test.yml --cores 1 --use-singularity
```

To run with your own data, just modify the config file to point at your files, and run the same way. You can also increase the number of cores if your dataset is large.
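For example, a modified config might look something like the sketch below. This is illustrative only: the dataset name and file paths are placeholders, and the key names follow the pattern used in the test config (`test/config/test.yml`), which is the best reference for the exact schema in your isling version.

```yaml
# Illustrative sketch -- copy test/config/test.yml and edit it rather than
# writing from scratch; key names here are modeled on the test config and
# may differ in your isling version.
my_dataset:
  read_folder: "/path/to/my/reads/"   # directory containing paired fastq files
  out_dir: "/path/to/output/"
  R1_suffix: "_R1.fq.gz"              # suffix identifying R1 files
  R2_suffix: "_R2.fq.gz"              # suffix identifying R2 files
  host_name: "mouse"
  host_fasta: "/path/to/mouse_genome.fa"
  virus_name: "my_virus"
  virus_fasta: "/path/to/virus.fa"
```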

The intvi_pipeline referred to was the name for isling during development - I've replaced this in the docs.

JHBI115 commented 1 year ago

Hi, thank you so much for your fast reply. I did run the test and apparently it worked. However, when I try to run on my own samples, I find only 3 directories in the output folder: F_Flx_1/dedup_reads, summary and references.

looking at standard error I found this message:

```
Error in rule dedupe:
    jobid: 68
    output: /scratch16/..../dedup_reads/F_Flx_1_1.fq, /scratch16/.../dedup_reads/F_Flx_1_2.fq
    conda-env: /scratch16/..../A/isling/.snakemake/conda/ec6e8117eb06eedb3edaed563b48fb68
    shell:
        clumpify.sh -Xmx55426m in1=/scratch16/.../A/CleanData/F_Flx_1_R1.fq.gz in2=/scratch16/..../A/CleanData/F_Flx_1_R2.fq.gz out1=/scratch16/.../A/Isling_OUT/F_Flx_1/dedup_reads/F_Flx_1_1.fq out2=/scratch16/...A/Isling_OUT/F_Flx_1/dedup_reads/F_Flx_1_2.fq dedupe=t ac=f subs=2 threads=4
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
```

Although the run was not interrupted by this error, it did not generate any further output. I'm running this on a grid in an sbatch script.

Thank you !

szsctt commented 1 year ago

Glad that you could run the test data ok.

Was there no other error from this rule? Sometimes the actual error message is a little above or below this message saying that a rule has failed.

Depending on the size of your input files, this could be a memory issue - I'd suggest trying again and asking for more memory in your sbatch script.
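As a rough sketch, an sbatch header for a larger-memory attempt might look like the following. The exact values and partition setup depend on your cluster; note that the clumpify.sh call in the error above requested roughly 55 GB of JVM heap on its own, so the job allocation needs comfortable headroom above that.

```
#!/bin/bash
# Illustrative sbatch header -- adjust resources for your cluster.
#SBATCH --job-name=isling
#SBATCH --cpus-per-task=4
#SBATCH --mem=128G          # clumpify.sh alone asked for ~55 GB of heap
#SBATCH --time=48:00:00

snakemake --configfile my_config.yml --cores 4 --use-conda
```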

JHBI115 commented 1 year ago

Hi,

Yes, you are right. It was an oom kill event.

That run used 4 cores and 64 GB of memory; the config file was isling/config.yml instead of isling/test/config/test.yml, and the script was launched from outside the isling directory.

Last night I started another run. The differences were: 1 core, using isling/test/config/test.yml (with the paths for reads and output modified), and running from inside the isling directory. (It was killed because of the time limit, but it definitely got past the point where it broke the first time. Since the memory was the same, I'm wondering whether something else was also at play?)

My files are pretty big: 21 GB, about 300 million pairs (mouse WGS). Should I break the files into smaller parts?

Thank you for your continuous support!


UPDATE: the run finished without errors, but also without any integrations. I have 2 virus sequences (84 and 115 bp). The virus_stats file shows: raw total sequences: 465295159; reads mapped: 206; reads mapped and paired: 4. Then host_stats says: raw total sequences: 371; reads mapped: 371; reads mapped and paired: 334. But no integrations, and unfortunately no bam files were saved.

szsctt commented 1 year ago

Glad that you got it working. Isling also has a split option (specify in the config file) to split large files into pieces.
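The split option lives in the per-dataset section of the config; the sketch below is a guess at its shape. The key name and value semantics here are assumptions on my part, so check isling's documentation or the example configs for the exact syntax.

```yaml
# Hypothetical sketch -- the key name 'split' and its meaning are assumptions;
# consult isling's docs for the exact option.
my_dataset:
  split: 10   # split each input file into chunks that are processed independently
```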

By default, isling doesn't keep the sam/bam files, to save disk space. If you would like to keep them, you can use snakemake's --notemp option, but you would have to re-run the workflow.
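For instance, to keep intermediate alignments, you would add the flag to the same invocation as before (the config filename here is a placeholder for your own):

```
snakemake --configfile my_config.yml --cores 4 --use-conda --notemp
```

Note that --notemp tells snakemake to keep all files it would otherwise mark as temporary, so expect substantially higher disk usage.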

Reads are first aligned to the virus; only mapped reads (and pairs where one read is mapped and the other unmapped) are then aligned to the host. So in this case, it doesn't look like you have many reads aligned to your viral references, which is probably why you don't have any integration sites.