chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

metabat2 binning #117

Open ebustos128 opened 2 years ago

ebustos128 commented 2 years ago

Hi, I'm trying to do a binning with metabat2 using the following config.yaml:

------ Samples ------

samples: '' # specify a list samples to use or '' to use all samples

------ Resources ------

threads : 16 # single task nb threads

------ Assembly parameters ------

data: /home/ebustos/05.PATOS_metagenomes/23.STRONG_Runs/04.Tomeu_metagenomes/02.STRONG_samples # path to data folder

----- Annotation database -----

cog_database: /home/ebustos/05.PATOS_metagenomes/23.STRONG_Runs/cogs/Cog # COG database

----- Binner ------

binner: metabat2

----- Binning parameters ------

contig_size: 1500

read_length: 150 assembly: assembler: spades k: [77] mem: 200000 threads: 16

----- BayesPaths parameters ------

bayespaths: nb_strains: 16 nmf_runs: 1 max_giter: 1 min_orf_number_to_merge_bins: 10 min_orf_number_to_run_a_bin: 10 percent_unitigs_shared: 0.1

----- DESMAN parameters ------

desman: execution: 1 nb_haplotypes: 10 nb_repeat: 5 min_cov: 1

----- Evaluation ------

evaluation:

    #execution: 1
# genomes: /home/ebustos/05.PATOS_metagenomes/23.STRONG_Runs/01.PATOS_samples/01.Pond1/01.Cycle1/Eval # path to reference genomes

Do you think that this config.yaml is correct? Also, I have checked the snake files and I haven't seen any option to run the binning with metabat2.

Best, Esteban

Sebastien-Raguideau commented 2 years ago

Hi Esteban, I admit it's a bit hard to say anything from this. Can you share your config as a file instead? Some important features of .yaml are indentation and the space after each colon. Try a YAML validator to check that your file is valid YAML. Usually the simplest way is to take the template config file and fill it in, which you seem to have done, so it should be fine. Another good way to check whether the config file is valid is to try launching STRONG with it; that would be quite fast.

Regarding metabat2, what options were you looking for? Are you saying that you didn't find the part of the code where metabat2 is used, or that you would like to be able to run metabat2 with some options?
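As a rough illustration of the two pitfalls mentioned above, here is a minimal Python sketch that scans a config for tab indentation and for keys with no space after the colon. It is only a heuristic and does not parse YAML; the function name `lint_yaml_text` is made up for this example and is not part of STRONG. A real YAML parser or validator remains the authoritative check.

```python
import re

# Heuristic only: this flags two common YAML pitfalls, it does not parse YAML.
# Matches "key:value" written with no space after the colon.
KEY_WITHOUT_SPACE = re.compile(r"^\s*[\w-]+:[^\s]")


def lint_yaml_text(text):
    """Return (line_number, message) pairs for suspicious lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        # Leading whitespace of the line; YAML does not allow tabs here.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        if KEY_WITHOUT_SPACE.match(line):
            problems.append((number, "no space after colon"))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: read config.yaml from the current directory.
    with open("config.yaml") as handle:
        for number, message in lint_yaml_text(handle.read()):
            print(f"line {number}: {message}")
```

An empty result only means neither pitfall was spotted; it does not prove that the nesting matches what STRONG expects.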

ebustos128 commented 2 years ago

Hi, I had been using an old version of STRONG. Now I'm running STRONG with metabat2 binning, but I get the following error:

[Sun Oct 10 14:25:51 2021]
Error in rule create_bin_folders:
    jobid: 156
    output: binning/metabat2/list_mags.tsv, binning/metabat2/SCG_table_metabat2.csv
    shell:
        /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scripts/SCG_in_Bins.py binning/metabat2/clustering_metabat2.csv annotation/SCG.fna annotation/assembly.bed profile/split.bed /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt -all subgraphs/bin_init/ -l binning/metabat2/list_mags.tsv -T 0.75 -t binning/metabat2/SCG_table_metabat2.csv
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Full Traceback (most recent call last):
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2395, in run_wrapper
    basedir,
  File "/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/Binning.snake", line 266, in __rule_create_bin_folders
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/site-packages/snakemake/shell.py", line 263, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scripts/SCG_in_Bins.py binning/metabat2/clustering_metabat2.csv annotation/SCG.fna annotation/assembly.bed profile/split.bed /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt -all subgraphs/bin_init/ -l binning/metabat2/list_mags.tsv -T 0.75 -t binning/metabat2/SCG_table_metabat2.csv' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 592, in _callback
    raise ex
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 578, in cached_or_run
    run_func(*args)
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 2407, in run_wrapper
    ex, lineno, linemaps=linemaps, snakefile=file, show_traceback=True
snakemake.exceptions.RuleException: CalledProcessError in line 115 of /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/Binning.snake:
Command '/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scripts/SCG_in_Bins.py binning/metabat2/clustering_metabat2.csv annotation/SCG.fna annotation/assembly.bed profile/split.bed /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt -all subgraphs/bin_init/ -l binning/metabat2/list_mags.tsv -T 0.75 -t binning/metabat2/SCG_table_metabat2.csv' returned non-zero exit status 1.
  File "/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/Binning.snake", line 115, in __rule_create_bin_folders

RuleException: CalledProcessError in line 115 of /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/Binning.snake:
Command '/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scripts/SCG_in_Bins.py binning/metabat2/clustering_metabat2.csv annotation/SCG.fna annotation/assembly.bed profile/split.bed /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt -all subgraphs/bin_init/ -l binning/metabat2/list_mags.tsv -T 0.75 -t binning/metabat2/SCG_table_metabat2.csv' returned non-zero exit status 1.
  File "/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/Binning.snake", line 115, in __rule_create_bin_folders
  File "/home/c988/c9881009/.conda/envs/mamba/envs/STRONG/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/gpfs1/scratch/c9881009/projects/PATOS/02.Trimmed_files/05.PATOS_STRONG_results/.snakemake/log/2021-10-09T103712.730549.snakemake.log
unlocking
removing lock
removing lock
removed all locks

It seems to fail only when I run with metabat2 binning; with concoct the whole pipeline works fine! Best, Esteban

Sebastien-Raguideau commented 2 years ago

Hi Esteban,

Sorry for the delay, I was away for a bit.

Yeah, so this is a downside of snakemake: that log doesn't tell us why it failed. There is no specific log for that script, so the best way to debug it would be for you to rerun the failing command outside of STRONG. That would be:

/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scripts/SCG_in_Bins.py binning/metabat2/clustering_metabat2.csv annotation/SCG.fna annotation/assembly.bed profile/split.bed /gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt -all subgraphs/bin_init/ -l binning/metabat2/list_mags.tsv -T 0.75 -t binning/metabat2/SCG_table_metabat2.csv

I predict that this will be an issue with one of the files it is using. I'm not sure that script works when any of the corresponding files is empty.
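Before rerunning, it may help to check whether any of the input files named in that command is missing or empty. The following is a minimal Python sketch of such a check, meant to be run from the STRONG output folder. The helper name `check_inputs` is made up for this example, and which arguments count as inputs is my reading of the command line, not something confirmed by the script itself.

```python
from pathlib import Path


def check_inputs(paths):
    """Map each problematic path to 'missing' or 'empty'; fine paths are omitted."""
    problems = {}
    for name in paths:
        path = Path(name)
        if not path.exists():
            problems[str(name)] = "missing"
        elif path.is_file() and path.stat().st_size == 0:
            problems[str(name)] = "empty"
    return problems


if __name__ == "__main__":
    # Paths copied from the failing command; assumed to be its input files.
    inputs = [
        "binning/metabat2/clustering_metabat2.csv",
        "annotation/SCG.fna",
        "annotation/assembly.bed",
        "profile/split.bed",
        "/gpfs/gpfs1/scratch/c9881009/apps/STRONG/SnakeNest/scg_data/scg_cogs_to_run.txt",
    ]
    for name, problem in check_inputs(inputs).items():
        print(f"{name}: {problem}")
```

If nothing is printed, all five files exist and are non-empty, and the cause is more likely to show up in the script's own error output when the command is run by hand.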

Also, the obligatory PEBKAC question: did you use the same output folder for both the concoct and metabat2 runs? I don't think this is clearly specified in the doc, but you should not do so; it would be quite problematic for downstream analysis. It is possible, though, to symlink the assembly and profile folders into the other STRONG folder to speed things up.
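For the symlink idea, here is a minimal Python sketch, assuming each run has its own output folder and that the shared folders are literally named `assembly` and `profile`. The function name `link_shared_dirs` and the run folder names in the usage line are hypothetical.

```python
import os
from pathlib import Path


def link_shared_dirs(src_run, dst_run, names=("assembly", "profile")):
    """Symlink the named folders from src_run into dst_run; return those linked."""
    src_run = Path(src_run).resolve()
    dst_run = Path(dst_run).resolve()
    dst_run.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in names:
        source, target = src_run / name, dst_run / name
        # Link only real source folders, and never overwrite anything
        # already present at the target (including broken symlinks).
        if source.is_dir() and not os.path.lexists(target):
            os.symlink(source, target, target_is_directory=True)
            linked.append(name)
    return linked


if __name__ == "__main__":
    # Hypothetical folder names for the two runs.
    print(link_shared_dirs("STRONG_concoct", "STRONG_metabat2"))
```

Everything else, in particular the binning folder, stays separate between the two runs.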

Best, Seb