chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

STRONG Hanging indefinitely on coverage rule #125

Closed rhysnewell closed 2 years ago

rhysnewell commented 2 years ago

Hi there,

I've been trying to get STRONG to run on my university's HPC for a while now. Installation has been pretty painful without a conda package or a working docker container to ease the process. It seems we might finally have created a functioning docker container, but when we go to run STRONG it hangs forever on the first step:

Selected jobs (1):
        coverage
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

[Thu Feb 17 16:19:11 2022]
rule coverage:
    output: profile/split/coverage.tsv
    jobid: 1
    reason: Missing output files: profile/split/coverage.tsv
    resources: tmpdir=/tmp

            echo -e "contig     ""$(ls -U  | cut -f1 -d "." | rev | cut -f1 -d "/" |rev | tr "
" "     " | sed 's/     $//')"> profile/split/coverage.tsv
            awk 'NR==FNR{Matrix_coverage[1,FNR]=$4}FNR==1{f++}{Matrix_coverage[f+1,FNR]=$5}END{for(x=1;x<=FNR;x++){for(y=1;y<ARGC+1;y++){if(y<ARGC){printf("%s  ",Matrix_coverage[y,x])}if(y==ARGC){printf("%s",Matrix_coverage[y,x]);print""}}}}'  >>profile/split/coverage.tsv

Note that this is an attempt to run STRONG on the synthetic samples that were used in your paper, so it should work in theory. Here is my config file:

# ------ Samples ------
samples: ['sample*'] # specify a list samples to use or '*' to use all samples

# ------ Resources ------
threads : 32 # single task nb threads

# ------ Assembly parameters ------
data:  ./  # path to data folder

# ----- Annotation database -----
cog_database: /home/n10853499/databases/cogs # COG database

# ----- Binner ------
binner: "metabat2"

# ----- Binning parameters ------
concoct:
    contig_size: 1500

read_length: 150
assembly:
    assembler: spades
    k: [77]
    mem: 2000
    threads: 24

# ----- BayesPaths parameters ------
bayespaths:
    nb_strains: 5
    nmf_runs: 1
    max_giter: 1
    min_orf_number_to_merge_bins: 18
    min_orf_number_to_run_a_bin: 10
    percent_unitigs_shared: 0.1

# ----- DESMAN parameters ------
desman:
    execution: 1
    nb_haplotypes: 10
    nb_repeat: 5
    min_cov: 1

# -----  Evaluation ------
evaluation:
    execution: 1
    genomes: "/lustre/scratch/microbiome/n10853499/02-lorikeet_testing/00-STRONG_sim/Synth_G45_S15D/Eval/Genomes"
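
Because `samples` is a glob (`'sample*'`) and `data` is a relative path (`./`), one quick sanity check is to count how many directories the pattern matches from the directory the pipeline is launched in. The following is only a minimal sketch: `count_samples` is a hypothetical helper, not part of STRONG, and it assumes that samples are subdirectories of the data folder.

```shell
# Hypothetical helper (not part of STRONG): count the directories under a
# data folder whose names match a glob pattern.
# Assumption: each sample is a subdirectory of the data folder.
count_samples() {
    data_dir=$1
    pattern=$2
    count=0
    # $pattern is deliberately unquoted so the shell expands the glob.
    for d in "$data_dir"/$pattern; do
        # An unmatched glob is left as a literal string, which fails -d.
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example with the values from the config above (data: ./, samples: sample*).
echo "matched sample directories: $(count_samples ./ 'sample*')"
```

A count of 0 would suggest the pattern is being resolved against a different working directory than intended, for example inside the container.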

And here is the singularity/docker command I'm trying to use:

singularity run library://biocontainers/strongdocker:0.1.1 STRONG --config ~/test_config.yaml --verbose ~/strong_out

Nothing fancy, but it should at least make it past the first step.

Is there something I'm missing here? Is there an easier method to get an install working, or do you have any intention to containerize STRONG in a manageable way?

Cheers, Rhys

chrisquince commented 2 years ago

Hi Rhys,

Thanks for generating a docker container. I am not very knowledgeable about docker. Can you point me to a getting-started tutorial so that I can try to install your container?

The coverage profiles are not the first step. Prior to that, assembly and graph processing should have been performed, and my guess is that it is these steps that have failed. Did you run the install script required to compile the special SPAdes executables?
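
One way to see why missing upstream outputs could show up as a hang rather than an error: the logged `ls -U` and `awk` calls appear to have no file arguments, and awk with no file operands reads standard input. A minimal sketch of that behaviour, where the empty `inputs` variable is a hypothetical stand-in for the rule's input list:

```shell
# Hypothetical stand-in for the rule's input list; empty here to mimic
# upstream steps that produced no coverage files.
inputs=""

# With no file operands, awk reads standard input. In a real job that
# means waiting for input that never arrives. Redirecting stdin from
# /dev/null lets this demo return immediately instead of blocking.
records=$(awk 'END { print NR }' $inputs < /dev/null)
echo "records read: $records"
```

This prints `records read: 0`. Without the `< /dev/null` redirect, the same call run from a terminal would wait for input, which would be consistent with the behaviour reported above.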

It is nice to see that people are trying to use STRONG. We are not, as you have probably gathered, software engineers. What we are considering is refactoring and simplifying the pipeline, and also allowing the use of predefined MAGs. It is very hard to find the spare time to do these things, though.

Thanks, Chris

rhysnewell commented 2 years ago

Okay, it seems like the docker container requires a bit more work. The SPAdes install must have failed silently at some point, which is odd. Unfortunately, I wasn't the one who generated the container, so I don't have definitive answers on how to generate a docker image. I do know that there is a publicly available one on Docker Hub: https://hub.docker.com/r/yuxiangtan/strong. The person who generated that image seemed to get it mostly running after some back and forth (https://github.com/chrisquince/STRONG/issues/122), but I'm not sure how. I've asked them if they are willing to share an updated version of their docker image, so we shall see.

Yeah, the pipeline needs refactoring. I realise that this is a lot of work but, I'm sorry to say, in its current state it is just too difficult for new users to get running reliably. Before you start adding new features, I think you should get your install procedure streamlined. That is what future users would appreciate most.