Can we write the information for multiple samples into one config.yaml file, or can we only write one sample's information per config file?
I tried to install Miniconda three times and always got a no-space error, although I had already specified my PROJECT directory.
Would that not mean that your project folder is completely full? Maybe an admin or colleague can help you figure out what's wrong?
Is there any way to just run the binny script and directly load the necessary modules (snakemake, prokka, mantis, etc.) instead of creating separate conda environments?
Only by basically rewriting everything.
I would then instead try loading the latest conda module again, install binny to your project dir, and have binny load that same module by specifying it in the VARIABLE_CONFIG. But if the project dir is full, it would of course still not work.
Can we write info of multiple samples into one config.yaml file?
Only one atm.
Would that not mean that your project folder is completely full? Maybe an admin or colleague can help you figure out what's wrong?
I already reported the problem to the HPC admin and asked how many TB of storage each HPC user gets, but he only suggested always keeping everything in the PROJECT dir. It seems like every time, Miniconda tried to install something into my PERSONAL directory, which led to the no-space error.
I would then instead try loading the latest conda module again, install binny to your project dir, and have binny load that same module by specifying it in the VARIABLE_CONFIG. But if the project dir is full, it would of course still not work.
1st step:
git clone https://github.com/a-h-b/binny.git
cd binny
2nd step: set the latest miniconda module in the VARIABLE_CONFIG file in the binny dir, so that binny loads it (see the sketch after these steps)
3rd step: ./binny -i config/config.init.yaml
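For reference, the line I set in VARIABLE_CONFIG was something along these lines (the module name is specific to our cluster, and I am reproducing the format from memory, so check the VARIABLE_CONFIG template shipped with binny for the exact key names and layout):
LOADING_MODULES	module load miniconda3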
Then, some error screenshots.
But I didn't use the modules mamba>=0.22.1, conda-build>=3.21.8 and conda>=4.12.0.
And are you sure you set the install destination during the install to your project dir (it will ask you at some point where to put it)? It might still try to modify e.g. your .bashrc etc. to allow functionality. But those changes should take virtually no space and would only fail if your personal dir is 100% full, which should never be the case. If so, you should definitely delete some unused stuff there, as it might lead to lots of problems in general.
Then, some error screenshots.
This looks again like that issue. You can also try to put
pkgs_dirs:
  - /home/<user>/.conda/pkgs
in your .condarc (or create that file, if you don't have it), which seemed to fix it for that guy.
You can also try to run conda clean -a and see if that helps with your space problems.
And are you sure you set the install destination during the install to your project dir (it will ask you at some point where to put it)?
During the four binny installation steps, I was never asked about the final destination.
I meant for miniconda
I meant for miniconda
You mean that because my PERSONAL dir is full, even if I load the latest miniconda module to install binny, it still can't write to the FULL folder? I cleaned up 1 GB of conda packages using "conda clean -a", and I really don't know where they came from... Thank you very much.
It took 30 minutes to finish the command "./binny -i config/config.init.yaml", but with one error left, as shown below.
When I ran "conda clean -a", I may also have removed "/home/people/binson/.conda/pkgs/infernal-1.1.4-pl5321hec16e2b_1/info/index.json".
But some .yaml files were already generated here.
Does it matter that "/home/people/binson/.conda/pkgs/infernal-1.1.4-pl5321hec16e2b_1/info/index.json" does not exist?
During the binny installation, I received two automatic emails from the HPC reminding me that I had already exceeded the 10GB limit.
It had already finished creating the conda envs for ../workflow/envs/mapping.yaml, ../workflow/envs/binny_linux.yaml and ../workflow/envs/mantis.yaml, but an error happened when building ../workflow/envs/prokka.yaml due to a no-space error.
Ok, it's probably because we set
pkgs_dirs:
  - /home/<user>/.conda/pkgs
in your .condarc and you have very little space there. If you clean again and change the path to somewhere appropriate in your project folder, you should have the space and could try again.
(You could also add envs_dirs: with a path like /path/to/project/dir/<user>/.conda/envs, in the same format as pkgs_dirs:, to .condarc, to ensure all environments that you create without specifying a path will not be put in your home.)
Also, just to avoid unintended interactions, I'd only load miniconda in binny's VARIABLE_CONFIG, unless you really need the other modules.
In my "/home/people/binson/.condarc" file, there is only one line - "channel_priority: strict".
Do you mean I can add /home/projects/env_10000/people/binson/.conda/envs to the .condarc file as shown below?
Yes, e.g. like this:
channel_priority: strict
pkgs_dirs:
  - /home/projects/env_10000/people/binson/.conda/pkgs
envs_dirs:
  - /home/projects/env_10000/people/binson/.conda/envs
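You can then check what conda actually picked up with conda config --show (as far as I know it also accepts specific keys, e.g.):
conda config --show pkgs_dirs envs_dirs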
All conda .yaml files are created now: ../workflow/envs/mapping.yaml, ../workflow/envs/fasta_processing.yaml, ../workflow/envs/prokka.yaml, ../workflow/envs/binny_linux.yaml, ../workflow/envs/mantis.yaml
I ran the test - ./binny -l -n "TESTRUN" -r config/config.test.yaml, and got an error from the analysis_checkm_markers.log. analysis_checkm_markers.log
I really didn't install these three databases:
/home/projects/env_10000/people/binson/binny/conda/8875a4cc85bac50feeefc3e8545f0147/lib/python3.9/site-packages/Resources/NCBI/
/data/isbio/TOOLS/binny/database/hmms/checkm_tf/
/data/isbio/TOOLS/binny/database/hmms/checkm_pf/
I do have these databases:
/home/projects/env_10000/people/binson/binny/database/hmms/checkm_pf
/home/projects/env_10000/people/binson/binny/database/hmms/checkm_tf
All conda .yaml files are created now
These are the supplied environment files. The actual environments are in path/to/binny/conda (if the default install location was chosen) and are named with a combination of letters and numbers (an MD5 hash).
I think something went wrong during the initial, failed install attempt and Mantis was not installed properly. You can delete the mantis conda env (find it like this: head -n 1 path/to/binny/conda/*.yaml, then remove the one with name: mantis) and its yaml file (rm -rf path/to/binny/conda/<mantis_environment_hash>*), and run ./binny -i config/config.init.yaml again.
I'm unsure what's up with /data/isbio/TOOLS/binny/database/..., since you seem to have the databases at the correct default location.
I did as you suggested, as shown below.
Although ../workflow/envs/mantis.yaml was created again, the total download was 0 B.
You need to also delete the environment folder: rm -rf 8875a4cc<...>*, which will remove the folder and the yaml file.
Yeah, I removed both.
Is it due to bash: hmmpress: command not found...?
Hm, that should not happen. Might be because of the module loading and the fact that you don't have conda. Could you add this line
. "$(conda info --base)/etc/profile.d/conda.sh"
after line 90 (eval $LOADING_MODULES) in binny and try again?
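For illustration, that part of the binny wrapper script would then read roughly like this (only the two lines mentioned above; the surrounding code is omitted):
eval $LOADING_MODULES
. "$(conda info --base)/etc/profile.d/conda.sh"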
Finally, it works! Thank you very much for your help and patience!
It takes around 20 minutes to run the test, on an interactive node - 1 node, 1ppn, with 180GB.
One more question: if I have 12 samples, do I need to set up 12 separate config.yaml files for my 12 samples and then run binny 12 times separately?
Finally!
It takes around 20 minutes to run the test, on an interactive node - 1 node, 1ppn, with 180GB.
That is very slow, even for 1 core. You can speed it up with -t, using more cores, but maybe the storage is a bit slow too. Either way, glad it works now!
if I have 12 samples, I need to set 12 separate config.yaml files for my 12 samples, then run 12 times separately
Yes, atm it runs per sample. You can create a small wrapper that builds a config file for each sample.
Yes, atm it runs per sample. You can create a small wrapper that builds a config file for each sample.
what is a wrapper?
It's basically a program that calls another program and makes it easier to use in a certain scenario.
E.g. to make some binny config files and run binny with them, my_binny_wrapper.sh:
#! /bin/bash -l
threads=24
path_to_binny="/absolute/path/to/binny/binny"
path_to_my_prepared_config_file_template="/absolute/path/to/config"
path_to_sample_data="/absolute/path/to/data"
path_to_output="/absolute/path/to/output"
# Make sure the config output dir exists
mkdir -p ${path_to_output}/configs
# Write a config file per sample to the desired sample output dir and run binny with it
for sample in ${path_to_sample_data}/samples/*; do
  # Use only the sample folder name for naming configs, runs and output dirs
  sample_name=$(basename ${sample})
  path_to_sample_assembly=${sample}/assembly.fasta
  path_to_sample_align=${sample}/alignment.bam
  sed -e "s|assembly: \"\"|assembly: \"${path_to_sample_assembly}\"|g" \
      -e "s|metagenomics_alignment: \"\"|metagenomics_alignment: \"${path_to_sample_align}\"|g" \
      -e "s|outputdir: \"output\"|outputdir: \"${path_to_output}/${sample_name}\"|g" \
      ${path_to_my_prepared_config_file_template} > ${path_to_output}/configs/${sample_name}_binny.config.yaml
  ${path_to_binny} -t ${threads} -c -n "${sample_name}" -r ${path_to_output}/configs/${sample_name}_binny.config.yaml
done
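You could then just run it on a node with enough resources, e.g. bash my_binny_wrapper.sh, or wrap it in a job script for your scheduler; the threads value above is only an example and should match what you request.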
Thank you very much for the wrapper script, I will give it a try.
I just tried to submit my 15 jobs in their output folders, separately. And all jobs stopped at certain points.
I'll take sample 52 as an example.
Could you also post /home/projects/env_10000/people/binson/MTB/binning/binny_52/52_output/logs/analysis_checkm_markers.log?
But I assume it's because you forgot to include --conda-prefix /absolute/path/to/binny/conda/environments/dir in your snakemake call.
At the beginning of the log you can see that it installs all environments again at .snakemake/conda/<env_hash> (default Snakemake behavior), which doesn't work for Mantis, as it needs setup first. I would also delete .snakemake/conda, because it contains lots of files and takes up quite a bit of space.
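For illustration, a per-sample call would then look roughly like this (paths, core count and config name are placeholders, not the exact command from this setup):
snakemake -j 1 -s /path/to/binny/Snakefile --configfile /path/to/sample_config.yaml --use-conda --conda-prefix /path/to/binny/conda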
/home/projects/env_10000/people/binson/MTB/binning/binny_52/52_output/logs/analysis_checkm_markers.log?
I already deleted analysis_checkm_markers.log before your comment.
At the beginning of the log you can see that it installs all environments again at .snakemake/conda/<env_hash> (default Snakemake behavior), which doesn't work for Mantis, as it needs setup first. I would also delete .snakemake/conda, because it contains lots of files and takes up quite a bit of space.
I added the parameter --conda-prefix /home/projects/env_10000/people/binson/binny/conda according to your suggestion, and deleted .snakemake/conda before job submission. At least, everything is fine now.
Thank you very much again.
Ok. You also gave each binny run just one core (snakemake -j 1 ...). Is that intentional? Unless the samples are very small, I'd suggest more, to speed things up but also to make sure each rule has enough memory. Depending on how you set normal_mem_per_core_gb: in your config, Prokka, Mantis or binny might run out of memory.
Ok. You also gave each binny run just one core (snakemake -j 1 ...). Is that intentional? Unless the samples are very small, I'd suggest more, to speed things up but also to make sure each rule has enough memory. Depending on how you set normal_mem_per_core_gb: in your config, Prokka, Mantis or binny might run out of memory.
I didn't add a thread number in the snakemake command line, as shown below:
snakemake -j 1 -s /home/projects/env_10000/people/binson/binny/Snakefile --configfile /home/projects/env_10000/people/binson/MTB/binning/binny_52/config52.yaml --use-conda --conda-prefix /home/projects/env_10000/people/binson/binny/conda
binny_52.sh.txt
I set big_mem_per_core_gb: 90 and normal_mem_per_core_gb: 8 in the config52.yaml. config52.yaml.txt
I submitted each shell script using the command line as shown below:
qsub -W group_list=env_10000 -A env_10000 -l nodes=1:ppn=10:fatnode,mem=1000gb,walltime=48:00:00 /home/projects/env_10000/people/binson/scripts/MTBscripts/binny_52.sh
My jobs are now running on the HPC cluster, as in the screenshot.
I am not sure how long it will take; if it exceeds 48 hours, I will email the HPC admin to extend the walltime.
Unless the samples are very small
I have five datasets.
Two datasets with IDBA-UD assembly.fa around 30M-72M and bam files around 10GB.
The other datasets with IDBA-UD assembly.fa around 250M-500M and bam files around 30GB.
Two datasets with metaSPAdes assembly.fa around 2-4G.
The other datasets with metaSPAdes assembly.fa around 10G.
Ok, not sure if the memory will be enough (with 8gb per core and 1 core), especially for the large ones. You can let it run, and if something crashes or runs out of time, I'd use e.g. 20 cores (qsub ppn=20, and snakemake -c <all or 20>; -j in local mode = -c), and then normal_mem_per_core_gb: = 1000/20 = 50 (if you give 1000gb). binny will pick up again after the last completed job. But I'm not familiar with your system, so if in doubt I'd ask the admins again.
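As a sketch of that arithmetic (the qsub resource syntax is copied from your earlier command, the config key from your config52.yaml; everything else in both calls is omitted):
# request 1 node with 20 cores and 1000gb
qsub -l nodes=1:ppn=20,mem=1000gb ...
# in the sample config: 1000 GB / 20 cores = 50
normal_mem_per_core_gb: 50
# in the job script, let snakemake use the 20 cores
snakemake -c 20 ...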
It's fast. I first submitted 15 jobs and three have finished already. No bin output, but I didn't see any error.
I'll take sample 43 as an example.
binny_43.sh binny_43.sh.txt
binny_43.err binny_43.err.txt
binny_43.log binny_43.log
All log files logs.zip
Hm, what does /home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output look like? Can you post /home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output/logs/binning_binny.log?
/home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output
It ran fine but did not find any bins. What kind of data are you working with, if I may ask? The N95 is 1179 and there are only 12998 contigs. Could you also link intermediary/annotation_CDS_RNA_hmms_checkm.gff and intermediary/assembly.contig_depth.txt?
It ran fine but did not find any bins. What kind of data are you working with, if I may ask?
The 15 sample datasets I am running now are isolated bacterial species, but not pure. assembly.contig_depth.txt annotation_CDS_RNA_hmms_checkm.gff.txt
I want to keep MAGs with completeness >=50 and contamination <5.
Could you delete the output and rerun e.g. 43, passing --until 'call_contig_depth' to snakemake, and post intermediary/assembly_contig_depth_<number>.txt?
Something is wrong with the depth table; I'm not sure where that comes from.
I submitted the job at 13:30, but it has been idle until now. When it finishes, I will post here.
I tried to use a thinnode - 1 node, 5 ppn, 180GB - to run the snakemake command line:
snakemake -c 36 -s /home/projects/env_10000/people/binson/binny/Snakefile --configfile /home/projects/env_10000/people/binson/MTB/binning/binny_44/config44.yaml --use-conda --conda-prefix /home/projects/env_10000/people/binson/binny/conda --until 'call_contig_depth'
/home/projects/env_10000/people/binson/MTB/binning/binny_44/44_output/intermediary/assembly_contig_depth_44.txt assembly_contig_depth_44.txt
Thanks.
How did you perform the mapping? The contig names in the depth file seem to differ from the ones in the assembly, e.g. instead of contig-100_0 in the assembly, it's contig-100_0 length_373393 read_count_4619363 there. Or are the actual contig ids in the assembly for 44 contig-<contig_id> length_<length> read_count_<read_count>?
Did you try any other binners on those samples, and if so, did it work out fine with the same input?
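One quick, generic way to compare the two sets of names (not part of binny; file names are placeholders):
grep '^>' assembly.fa | head
samtools view -H alignment.bam | grep '^@SQ' | head
The first lists the contig headers in the assembly, the second the reference names stored in the bam header.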
How did you perform the mapping?
Actually, the mapping results are copied directly from the metawrap_binning step. Mapping command lines (the two command lines in the MetaWRAP binning step):
[main] CMD: bwa index /home/projects/env_10000/people/binson/MTB/binning/metawrap_binning_44/work_files/assembly.fa
bwa mem -v 1 -t 40 /home/projects/env_10000/people/binson/MTB/binning/metawrap_binning_44/work_files/assembly.fa /home/projects/env_10000/people/binson/MTB/F22FTSEUET0054_METfdwM/Clean/44/44_1.fastq /home/projects/env_10000/people/binson/MTB/F22FTSEUET0054_METfdwM/Clean/44/44_2.fastq
Or are the actual contig ids in the assembly for 44 contig-<contig_id> length_<length> read_count_<read_count>?
The actual contig ids in the assembly for 44 are like:
contig-100_0 length_373393 read_count_4352300
Did you try any other binners on those samples and if so, did it work out fine with the same input?
I tried the samples with metaWRAP (metabat2 and maxbin2), SemiBin (exact same contig .fa file and bam file from the metawrap binning step), MetaBinner and VAMB.
The error file from running SemiBin: semibin_37_52.err-44.txt
The log file from running SemiBin: semibin_37_52.log-44.txt
OK, thanks. The problem is that the average read depth is reported as zero for all contigs. Could you maybe share a (subsampled, anonymized) version of the bam and assembly fasta files so I can check if I can reproduce it?
Also, one thing I noticed was that it seems like the assemblies are filtered to contain only contigs >1000bp? I'd recommend giving binny the unfiltered output to recover more of the assembly and potentially increase performance by quite a bit.
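For example, a small random subset of the bam could be made with something like this (file names are placeholders; samtools view -s keeps roughly that fraction of the reads):
samtools view -b -s 0.01 alignment.bam > alignment_subsampled.bam
samtools index alignment_subsampled.bam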
Also, one thing I noticed was that it seems like the assemblies are filtered to contain only contigs >1000bp?
I always chose contigs > 1000bp for all binners. I will try using all contigs as the input for binny.
Dear binny support team,
I used binny, but an error happened as shown below. Could you help me check what the reason is?
Error screenshot
Error file binny_25.err-25.txt
Log file binny_25.log-25.txt
Job script binny_25.sh.txt
Best,
Bing