a-h-b / binny

GNU General Public License v3.0

ModuleNotFoundError: No module named 'conda._vendor.auxlib' #28

Closed B-1991-ing closed 2 years ago

B-1991-ing commented 2 years ago

Dear binny support team,

I used binny, but an error occurred as shown below. Could you help me find the cause?

Error screenshot

Screenshot 2022-06-20 at 18 40 10

Error file binny_25.err-25.txt

Log file binny_25.log-25.txt

Job script binny_25.sh.txt

Best,

Bing

B-1991-ing commented 2 years ago

Can we write the information for multiple samples into one config.yaml file, or can we only write one sample's information per config file?

Screenshot 2022-06-28 at 11 55 14
ohickl commented 2 years ago

I tried to install Miniconda three times and always got a no-space error, although I already specified my PROJECT directory.

Would that not mean that your project folder is completely full? Maybe an admin or colleague can help you figure out what's wrong?

Is there any way we can just run the binny script and directly load the necessary modules - snakemake, prokka, mantis, etc. - instead of creating different conda environments?

Only by basically rewriting everything.

I would then instead try loading the latest conda module again, install binny to your project dir, and have binny load that same module by specifying it in the VARIABLE_CONFIG. But, if the project dir is full, it would of course still not work.

Can we write info of multiple samples into one config.yaml file?

Only one atm.

B-1991-ing commented 2 years ago

Would that not mean that your project folder is completely full? Maybe an admin or colleague can help you figure out what's wrong?

I already reported the problem to the HPC admin and asked how many TB of storage space each HPC user can have, but he only suggested always keeping everything in the PROJECT dir. It seems like every time, Miniconda tried to install something into my PERSONAL directory, leading to the no-space error.

B-1991-ing commented 2 years ago

I would then instead try loading the latest conda module again, install binny to your project dir, and have binny load that same module by specifying it in the VARIABLE_CONFIG. But, if the project dir is full, it would of course still not work.

1st step:

git clone https://github.com/a-h-b/binny.git
cd binny

2nd step: set the latest miniconda module in the VARIABLE_CONFIG file as shown below

Screenshot 2022-06-28 at 12 31 20

load latest miniconda in binny dir

Screenshot 2022-06-28 at 12 32 16

3rd step: ./binny -i config/config.init.yaml

Then, some error screenshots.

Screenshot 2022-06-28 at 12 38 02 Screenshot 2022-06-28 at 12 38 53

But I didn't load the modules mamba>=0.22.1, conda-build>=3.21.8 and conda>=4.12.0.

ohickl commented 2 years ago

And you are sure you set the install destination during the install to your project dir (it will ask you at some point where to put it)? It might still try to modify e.g. your .bashrc etc. to allow functionality. But those changes should take virtually no space and would only fail if your personal dir is 100% full, which should never be the case. If so, you should definitely delete some unused stuff there, as it might lead to lots of problems in general.

Then, some error screenshots.

This looks like the same issue again. You can also try to put

pkgs_dirs:
  - /home/<user>/.conda/pkgs

in your .condarc (or create that file, if you don't have it), which seemed to fix it for that user. You can also try running conda clean -a and see if that helps with your space problems.
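If helpful, the .condarc addition above can also be scripted; a minimal sketch, where .condarc_example stands in for ~/.condarc and someuser is a placeholder user name:

```shell
# Append the suggested pkgs_dirs entry to a .condarc
# (.condarc_example stands in for ~/.condarc; 'someuser' is a placeholder)
condarc=".condarc_example"
printf 'channel_priority: strict\n' > "${condarc}"
printf 'pkgs_dirs:\n  - /home/someuser/.conda/pkgs\n' >> "${condarc}"
cat "${condarc}"
```

conda reads ~/.condarc on every invocation, so once the entry is in place no further action is needed.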

B-1991-ing commented 2 years ago

And you are sure you set the install destination during the install to your project dir (it will ask you at some point where to put it)?

During the four binny installation steps, I wasn't asked at any point for an install destination.

ohickl commented 2 years ago

I meant for Miniconda.

B-1991-ing commented 2 years ago

I meant for Miniconda.

You mean that because my PERSONAL dir is full, even if I load the latest Miniconda to install binny, it still can't write to the full folder? I cleaned 1 GB of conda packages using "conda clean -a"; I really don't know where they came from... Thank you very much.

B-1991-ing commented 2 years ago

It took 30 minutes to finish the command "./binny -i config/config.init.yaml", but one error remained, as shown below.

Screenshot 2022-06-28 at 13 38 01

When I ran the command "conda clean -a", I may have also removed "/home/people/binson/.conda/pkgs/infernal-1.1.4-pl5321hec16e2b_1/info/index.json".

Screenshot 2022-06-28 at 13 40 13
B-1991-ing commented 2 years ago

But some .yaml files were already generated here.

Screenshot 2022-06-28 at 13 41 43

Does the error that "/home/people/binson/.conda/pkgs/infernal-1.1.4-pl5321hec16e2b_1/info/index.json" does not exist matter?

B-1991-ing commented 2 years ago

During the binny installation, I received two automatic emails from the HPC reminding me that I had already exceeded the 10GB limit.

Screenshot 2022-06-28 at 13 47 20
B-1991-ing commented 2 years ago

It already finished creating the conda envs for ../workflow/envs/mapping.yaml, ../workflow/envs/binny_linux.yaml and ../workflow/envs/mantis.yaml, but an error occurred when building ../workflow/envs/prokka.yaml due to lack of space.

Screenshot 2022-06-28 at 14 00 17
ohickl commented 2 years ago

Ok, it's probably because we set

pkgs_dirs:
  - /home/<user>/.conda/pkgs

in your .condarc and you have very little space there. If you clean again and change the path to somewhere appropriate in your project folder, you should have the space and could try again. (You could also add envs_dirs: with a path like /path/to/project/dir/<user>/.conda/envs, in the same format as pkgs_dirs:, to .condarc, to ensure all environments that you create without specifying a path will not be put in your home.) Also, just to avoid unintended interactions, I'd only load miniconda in binny's VARIABLE_CONFIG, unless you really need the other modules.

B-1991-ing commented 2 years ago

In my "/home/people/binson/.condarc" file, there is only one line - "channel_priority: strict".

Screenshot 2022-06-28 at 14 35 05

Do you mean I can add the /home/projects/env_10000/people/binson/.conda/envs in the .condarc file as shown below?

Screenshot 2022-06-28 at 14 48 22
ohickl commented 2 years ago

Yes, e.g. like this:

channel_priority: strict
pkgs_dirs:
  - /home/projects/env_10000/people/binson/.conda/pkgs
envs_dirs:
  - /home/projects/env_10000/people/binson/.conda/envs
B-1991-ing commented 2 years ago

All conda env .yaml files are created now: ../workflow/envs/mapping.yaml, ../workflow/envs/fasta_processing.yaml, ../workflow/envs/prokka.yaml, ../workflow/envs/binny_linux.yaml and ../workflow/envs/mantis.yaml.

I ran the test - ./binny -l -n "TESTRUN" -r config/config.test.yaml, and got an error from the analysis_checkm_markers.log. analysis_checkm_markers.log

B-1991-ing commented 2 years ago

I really didn't install these three databases:

/home/projects/env_10000/people/binson/binny/conda/8875a4cc85bac50feeefc3e8545f0147/lib/python3.9/site-packages/Resources/NCBI/
/data/isbio/TOOLS/binny/database/hmms/checkm_tf/
/data/isbio/TOOLS/binny/database/hmms/checkm_pf/

Screenshot 2022-06-28 at 16 27 38

I do have these databases:

/home/projects/env_10000/people/binson/binny/database/hmms/checkm_pf
/home/projects/env_10000/people/binson/binny/database/hmms/checkm_tf

ohickl commented 2 years ago

All conda .yaml files are created now

These are the supplied environment files. The actual environments are in path/to/binny/conda (if the default install location was chosen) and are named with a combination of letters and numbers (MD5 hash).

I think something went wrong during the initial, failed install attempt and Mantis was not installed properly. You can delete the mantis conda env (find it like this:

head -n 1 path/to/binny/conda/*.yaml

then remove the one with name: mantis together with its yaml file: rm -rf path/to/binny/conda/<mantis_environment_hash>*) and run ./binny -i config/config.init.yaml again.
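To find which hash-named environment is the Mantis one, the head step can also be done with grep; a sketch where the directory name and hashes are made up for illustration:

```shell
# Sketch: locate the Mantis env among binny's hash-named conda envs.
# 'conda_example' stands in for path/to/binny/conda; the hashes are made up.
conda_dir="conda_example"
mkdir -p "${conda_dir}"
printf 'name: mantis\n' > "${conda_dir}/8875a4cc.yaml"
printf 'name: prokka\n' > "${conda_dir}/1a2b3c4d.yaml"
# Each env yaml starts with its name; -l prints only the matching file
grep -l '^name: mantis' "${conda_dir}"/*.yaml
```

The printed yaml's basename (minus the extension) is the environment hash to pass to rm -rf.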

I'm unsure what's up with /data/isbio/TOOLS/binny/database/..., since you seem to have the databases at the correct default location.

B-1991-ing commented 2 years ago

I did as you suggested, as shown below.

Screenshot 2022-06-28 at 18 20 42

Although ../workflow/envs/mantis.yaml was created again, the total download was 0 B.

Screenshot 2022-06-28 at 18 29 18
ohickl commented 2 years ago

You also need to delete the environment folder: rm -rf 8875a4cc<...>* will remove both the folder and the yaml file.

B-1991-ing commented 2 years ago

Yeah, I removed both.

B-1991-ing commented 2 years ago
Screenshot 2022-06-28 at 18 36 10

Is it due to bash: hmmpress: command not found...?

ohickl commented 2 years ago

Hm, that should not happen. It might be because of the module loading and the fact that you don't have conda. Could you add the line . "$(conda info --base)/etc/profile.d/conda.sh" after line 90 (eval $LOADING_MODULES) in binny and try again?
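The suggested one-line edit could also be applied with sed; a sketch where binny_example stands in for the real binny launcher and GNU sed is assumed:

```shell
# Sketch: add the conda shell hook after the module-loading line.
# 'binny_example' stands in for the real binny script (GNU sed assumed).
printf '%s\n' '#! /bin/bash' 'eval $LOADING_MODULES' 'snakemake "$@"' > binny_example
sed -i '/eval \$LOADING_MODULES/a . "$(conda info --base)/etc/profile.d/conda.sh"' binny_example
grep -n 'conda.sh' binny_example
```

Sourcing that profile script is what makes conda's shell functions (and the env's bin dir, with tools like hmmpress) available inside the launcher.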

B-1991-ing commented 2 years ago

Finally, it works! Thank you very much for your help and patience!

It takes around 20 minutes to run the test, on an interactive node - 1 node, 1ppn, with 180GB.

Screenshot 2022-06-28 at 19 30 46

One more question: if I have 12 samples, do I need to set up 12 separate config.yaml files and run binny 12 times separately?

ohickl commented 2 years ago

Finally!

It takes around 20 minutes to run the test, on an interactive node - 1 node, 1ppn, with 180GB.

That is very slow, even for one core. You can speed it up with -t and more cores, but maybe the storage is a bit slow too. Either way, glad it works now!

if I have 12 samples, do I need to set up 12 separate config.yaml files and run binny 12 times separately

Yes, atm it runs per sample. You can create a small wrapper that builds a config file for each sample.

B-1991-ing commented 2 years ago

Yes, atm it runs per sample. You can create a small wrapper that builds a config file for each sample.

What is a wrapper?

ohickl commented 2 years ago

It's basically a program that calls another program to make it easier to use in a certain scenario. E.g., to create some binny config files and run binny with them, my_binny_wrapper.sh:

#! /bin/bash -l

threads=24
path_to_binny="absolute/path/to/binny/binny"
path_to_my_prepared_config_file_template="absolute/path/to/config"
path_to_sample_data="absolute/path/to/data"
path_to_output="absolute/path/to/output"

mkdir -p ${path_to_output}/configs

# Write a config file per sample to the desired sample output dir and run binny with it
for sample_dir in ${path_to_sample_data}/samples/*; do
  sample=$(basename ${sample_dir})
  path_to_sample_assembly=${sample_dir}/assembly.fasta
  path_to_sample_align=${sample_dir}/alignment.bam
  sed -e "s|assembly: \"\"|assembly: \"${path_to_sample_assembly}\"|g" \
      -e "s|metagenomics_alignment: \"\"|metagenomics_alignment: \"${path_to_sample_align}\"|g" \
      -e "s|outputdir: \"output\"|outputdir: \"${path_to_output}/${sample}\"|g" \
      ${path_to_my_prepared_config_file_template} > ${path_to_output}/configs/${sample}_binny.config.yaml
  ${path_to_binny} -t ${threads} -c -n "${sample}" -r ${path_to_output}/configs/${sample}_binny.config.yaml
done
B-1991-ing commented 2 years ago

Thank you very much for the wrapper script; I will give it a try.

I just tried to submit my 15 jobs in their output folders separately, and all jobs stopped at certain points.

I'll take sample 52 as an example.

binny_52.err.txt binny_52.sh.txt

ohickl commented 2 years ago

Could you also post /home/projects/env_10000/people/binson/MTB/binning/binny_52/52_output/logs/analysis_checkm_markers.log? But I assume it's because you forgot to include --conda-prefix /absolute/path/to/binny/conda/environments/dir in your snakemake call. At the beginning of the log you can see that it installs all environments again at .snakemake/conda/<env_hash> (default Snakemake behavior), which doesn't work for Mantis, as it needs setup first. I would also delete .snakemake/conda because it contains lots of files and takes up quite a bit of space.

B-1991-ing commented 2 years ago

/home/projects/env_10000/people/binson/MTB/binning/binny_52/52_output/logs/analysis_checkm_markers.log?

I already deleted analysis_checkm_markers.log before your comment.

At the beginning of the log you can see that it installs all environments again at .snakemake/conda/<env_hash> (default Snakemake behavior), which doesn't work for Mantis, as it needs setup first. I would also delete .snakemake/conda because it contains lots of files and takes up quite a bit of space.

I added the parameter --conda-prefix /home/projects/env_10000/people/binson/binny/conda according to your suggestion and deleted .snakemake/conda before job submission. Everything is fine now.

Thank you very much again.

ohickl commented 2 years ago

Ok. You also gave each binny run just one core (snakemake -j 1 ...). Is that intentional? Unless the samples are very small, I'd suggest more, to speed things up but also to make sure each rule has enough memory. Depending on how you set normal_mem_per_core_gb: in your config, Prokka, Mantis or binny might run out of memory.

B-1991-ing commented 2 years ago

Ok. You also gave each binny run just one core (snakemake -j 1 ...). Is that intentional? Unless the samples are very small, I'd suggest more, to speed things up but also to make sure each rule has enough memory. Depending on how you set normal_mem_per_core_gb: in your config, Prokka, Mantis or binny might run out of memory.

I didn't add a thread number to the snakemake command line, as shown below:

snakemake -j 1 -s /home/projects/env_10000/people/binson/binny/Snakefile --configfile /home/projects/env_10000/people/binson/MTB/binning/binny_52/config52.yaml --use-conda --conda-prefix /home/projects/env_10000/people/binson/binny/conda

binny_52.sh.txt

I set big_mem_per_core_gb: 90 and normal_mem_per_core_gb: 8 in the config52.yaml. config52.yaml.txt

I submitted each shell script using this command line:

qsub -W group_list=env_10000 -A env_10000 -l nodes=1:ppn=10:fatnode,mem=1000gb,walltime=48:00:00 /home/projects/env_10000/people/binson/scripts/MTBscripts/binny_52.sh

My jobs are running on the HPC cluster as in the screenshot now,

Screenshot 2022-06-29 at 11 47 05

I am not sure how long it will take; if it exceeds 48 hours, I will email the HPC admin to extend it later.

B-1991-ing commented 2 years ago

Unless the samples are very small

I have five datasets.

Two datasets with IDBA-UD assembly.fa around 30M - 72M and bam files around 10GB.

The other datasets with IDBA-UD assembly.fa around 250M - 500M and bam files around 30GB.

Two datasets with metaSPAdes assembly.fa around 2-4G.

The other datasets with metaSPAdes assembly.fa around 10G.

ohickl commented 2 years ago

Ok, not sure if the memory will be enough (with 8 GB per core and 1 core), especially for the large ones. You can let it run, and if something crashes or runs out of time, I'd use e.g. 20 cores (qsub ppn=20, and snakemake -c <all or 20>; -j in local mode = -c) and then normal_mem_per_core_gb: = 1000/20 = 50 (if you request 1000gb). binny will continue after the last completed job. But I'm not familiar with your system, so if in doubt I'd ask the admins again.
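The per-core memory arithmetic above can be sketched as (numbers from this thread: a 1000 GB node and 20 cores):

```shell
# normal_mem_per_core_gb should be the node's total memory divided by the cores used
total_mem_gb=1000
cores=20
mem_per_core_gb=$(( total_mem_gb / cores ))
echo "normal_mem_per_core_gb: ${mem_per_core_gb}"   # → normal_mem_per_core_gb: 50
```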

B-1991-ing commented 2 years ago

It's fast. I initially submitted 15 jobs and three have finished already. There is no bin output, but I didn't see any error.

I'll take sample 43 as an example.

binny_43.sh binny_43.sh.txt

binny_43.err binny_43.err.txt

binny_43.log binny_43.log

All log files logs.zip

ohickl commented 2 years ago

Hm, what does /home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output look like? Can you post /home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output/logs/binning_binny.log?

B-1991-ing commented 2 years ago

/home/projects/env_10000/people/binson/MTB/binning/binny_43/43_output

Screenshot 2022-06-29 at 12 48 44

binning_binny.log

ohickl commented 2 years ago

It ran fine but did not find any bins. What kind of data are you working with, if I may ask? The N95 is 1179 and there are only 12998 contigs. Could you also link intermediary/annotation_CDS_RNA_hmms_checkm.gff and intermediary/assembly.contig_depth.txt?

B-1991-ing commented 2 years ago

It ran fine but did not find any bins. What kind of data are you working with, if I may ask?

The 15-sample dataset I am running now consists of isolated bacterial species, but not pure. assembly.contig_depth.txt annotation_CDS_RNA_hmms_checkm.gff.txt

B-1991-ing commented 2 years ago

I want to keep MAGs with completeness >= 50 and contamination < 5.

ohickl commented 2 years ago

Could you delete the output, rerun e.g. 43 passing --until 'call_contig_depth' to snakemake, and post intermediary/assembly_contig_depth_<number>.txt? Something is wrong with the depth table; I'm not sure where that comes from.

B-1991-ing commented 2 years ago

I submitted the job at 13:30, but it has been idle until now. When it finishes, I will post the file here.

Screenshot 2022-06-29 at 18 07 00
B-1991-ing commented 2 years ago

I tried to use a thinnode - 1 node, 5 ppn, 180GB - to run the snakemake command line:

snakemake -c 36 -s /home/projects/env_10000/people/binson/binny/Snakefile --configfile /home/projects/env_10000/people/binson/MTB/binning/binny_44/config44.yaml --use-conda --conda-prefix /home/projects/env_10000/people/binson/binny/conda --until 'call_contig_depth'

/home/projects/env_10000/people/binson/MTB/binning/binny_44/44_output/intermediary/assembly_contig_depth_44.txt assembly_contig_depth_44.txt

ohickl commented 2 years ago

Thanks. How did you perform the mapping? The contig names in the depth file seem to differ from the ones in the assembly: e.g. instead of contig-100_0 as in the assembly, it's contig-100_0 length_373393 read_count_4619363 there. Or are the actual contig ids in the assembly for 44 contig-<contig_id> length_<length> read_count_<read_count>? Did you try any other binners on those samples, and if so, did it work out fine with the same input?
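One way to spot such a mismatch is to compare the first whitespace-separated token of each fasta header with the ids in the depth table; a toy sketch with made-up file contents mirroring this thread:

```shell
# Toy reproduction of the id comparison (file names and contents are made up)
printf '>contig-100_0 length_373393 read_count_4352300\nACGT\n' > assembly_example.fa
printf 'contig-100_0\t0\n' > depth_example.txt
# Keep only the first token of each fasta header
grep '^>' assembly_example.fa | sed 's/^>//' | cut -d ' ' -f 1 | sort > assembly_ids.txt
cut -f 1 depth_example.txt | sort > depth_ids.txt
# comm -3 prints ids unique to either file; no output means the ids agree
comm -3 assembly_ids.txt depth_ids.txt
```

If the full headers (with length_... read_count_...) rather than the first token are used as ids in one file but not the other, comm will list every contig as mismatched.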

B-1991-ing commented 2 years ago

How did you perform the mapping?

Actually, the mapping results were copied directly from the metawrap_binning step. The mapping command lines (the two commands from the MetaWRAP binning step):

  1. [main] CMD: bwa index /home/projects/env_10000/people/binson/MTB/binning/metawrap_binning_44/work_files/assembly.fa

  2. bwa mem -v 1 -t 40 /home/projects/env_10000/people/binson/MTB/binning/metawrap_binning_44/work_files/assembly.fa /home/projects/env_10000/people/binson/MTB/F22FTSEUET0054_METfdwM/Clean/44/44_1.fastq /home/projects/env_10000/people/binson/MTB/F22FTSEUET0054_METfdwM/Clean/44/44_2.fastq

Or are the actual contig ids in the assembly for 44 contig-<contig_id> length_<length> read_count_<read_count>?

The actual contig ids in the assembly for 44 look like:

contig-100_0 length_373393 read_count_4352300

Screenshot 2022-06-30 at 10 41 26

Did you try any other binners on those samples and if so, did it work out fine with the same input?

I tried the samples with metawrap (metabat2 and maxbin2), semibin (exact same contig .fa file and bam file from the metawrap binning step), metabinner and vamb.

B-1991-ing commented 2 years ago

The error file from running semibin: semibin_37_52.err-44.txt

The log file from running semibin: semibin_37_52.log-44.txt

ohickl commented 2 years ago

OK, thanks. The problem is that the average read depth is reported as zero for all contigs. Could you maybe share a (subsampled, anonymized) version of the bam and assembly fasta files so I can check if I can reproduce it?

ohickl commented 2 years ago

Also, one thing I noticed was that it seems like the assemblies are filtered to contain only contigs >1000bp? I'd recommend giving binny the unfiltered output to recover more of the assembly and potentially increase performance by quite a bit.

B-1991-ing commented 2 years ago

Also, one thing I noticed was that it seems like the assemblies are filtered to contain only contigs >1000bp?

I always choose contigs > 1000bp for all binners. I will try using all contigs as input for binny.