Hydro3639 / NanoPhase

Reference-quality genome reconstruction from complex metagenomes (or bacterial isolates) using only Nanopore long reads or both long and short reads (hybrid strategy)
MIT License

ERROR: Something wrong with maxbin binning, terminating... #12

Open comingkms opened 4 months ago

comingkms commented 4 months ago

Hi,

Installation went fine, but I got the following error when running your test dataset. Thanks,

nanophase meta -l '/home/Downloads/lr.fa.gz' -t 12 -o nanophase-out
[2024-06-20 14:48:58] INFO: nanophase (meta) starts
[2024-06-20 14:48:58] INFO: Command line: /home/comingkms/anaconda3/envs/nanophase/bin/nanophase meta -l /home/comingkms/Downloads/lr.fa.gz -t 12 -o nanophase-out
[2024-06-20 14:48:58] INFO: long_read_only model was selected, only Nanopore long reads will be used
[2024-06-20 14:48:58] CHECK: Nanopore long-read (fa.gz) file has been found
[2024-06-20 14:48:58] CHECK: Check software availability and locations
[2024-06-20 14:48:59] INFO: The following packages have been found

package      location
nanophase    /home/comingkms/anaconda3/envs/nanophase/bin/nanophase
flye         /home/comingkms/anaconda3/envs/nanophase/bin/flye
metabat2     /home/comingkms/anaconda3/envs/nanophase/bin/metabat2
maxbin2      /home/comingkms/anaconda3/envs/nanophase/bin/run_MaxBin.pl
SemiBin      /home/comingkms/anaconda3/envs/nanophase/bin/SemiBin
metawrap     /home/comingkms/anaconda3/envs/nanophase/bin/metawrap
checkm       /home/comingkms/anaconda3/envs/nanophase/bin/checkm
racon        /home/comingkms/anaconda3/envs/nanophase/bin/racon
medaka       /home/comingkms/anaconda3/envs/nanophase/bin/medaka
polypolish   /home/comingkms/anaconda3/envs/nanophase/bin/polypolish
POLCA        /home/comingkms/anaconda3/envs/nanophase/bin/polca.sh
bwa          /home/comingkms/anaconda3/envs/nanophase/bin/bwa
seqtk        /home/comingkms/anaconda3/envs/nanophase/bin/seqtk
minimap2     /home/comingkms/anaconda3/envs/nanophase/bin/minimap2
BBMap        /home/comingkms/anaconda3/envs/nanophase/bin/BBMap
parallel     /home/comingkms/anaconda3/envs/nanophase/bin/parallel
perl         /home/comingkms/anaconda3/envs/nanophase/bin/perl
samtools     /home/comingkms/anaconda3/envs/nanophase/bin/samtools
gtdbtk       /home/comingkms/anaconda3/envs/nanophase/bin/gtdbtk
fastANI      /home/comingkms/anaconda3/envs/nanophase/bin/fastANI

All required packages have been found in the environment. If the above certain packages integrated into nanophase were used in your investigation, please give them credit as well :)
[2024-06-20 14:48:59] TASK: Long-read assembly starts (be patient)
[2024-06-20 14:55:31] DONE: long-read assembly finished successfully: detailed log file is nanophase-out/01-LongAssemblies/flye.log
[2024-06-20 14:55:31] TASK: Initial binning::metabat2 binning starts
[2024-06-20 14:55:32] DONE: Initial binning::metabat2 binning finished successfully
MetaBAT 2 (v2.12.1) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
1 bins (2028309 bases in total) formed.
[2024-06-20 14:55:32] TASK: Initial binning::maxbin2 binning starts
Can't load '/home/comingkms/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Encode/Encode.so' for module Encode: /home/comingkms/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Encode/Encode.so: undefined symbol: Perl__is_utf8_char_helper at /home/comingkms/anaconda3/envs/nanophase/lib/perl5/core_perl/XSLoader.pm line 93.
 at /home/comingkms/perl5/lib/perl5/x86_64-linux-thread-multi/Encode.pm line 12.
BEGIN failed--compilation aborted at /home/comingkms/perl5/lib/perl5/x86_64-linux-thread-multi/Encode.pm line 13.
Compilation failed in require at /home/comingkms/anaconda3/envs/nanophase/lib/perl5/site_perl/LWP/UserAgent.pm line 1073.
Compilation failed in require at /home/comingkms/anaconda3/envs/nanophase/bin/run_MaxBin.pl line 4.
BEGIN failed--compilation aborted at /home/comingkms/anaconda3/envs/nanophase/bin/run_MaxBin.pl line 4.
mv: cannot stat 'nanophase-out/02-LongBins/INITIAL_BINNING/maxbin2/bin*fasta': No such file or directory
[2024-06-20 14:55:32] ERROR: Something wrong with maxbin binning, terminating...

Hydro3639 commented 4 months ago

Hi,

May I know which version of nanophase you installed? And for the long-read dataset (lr.fa.gz), are you using SRR17913199? If not, I would suggest using this one.

comingkms commented 4 months ago

nanophase v=0.2.3. I used the long-read dataset from https://github.com/example-data/np-example mentioned in your README. I will try SRR17913199.

Thanks

Hydro3639 commented 4 months ago

Besides the dataset, I also noticed in your log file a mismatch between the Perl version and the Encode module. Consider reinstalling the Encode module in the nanophase env so that it is compiled against your current Perl version (a command you may refer to: cpan -f -i Encode).
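For readers hitting the same error: in the log above, Encode.so is loaded from /home/comingkms/perl5 while XSLoader.pm comes from the nanophase conda environment, which suggests (as an assumption, not something confirmed in this thread) that a Perl library override such as PERL5LIB points outside the environment. A minimal sketch for checking this follows; `perl_overrides` is a hypothetical helper name, not part of nanophase.

```shell
# List Perl-related environment variables that can redirect module loading
# away from the active conda environment (for example, a local::lib setup in ~/perl5).
perl_overrides() {
    for var in PERL5LIB PERL_LOCAL_LIB_ROOT; do
        # printenv exits non-zero when the variable is not set
        value=$(printenv "$var") || value='<unset>'
        printf '%s=%s\n' "$var" "$value"
    done
}

# Usage, inside the activated nanophase environment:
#   perl_overrides
#   command -v perl                                     # which perl is first on PATH
#   perl -MEncode -e 'print $INC{"Encode.pm"}, "\n"'    # file Encode is loaded from
#   env -u PERL5LIB perl -MEncode -e 'print "ok\n"'     # retry with the override cleared
```

If the last command succeeds while the plain `perl -MEncode` call fails, clearing the override for the nanophase run may be an alternative to reinstalling Encode.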

comingkms commented 4 months ago

Yes, I've realized that, but I still had the issue with SemiBin, which has been mentioned before and is due to the simple dataset. I'll try your SRR17913199. Should I remove the host genome first to improve bacterial assemblies?

Thanks,

Hydro3639 commented 4 months ago

Glad you have resolved it! Yes, due to the limitations of the simple dataset, nanophase will exit during the semibin stage. That's why I provided the whole mock dataset: SRR17913199. Just give it a try.

I would suggest removing host genomes before the bacterial assemblies. It will make the assembly process easier and faster and lower the potential contamination.

Best

comingkms commented 4 months ago

Using SRR17913199, I got the following error: "ERROR: Something wrong with medaka polishing, please also check nanophase-out/03-Polishing/medaka/medaka.polish.log, terminating..." Please check the attached medaka.log, which indicates an out-of-memory issue. I am just wondering how much GPU memory is needed? Mine is an RTX 3090 (24G).

medaka.polish.log

Hydro3639 commented 4 months ago

I don't have much experience running Medaka polish with a GPU. If it is a GPU memory issue, you might consider lowering the number of threads to 2 (-t 2). This way, nanophase will run only one Medaka polishing job at a time, reducing the GPU memory requirement. Alternatively, you can use the CPU for polishing by setting export CUDA_VISIBLE_DEVICES="" before running nanophase.
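The two workarounds above can be sketched as follows. `run_on_cpu` is a hypothetical wrapper (not part of nanophase or Medaka) that hides all GPUs from a single command without changing the rest of the shell session. The input file name in the usage comments is a placeholder.

```shell
# Run one command with all CUDA devices hidden, so GPU-capable tools fall back to CPU.
# The variable is set only for that command; the calling shell keeps its own value.
run_on_cpu() {
    CUDA_VISIBLE_DEVICES="" "$@"
}

# Usage (placeholder input file):
#   Option 1: keep the GPU, but limit threads so only one Medaka job runs at a time
#     nanophase meta -l lr.fa.gz -t 2 -o nanophase-out
#   Option 2: keep more threads and polish on the CPU instead
#     run_on_cpu nanophase meta -l lr.fa.gz -t 12 -o nanophase-out
```

Compared with a plain `export`, the wrapper keeps the GPU visible to anything else started from the same shell.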

comingkms commented 4 months ago

Finally, I could complete the run using SRR17913199 with some modifications:

  1. Medaka polish with GPU. As you suggested, I had to lower the number of threads to 2.
  2. pplacer uses a large amount of memory (around 90G). Please consider adding --scratch_dir nanophase-out/03-Polishing/Final-bins/tmp_1

Thanks,

Hydro3639 commented 4 months ago

Hi, thank you for the suggestion regarding the memory usage of pplacer. I agree that pplacer requires substantial memory, and using the --scratch_dir flag can reduce the memory load during this step.

The main reason we didn't include this flag in the default settings is that, for many environmental samples with higher complexity than the mock dataset, the most memory-intensive step is actually the long-read assembly with Flye. Although Flye is quite memory-efficient, it remains the primary memory-consuming process. I may consider adding an optional flag for --scratch_dir in the future to give users more flexibility in managing memory usage but it is not a high-priority task at the moment.
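Until such an option exists, a quick check of available memory before launching a run may help decide whether the pplacer step is likely to fit. The sketch below assumes a Linux host that exposes /proc/meminfo. `mem_available_gib` is a hypothetical helper, and the 90 GiB threshold is simply the figure reported in this thread for SRR17913199, not a documented requirement.

```shell
# Print available memory in whole GiB, read from /proc/meminfo (values there are in kB).
# An alternative file can be passed as the first argument.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

required_gib=90   # figure reported in this thread, not an official requirement
if [ "$(mem_available_gib)" -lt "$required_gib" ]; then
    echo "Less than ${required_gib} GiB available; pplacer may run out of memory." >&2
fi
```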