billzt / MiFish

This is the command-line version of the MiFish pipeline. It can also be used with any other eDNA metabarcoding primers.
https://mitofish.aori.u-tokyo.ac.jp/mifish/
GNU General Public License v3.0

core dumped with usearch -otutab #3

Closed by sr-c 3 months ago

sr-c commented 1 year ago

Hi there. While testing the pipeline with 2 groups of real data, usearch crashed with a core dump.

mifish seq/AA1 ../../MitoFish_db/MitoFish -d seq/AB2 -s -o MiFish_re_Result
#########
Sample AA1_3 Step 0: Decompress
Sample AA1_3 Step 1: filter the quality of FASTQ and merge Pair-End Reads
Sample AA1_3 Step 2: filter read length and remove primers
Sample AA1_3 Step 3: De-noise and generate haploid
sh: line 1: 165256 Aborted                 (core dumped) usearch -otutab MiFish_re_Result/MiFishResult/Sample-AA1_3/02_process_fasta/AA1_3.processed.fa -zotus MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.fasta -threads 2 -otutabout MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt > MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.otutab.log 2>&1
Traceback (most recent call last):
  File "/jdfsbjcas1/workdir/Env/miniconda/envs/MiFish_re/bin/mifish", line 33, in <module>
    sys.exit(load_entry_point('mifish', 'console_scripts', 'mifish')())
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/cmd/mifish.py", line 71, in main
    pipeline.runMiFish(data_dir=args.seq_dir, data_dir_other_groups=data_dir_other_groups, \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/pipeline.py", line 223, in runMiFish
    sizeFasIntegrator.run(zotusCountFile=f'{workdir_sample}/03_haploid/{sample_name}.zotus.size.txt', \
  File "/jdfsbjcas1/workdir/Tools/test_install/MiFish/mifish/core/sizeFasIntegrator.py", line 5, in run
    with open(zotusCountFile) as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'MiFish_re_Result/MiFishResult/Sample-AA1_3/03_haploid/AA1_3.zotus.size.txt'

At the end of AA1_3.otutab.log, we found:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Did the memory limit of the 32-bit version of usearch cause the issue?
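For anyone hitting the same traceback: the FileNotFoundError looks like a follow-on symptom, because the usearch step had already aborted before the count file was written. Below is a minimal Python sketch of how an external step could be checked before the next one runs. The names `run_step` and `StepFailed` are hypothetical and are not part of MiFish; the only detail taken from this thread is the `std::bad_alloc` message in the log.

```python
import subprocess
import sys
from pathlib import Path


class StepFailed(RuntimeError):
    """Raised when an external pipeline step does not finish cleanly (hypothetical helper)."""


def run_step(cmd, log_path, expected_output):
    """Run cmd, send stdout and stderr to log_path, then verify the outcome.

    cmd is a list of arguments, log_path is where the tool's messages go,
    and expected_output is the file the step is supposed to create.
    """
    log_path = Path(log_path)
    with log_path.open("w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)

    log_text = log_path.read_text(errors="replace")

    # An allocation failure gets its own message so the cause is obvious.
    if "std::bad_alloc" in log_text:
        raise StepFailed(f"{cmd[0]} ran out of addressable memory; see {log_path}")

    # Any other non-zero (or signal) exit status is also treated as a failure.
    if result.returncode != 0:
        raise StepFailed(f"{cmd[0]} exited with status {result.returncode}; see {log_path}")

    # A clean exit without the expected file is still a failed step.
    if not Path(expected_output).exists():
        raise StepFailed(f"{cmd[0]} finished but did not write {expected_output}")
```

With a check like this, the run would stop at Step 3 with a message pointing at the log file, rather than failing later when the missing `.zotus.size.txt` file is opened.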

billzt commented 1 year ago

@sr-c Yes.

The 64-bit version of usearch requires purchasing a license.

Perhaps your data is too large?

vsearch is a common alternative to usearch, but we need to wait for some bugs to be fixed: https://github.com/torognes/vsearch/issues/503

cement-head commented 9 months ago

WOW! USEARCH is $900 for an academic license - that's outrageous!

billzt commented 9 months ago

It is now almost possible to replace usearch with vsearch. FLASH (a PE-read merger) can also be replaced with vsearch. I hope to apply this in the next version.
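As a rough illustration of what such a replacement could look like for the failing step, here is a sketch that builds a vsearch command for mapping processed reads to ZOTUs and writing a count table. This is not necessarily how MiFish does it: the function name `build_vsearch_otutab_cmd` is hypothetical, the 0.97 identity threshold is an assumed value that may need adjusting, and only the file names are taken from the log earlier in this thread.

```python
def build_vsearch_otutab_cmd(reads_fasta, zotus_fasta, table_out, threads=2, identity=0.97):
    """Return an argument list for a vsearch read-to-ZOTU mapping run.

    The identity default is an assumption for illustration, not a MiFish setting.
    """
    return [
        "vsearch",
        "--usearch_global", str(reads_fasta),  # query: the processed reads
        "--db", str(zotus_fasta),              # targets: the denoised ZOTU sequences
        "--id", str(identity),                 # minimum identity for a read to count as a hit
        "--otutabout", str(table_out),         # tab-separated count table
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    cmd = build_vsearch_otutab_cmd(
        "AA1_3.processed.fa", "AA1_3.zotus.fasta", "AA1_3.zotus.size.txt"
    )
    # Print the command so it can be inspected before being run.
    print(" ".join(cmd))
```

Building the command as a list keeps paths with spaces intact when it is later passed to `subprocess.run`.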

billzt commented 8 months ago

I've added a new branch, vsearch, which produces identical results for small datasets and is expected to give a significant speedup for large datasets.

I plan to release it after enough testing.

DasielOb commented 7 months ago

Dear MiFish Development Team, I am reaching out for an update on the MiFish pipeline, particularly regarding the integration of VSEARCH. I came across a discussion from October and November where it was mentioned that a new branch of MiFish using VSEARCH is in development. This update is of great interest to me, as I have encountered compatibility issues with USEARCH in a 64-bit environment. As noted in the discussion, USEARCH's licensing costs are also a significant consideration.

1) Could you please provide an update on the progress of the VSEARCH integration with the MiFish pipeline? Has this version been released or is it still under testing?
2) If it's available, could you provide guidance on accessing and implementing this VSEARCH-based version of MiFish?
3) If the VSEARCH version is not yet released, is there an estimated timeline for its availability? Additionally, would you recommend any temporary solutions or workarounds for using the current version of MiFish on a 64-bit system?

billzt commented 7 months ago

Dear @DasielOb

The VSEARCH version is at https://github.com/billzt/MiFish/tree/vsearch. We have tested several datasets and found that VSEARCH gives a significant speed-up!

Please note that VSEARCH v2.23.0+ is required.
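A quick way to confirm this before a long run is to compare version numbers as tuples of integers, since a plain string comparison would rank "2.9" above "2.23". The sketch below is only an illustration: the function names are hypothetical, and it assumes the text printed by `vsearch --version` contains something like `vsearch v2.23.0`.

```python
import re
import subprocess

MIN_VSEARCH = (2, 23, 0)  # minimum version stated in this thread


def parse_vsearch_version(banner):
    """Extract (major, minor, patch) from text such as 'vsearch v2.23.0_linux_x86_64'.

    The banner format is an assumption; adjust the pattern if your build prints differently.
    """
    match = re.search(r"vsearch v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("could not find a vsearch version in the given text")
    return tuple(int(part) for part in match.groups())


def vsearch_is_recent_enough(banner, minimum=MIN_VSEARCH):
    """Compare versions numerically, component by component."""
    return parse_vsearch_version(banner) >= minimum


def installed_vsearch_banner():
    """Run `vsearch --version` and return everything it printed on stdout and stderr."""
    result = subprocess.run(["vsearch", "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Usage would be along the lines of `vsearch_is_recent_enough(installed_vsearch_banner())`, run inside the conda environment that will execute the pipeline.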

Please try this version (a new conda environment is recommended, of course). We will merge it into the main branch, the docker version, and the web version after receiving enough feedback.

DasielOb commented 7 months ago

Dear @billzt,

I wanted to extend my deepest gratitude for providing access to the VSEARCH version. I have successfully used it to process my dataset of 38 samples, and I am pleased to report that it worked flawlessly, with the entire process taking only an hour and a half. I plan to thoroughly examine the output, and should any specific questions or discussions arise regarding the results or parameters, I will be in touch.

Thank you once again for your support and for the continuous improvements you are bringing to the MiFish pipeline.

Warm regards,