gbouras13 commented 2 years ago

Hi Hani,

I've just come across your algorithm and paper - it looks great!

I am planning to use it to cluster full-length 16S microbiome reads sequenced on an ONT MinION. After clustering, I plan to build "consensus" representative reads to use for taxonomic classification, which will hopefully be an improvement on Kraken2 and on other clustering methods, e.g. the UMAP-based method from https://github.com/genomicsITER/NanoCLUST.

Obviously these reads differ from the microbiome reads described in your paper: for my test samples, the mean Q score of the reads is around 11-12 (so low-90s % accuracy), and they are around 1500 bp long.
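For anyone checking the "low 90s %" figure, here is a minimal sketch of the standard Phred conversion, accuracy = 1 - 10^(-Q/10). The function name and the `READ_LENGTH` constant are illustrative and not part of MeShClust or any other tool. It also treats the mean Q score as a single per-base value, which is a simplification.

```python
def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


READ_LENGTH = 1500  # approximate read length quoted above (assumption: all reads this long)

for q in (11, 12):
    accuracy = phred_to_accuracy(q)
    # Expected number of erroneous bases in one read of READ_LENGTH bases
    expected_errors = (1.0 - accuracy) * READ_LENGTH
    print(f"Q{q}: accuracy = {accuracy:.1%}, ~{expected_errors:.0f} expected errors per read")
```

Under these assumptions, Q11 and Q12 correspond to about 92.1% and 93.7% accuracy, or roughly 95 to 119 erroneous bases in a 1500 bp read.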

Do you have any advice on tweaking the parameters to optimise cluster quality given the low accuracy of the reads? As an aside, I've found that -t 0.8 gives me the highest cluster score on average for my test set, but I haven't done rigorous tests yet.
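As a rough sanity check on why a threshold near 0.8 might work for reads of this quality, here is a back-of-the-envelope sketch. It assumes that errors in two reads of the same template are independent and that a position matches only when both reads are correct there, so the expected identity is about the product of the two accuracies. This is a crude approximation that ignores indels and chance agreement between errors. It is not how MeShClust computes or estimates identity scores, and the function names are illustrative.

```python
def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred quality score: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def expected_pairwise_identity(acc_a: float, acc_b: float) -> float:
    """Crude estimate: a position matches only if both reads are correct there.

    Assumes independent errors; ignores indels and errors that agree by chance.
    """
    return acc_a * acc_b


for q in (11, 12):
    acc = phred_to_accuracy(q)
    identity = expected_pairwise_identity(acc, acc)
    print(f"Q{q}: two reads of the same template share ~{identity:.2f} identity")
```

Under these assumptions, two reads of the same 16S template would share roughly 0.85 to 0.88 identity. That is only loosely consistent with a threshold around 0.8, and it is no substitute for the threshold the tool estimates itself.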

Furthermore, would you be able to provide some pre-compiled binaries? Please excuse my ignorance if that is not possible: I am a bioinformatician/computational biologist, not a computer scientist, and I don't code in C++. I am unable to compile the program on my M1 Mac because g++ doesn't seem to be compatible (yet), although it runs fine on my Intel MacBook once I installed gcc using brew.

Kind regards,

George Bouras

hani-girgis commented 2 years ago

Hi, George.

Thanks for your interest in MeShClust v3.0.

The length of your sequences is fine. Regarding the threshold identity score, please run MeShClust v3.0 without the -t parameter. It will estimate the threshold identity score and print it. What value does it report? How many sequences do you have?

To compile the program on the M1 chip Mac, try deleting these lines from CMakeLists.txt:

if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
    if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.5)
        message(FATAL_ERROR "GNU g++ 7.5.0 or later is required. Your current version is: " ${CMAKE_CXX_COMPILER_VERSION})
    endif()
else()
    message(FATAL_ERROR "Your compiler is currently unsupported: " ${CMAKE_CXX_COMPILER_ID})
endif()

With these lines removed, you will use the g++ compiler provided by Apple. Please let me know if this works. Otherwise, I can try it on my end sometime next week.

Best regards,

Hani Z. Girgis, PhD

hani-girgis commented 2 years ago

Hi, George.

Were you able to cluster your Nanopore 16S reads using MeShClust v3.0?

Best regards,

Hani Z. Girgis, PhD

BioinformaticsToolsmith / Identity

Nanopore 16S Reads & Pre-Compiled Binaries #2

Credit: https://stackoverflow.com/questions/52180281/cmake-cxx-compiler-version-is-pointing-to-the-old-gcc-version