Open nvucic opened 1 year ago
This is the right repository. Please follow the posted instructions. Once compilation is done, you will find identity and meshclust v3.0.
@hani-girgis Thanks for this tool. I think the confusion comes from the version displayed by the program when it runs. It shows MeShClust v2.0.
MeShClust 2.0 is developed by Hani Z. Girgis, PhD.
This program clusters DNA sequences using identity scores obtained without alignment.
Copyright (C) 2021-2022 Hani Z. Girgis, PhD
Academic use: Affero General Public License version 1.
Any restrictions to use for profit or non-academics: Alternative commercial license is required.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Please contact Dr. Hani Z. Girgis (hzgirgis@buffalo.edu) if you need more information.
Please cite the following papers:
MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
and alignment-free identity scores (2022). Hani Z. Girgis, BMC Genomics, 23(1):423.
Identity: Rapid alignment-free prediction of sequence alignment identity scores using
self-supervised general linear models (2021). Hani Z. Girgis, Benjamin T. James, and
Brian B. Luczak. NAR Genom Bioinform, 13(1), lqab001.
A survey and evaluations of histogram-based statistics in alignment-free sequence
comparison (2019). Brian B. Luczak, Benjamin T. James, and Hani Z. Girgis. Briefings
in Bioinformatics, 20(4):1222–1237.
MeShClust: An intelligent tool for clustering DNA sequences (2018). Benjamin T. James,
Brian B. Luczak, and Hani Z. Girgis. Nucleic Acids Res, 46(14):e83.
Database file: mono.fasta
Output file: test.txt
Cores: 16
Provided threshold: 0.8
Block size for all vs. all: 25000
Block size for reading sequences: 100000
Number of data passes: 10
Can assign all: No
Average: 2273
K: 5
Histogram size: 1024
A histogram entry is 16 bits.
Generating data.
Preparing data ...
Positive examples: 10000
Training size: 5000
Validation size: 5000
Better performance of: 0.00155154
jeffrey_divergence x simMM
Better performance of: 0.0012286
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.00110226
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.0010716
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.00103351
jeffrey_divergence
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.000955872
minkowski
jeffrey_divergence
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.000905404
minkowski
jeffrey_divergence
chi_squared x sim_ratio
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
Better performance of: 0.000880007
minkowski
jeffrey_divergence
chi_squared x sim_ratio
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
squared_chord^2 x simMM^2
Better performance of: 0.000835517
minkowski
jeffrey_divergence
chi_squared x sim_ratio
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
squared_chord^2 x sim_ratio^2
squared_chord^2 x simMM^2
Better performance of: 0.000806042
minkowski
jeffrey_divergence
chi_squared x sim_ratio
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
sim_ratio x d2_s_r^2
chi_squared^2 x d2_s_r^2
squared_chord^2 x sim_ratio^2
squared_chord^2 x simMM^2
Selected statistics:
minkowski
jeffrey_divergence
chi_squared x sim_ratio
minkowski x sim_ratio^2
minkowski x simMM^2
correlation x d2_s_r^2
jeffrey_divergence x simMM
sim_ratio x d2_s_r^2
chi_squared^2 x d2_s_r^2
squared_chord^2 x sim_ratio^2
squared_chord^2 x simMM^2
Finished training.
MAE: 0.0177417
MSE: 0.000806042
Optimizing ...
Validating ...
MAE: 0.0231115
MSE: 0.00118335
Clustering ...
Data run 1 ...
Processed sequences: 13486
Unprocessed sequences: 0
Found centers: 149
Assigning ...
Finished.
Thanks for using MeShClust v2.0. Please post any questions or problems on GitHub:
https://github.com/BioinformaticsToolsmith/Identity or email Dr. Hani Z. Girgis.
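For anyone comparing builds, one quick way to see the mismatch is to scan a saved copy of the output for every MeShClust version string. The snippet below is only a minimal sketch: the function name `meshclust_versions` and the regular expression are illustrative assumptions, not part of the tool, and the sample text is three lines copied from the output above.

```python
import re

# Matches "MeShClust 2.0", "MeShClust v2.0", "MeShClust v3.0", and similar.
# (Hypothetical pattern written for this log; not something the tool provides.)
VERSION_RE = re.compile(r"MeShClust\s+v?(\d+(?:\.\d+)+)")


def meshclust_versions(log_text):
    """Return (line_number, version) for each MeShClust version string found."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for match in VERSION_RE.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Three lines taken from the program output: the banner, the citation, the footer.
sample = """MeShClust 2.0 is developed by Hani Z. Girgis, PhD.
MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
Thanks for using MeShClust v2.0. Please post any questions or problems on GitHub:"""

for number, version in meshclust_versions(sample):
    print(number, version)
```

On this sample, the banner and the closing line report 2.0, while only the citation mentions v3.0.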
Hi @hani-girgis, thank you for this amazing tool. I am trying to get MeShClust 3.0 from the latest release (V.2.0). However, after successfully compiling identity, MeShClust does not appear. In fact, I looked for the MeShClust source code and it is not even there.
I also tried to get MeShClust 3.0 from the master branch, but it builds identity v.1.2 and MeShClust v.2.0.
So, how can I get MeShClust version 3.0?
Thank you for the help.
Best,
Simon.
Sorry, there must be some resource I'm missing, but I could not find the latest v3.0.0.