BioinformaticsToolsmith / Identity

Negative threshold #25

Open JaspervB-tud opened 1 month ago

JaspervB-tud commented 1 month ago

I ran meshclust on a very small set of E. coli genomes (4 genomes) using the following command:

meshclust -d PATH_TO_FILE/file.fa -o PATH_TO_OUTPUTDIR/mesh_clust_threshold_AUTO.txt -c 16 -a y

The program ran for some time training Identity and found a threshold of -0.007409239, after which it crashed because the threshold was negative. For now, my workaround is to estimate an appropriate threshold by hand and use that for further experiments, but this might be worth looking into.

Output generated by meshclust:

Cores: 16
Estimating the threshold ...
Average: 4544141
K: 11
Histogram size: 4194304
A histogram entry is 32 bits.
Generating data.
Number of standard deviations: 2
Preparing data ...
    Positive examples: 10000
    Training size: 5000
    Validation size: 5000
Better performance of: 5.69984e-05
    sim_ratio
Better performance of: 4.55067e-06
    sim_ratio
    correlation^2
Better performance of: 1.10609e-06
    sim_ratio
    simMM
    correlation^2
    minkowski x sim_ratio
    minkowski x sim_ratio^2
Better performance of: 8.40882e-07
    minkowski
    sim_ratio
    simMM
    d2_star
    correlation^2
    chebyshev x d2_star
    minkowski x sim_ratio
    minkowski x sim_ratio^2
Better performance of: 6.94492e-07
    minkowski
    jeffrey_divergence
    sim_ratio
    simMM
    d2_star
    correlation^2
    chebyshev x jeffrey_divergence
    chebyshev x d2_star
    minkowski x sim_ratio
    minkowski x sim_ratio^2
    chebyshev^2 x minkowski^2
Selected statistics:
    minkowski
    jeffrey_divergence
    sim_ratio
    simMM
    d2_star
    correlation^2
    chebyshev x jeffrey_divergence
    chebyshev x d2_star
    minkowski x sim_ratio
    minkowski x sim_ratio^2
    chebyshev^2 x minkowski^2
Finished training.
    MAE: 0.000665851
    MSE: 6.94492e-07
Optimizing ...
Validating ...
    MAE: 0.000666787
    MSE: 6.86473e-07
Mean = 0.707903
STD = 0.411681
Min = -0.00740926
============================================
-0.00740926
Final threshold: -0.00740926
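
For reference, the numbers at the end of the log can be checked with a small Python sketch. It only reads the values printed above and makes two observations: the final threshold equals the reported Min, and the mean minus two standard deviations is lower still. How meshclust actually combines these quantities is not shown in the output, so nothing here should be read as its real logic. The helper names `read_value` and `is_valid_identity_threshold` are hypothetical, and the accepted range (0, 1] is an assumption about what a usable identity threshold looks like.

```python
import re

# Tail of the meshclust output quoted above (values copied verbatim).
LOG_TAIL = """\
Number of standard deviations: 2
Mean = 0.707903
STD = 0.411681
Min = -0.00740926
Final threshold: -0.00740926
"""


def read_value(log: str, label: str) -> float:
    """Return the number that follows `label` (after ':' or '=') in the log."""
    pattern = rf"^\s*{re.escape(label)}\s*[:=]\s*(\S+)\s*$"
    match = re.search(pattern, log, re.MULTILINE)
    if match is None:
        raise ValueError(f"{label!r} not found in log")
    return float(match.group(1))


def is_valid_identity_threshold(value: float) -> bool:
    """Assumption: a usable identity threshold lies in (0, 1]."""
    return 0.0 < value <= 1.0


mean = read_value(LOG_TAIL, "Mean")
std = read_value(LOG_TAIL, "STD")
k = read_value(LOG_TAIL, "Number of standard deviations")
minimum = read_value(LOG_TAIL, "Min")
threshold = read_value(LOG_TAIL, "Final threshold")

# Mean minus k standard deviations, for comparison with the reported Min.
lower_bound = mean - k * std

print(f"mean - k*std          = {lower_bound:.6f}")
print(f"threshold equals Min  = {threshold == minimum}")
print(f"threshold is usable   = {is_valid_identity_threshold(threshold)}")
```

A check like this could be run on the log before reusing an automatically estimated threshold, falling back to a manually chosen value when the estimate is outside the assumed range.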