bioinfo-ut / PhenotypeSeeker

Identify phenotype-specific k-mers and predict phenotype using sequenced bacterial strains
GNU General Public License v3.0

Math domain error #25

Open Saruuljavkhlan opened 2 months ago

Saruuljavkhlan commented 2 months ago

Hello, PhenotypeSeeker development team,

I ran PhenotypeSeeker modeling on a continuous phenotype file, and the error below occurred. Could you please help me solve this problem? Thank you for your efforts and for the tool!

My command is: phenotypeseeker modeling data_conti.pheno -w --num_threads 16

Error:

######                   PhenotypeSeeker                   ######
######                      modeling                       ######

Generating the k-mer lists for input samples:
        300 of 300 lists generated.
Generating the k-mer feature vector.
Mapping samples to the feature vector space:
        300 of 300 samples mapped.
Deleting the existing distances.mat file...
Deleting the existing reference.msh file...
Deleting the existing mash_distances.mat file...
Estimating the Mash distances between samples...
Calculating the GSC weights from mash distance matrix...
Conducting the k-mer specific Welch t-tests:
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 701, in get_kmers_tested
    test_results = self.conduct_t_test(
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 733, in conduct_t_test
    t_statistic, pvalue, mean_x, mean_y = self.t_test(
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 774, in t_test
    sxy = math.sqrt((varx/sumofweightsx)+(vary/sumofweightsy))
ValueError: math domain error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/bin/phenotypeseeker", line 337, in <module>
    Main()
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/bin/phenotypeseeker", line 326, in Main
    func = args.func(args)
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 1704, in modeling
    list(map(
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 1705, in <lambda>
    lambda x:  x.test_kmers_association_with_phenotype(),
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 57, in wrapper
    f(*args)
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/PhenotypeSeeker/modeling.py", line 663, in test_kmers_association_with_phenotype
    results_from_threads = p.map(
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/phenotype_seeker/PhenotypeSeeker/.PSenv/lib/python3.8/site-packages/multiprocess/pool.py", line 768, in get
    raise self._value
ValueError: math domain error

erkiaun commented 1 month ago

We originally wrote our own weighted t-test function because, at the time, no weighted t-test implementation was available in Python. It turns out that this function did not work properly in all cases. I changed PhenotypeSeeker to use "statsmodels.stats.weightstats.ttest_ind" instead of our own function, and this seems to resolve the error. The fix is included in the most recent version of PhenotypeSeeker, v.1.2.0.
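
For reference, a minimal sketch of a weighted Welch t-test with "statsmodels.stats.weightstats.ttest_ind". The data, the variable names, and the choice of usevar="unequal" are assumptions made for illustration. This is not the actual PhenotypeSeeker code, and the arguments it passes may differ.

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

# Hypothetical inputs: continuous phenotype values for samples that carry
# a given k-mer and for samples that lack it, plus one weight per sample
# (standing in for the GSC weights mentioned in the log).
rng = np.random.default_rng(0)
pheno_with = rng.normal(loc=5.0, scale=1.0, size=40)
pheno_without = rng.normal(loc=4.0, scale=2.0, size=60)
w_with = rng.uniform(0.5, 1.5, size=40)
w_without = rng.uniform(0.5, 1.5, size=60)

# usevar="unequal" selects the Welch variant (no pooled variance);
# weights takes one array of per-sample weights for each group.
t_statistic, pvalue, dof = ttest_ind(
    pheno_with,
    pheno_without,
    usevar="unequal",
    weights=(w_with, w_without),
)

print(f"t = {t_statistic:.3f}, p = {pvalue:.3g}, df = {dof:.1f}")
```

The function returns the t statistic, the p-value, and the degrees of freedom used in the test, so the standard error no longer has to be computed by hand with math.sqrt as in the traceback above.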