Open mydjc opened 2 years ago
Hello, thank you for pointing this out and for the detailed bug report. The 'decode' error is an old py2.7/py3 compatibility issue, and yes, the fix you propose in the code snippet is how I would suggest handling it.
The divide-by-zero error is a different problem coming from some other part of the code, so it would be helpful if you could post the traceback for it. In general, the popgen module is not the most complete of the analysis tools, so I would not be surprised if there are still some bugs in it.
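For reference, here is a minimal sketch of the kind of py2/py3-safe handling meant for the 'decode' error. The helper name `to_str` is hypothetical and is not part of ipyrad; it only illustrates decoding a value when it is bytes and leaving it alone when it is already a str:

```python
def to_str(value, encoding="utf-8"):
    """Return `value` as str, decoding only if it is a bytes object.

    Hypothetical helper for illustration; not an ipyrad function.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already a str on py3, so calling .decode() here would raise AttributeError.
    return value


# Mixed input: one bytes name and one str name.
names = [b"T1-1", "T1-2"]
decoded = [to_str(name) for name in names]
print(decoded)
```

Wrapping the call this way avoids the AttributeError that appears when `.decode` is called on a value that is already a str.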
Encountered an Error.
Message: float division by zero
Traceback (most recent call last):
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/core/Parallel.py", line 314, in wrap_run
self.tool._run(ipyclient=self.ipyclient, **self.rkwargs)
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/popgen.py", line 266, in _run
prog.update()
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 41, in update
hashes = '#' * int(self.progress / 5.)
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 33, in progress
return 100 * (self.finished / float(self.njobs))
ZeroDivisionError: float division by zero
Hm, well, it looks like this can only happen when the number of loci being processed is zero. Can you show me the cell where you create the 'Popgen' instance, along with the output from that cell? If you pass in an imap or minmap that is too restrictive, all of the loci will be removed, and then the run will crash.
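To illustrate why zero loci triggers this, here is a small sketch of a progress tracker that uses the same division as in the traceback, plus a guard for the zero-job case. The class `ProgressSketch` and the function `require_loci` are hypothetical stand-ins written for this comment, not the code in ipyrad/analysis/utils.py:

```python
class ProgressSketch:
    """Hypothetical stand-in for a progress tracker (not ipyrad's own class)."""

    def __init__(self, njobs):
        self.njobs = njobs
        self.finished = 0

    @property
    def progress(self):
        # Without this guard, njobs == 0 raises ZeroDivisionError,
        # which is the error shown in the traceback.
        if self.njobs == 0:
            return 0.0
        return 100 * (self.finished / float(self.njobs))


def require_loci(nloci):
    """Fail early with a readable message if the filter removed every locus."""
    if nloci == 0:
        raise ValueError("no loci passed the locus filter; relax imap/minmap")
    return nloci


empty = ProgressSketch(njobs=0)
print(empty.progress)  # 0.0

half = ProgressSketch(njobs=4)
half.finished = 2
print(half.progress)  # 50.0
```

Either approach would work in principle: the guard keeps the progress bar from crashing, while the early check reports the real cause (no loci left after filtering) instead of a division error.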
In:
data = ipyrad.load_json("/run/media/mydjc/WinSto/Sinopodophyllum/test2/test2.json")
imap = {
    "reference": ["reference"],
    "T1": ["T1-1", "T1-2"],
    "T2": ["T2-1", "T2-2"],
    "Z": ["Z1-1"],
}
popgen = Popgen(data=data, imap=imap)
popgen.run(ipyclient=ipyclient)
out:
Parallel connection | mydjc-imac201: 11 cores
[locus filter] full data: 130690
[locus filter] post filter: 0
Encountered an Error.
Message: float division by zero
Traceback (most recent call last):
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/core/Parallel.py", line 314, in wrap_run
self.tool._run(ipyclient=self.ipyclient, **self.rkwargs)
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/popgen.py", line 266, in _run
prog.update()
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 41, in update
hashes = '#' * int(self.progress / 5.)
File "/home/mydjc/miniconda3/envs/ipyrad_py37/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 33, in progress
return 100 * (self.finished / float(self.njobs))
ZeroDivisionError: float division by zero
########## outfiles/test2_stats.txt ##########
## The number of loci caught by each filter.
## ipyrad API location: [assembly].stats_dfs.s7_filters
                            total_filters  applied_order  retained_loci
total_prefiltered_loci                  0              0         305133
filtered_by_rm_duplicates               0              0         305133
filtered_by_max_indels                  0              0         305133
filtered_by_max_SNPs                 6902           6902         298231
filtered_by_max_shared_het          59537          56271         241960
filtered_by_min_sample             111270         111270         130690
total_filtered_loci                177709         174443         130690
## The number of loci recovered for each Sample.
## ipyrad API location: [assembly].stats_dfs.s7_samples
           sample_coverage
reference           130690
T1-1                 79881
T1-2                 74038
T2-1                 88748
T2-2                 82434
Z1-1                 36703
## The number of loci for which N taxa have data.
## ipyrad API location: [assembly].stats_dfs.s7_loci
   locus_coverage  sum_coverage
1               0             0
2           58102         58102
3           49021        107123
4           19298        126421
5            4269        130690
6               0        130690
## The distribution of SNPs (var and pis) per locus.
## var = Number of loci with n variable sites (pis + autapomorphies)
## pis = Number of loci with n parsimony informative site (minor allele in >1 sample)
## ipyrad API location: [assembly].stats_dfs.s7_snps
## The "reference" sample is included if present unless 'exclude_reference=True'
var sum_var pis sum_pis
0 30836 0 98691 0
1 11479 11479 9903 9903
2 8407 28293 6397 22697
3 7359 50370 4481 36140
4 6545 76550 3313 49392
5 6140 107250 2413 61457
6 5715 141540 1753 71975
7 5422 179494 1231 80592
8 5017 219630 803 87016
9 4647 261453 564 92092
10 4226 303713 435 96442
11 4025 347988 308 99830
12 3729 392736 158 101726
13 3309 435753 103 103065
14 2967 477291 60 103905
15 2692 517671 40 104505
16 2483 557399 15 104745
17 2137 593728 8 104881
18 1886 627676 6 104989
19 1720 660356 4 105065
20 1475 689856 0 105065
21 1323 717639 1 105086
22 1174 743467 1 105108
23 956 765455 2 105154
24 876 786479 0 105154
25 742 805029 0 105154
26 656 822085 0 105154
27 567 837394 0 105154
28 463 850358 0 105154
29 369 861059 0 105154
30 313 870449 0 105154
31 210 876959 0 105154
32 190 883039 0 105154
33 153 888088 0 105154
34 123 892270 0 105154
35 106 895980 0 105154
36 72 898572 0 105154
37 52 900496 0 105154
38 51 902434 0 105154
39 16 903058 0 105154
40 19 903818 0 105154
41 18 904556 0 105154
42 8 904892 0 105154
43 6 905150 0 105154
44 5 905370 0 105154
45 2 905460 0 105154
46 2 905552 0 105154
47 0 905552 0 105154
48 0 905552 0 105154
49 2 905650 0 105154
## Final Sample stats summary
      state  reads_raw  reads_passed_filter  refseq_mapped_reads  refseq_unmapped_reads  clusters_total  clusters_hidepth  hetero_est  error_est  reads_consens  loci_in_assembly
T1-1      7   13682541             13645443              6079794                7565649          361040            221379    0.055388   0.026210         140761             79881
T1-2      7   13961420             13932379              6418573                7513806          347608            207330    0.053587   0.026096         129944             74038
T2-1      7   34419819             34362364             16972741               17389623          416323            329392    0.056436   0.026109         178700             88748
T2-2      7   14733657             14713674              7198375                7515299          365665            232474    0.055524   0.025562         140738             82434
Z1-1      7   22123712             22083562              6085916               15997646          285921            143015    0.070033   0.029408          72794             36703
## Alignment matrix statistics:
snps matrix size: (6, 905650), 48.82% missing sites.
sequence matrix size: (6, 27793782), 44.03% missing sites.
Yes, well you can see here this is exactly what's happening:
[locus filter] full data: 130690
[locus filter] post filter: 0
You have 0 loci that are shared among all samples (as shown in your stats file):
6 0 130690
If you pass in a population map file and no 'minmap', then it defaults to 4 samples per population (a somewhat permissive lower bound for calculating popgen sumstats).
More important than this, the popgen analysis tool calculates population summary statistics, and if the 'populations' you are assigning have only one or two individuals, you are not really going to get meaningful results.
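As a rough illustration of the point about defaults, the sketch below checks whether each population in an imap even contains enough samples to meet its minimum. The function `undersampled_populations` is hypothetical, not an ipyrad function, and the default of 4 is taken from the comment above rather than from the ipyrad source:

```python
def undersampled_populations(imap, minmap=None, default_min=4):
    """Return populations that have fewer samples than their required minimum.

    Hypothetical helper for illustration. `default_min=4` is an assumption
    based on the default described in this thread.
    """
    minmap = minmap or {}
    short = {}
    for pop, samples in imap.items():
        required = minmap.get(pop, default_min)
        if len(samples) < required:
            # Record (samples available, samples required) for this population.
            short[pop] = (len(samples), required)
    return short


imap = {
    "reference": ["reference"],
    "T1": ["T1-1", "T1-2"],
    "T2": ["T2-1", "T2-2"],
    "Z": ["Z1-1"],
}

# With no minmap, every population falls short of the assumed default of 4.
print(undersampled_populations(imap))

# With an explicit minmap of 1 sample per population, none falls short.
print(undersampled_populations(imap, minmap={pop: 1 for pop in imap}))
```

Under that assumption, a population that appears in the first result can never reach its minimum, so no locus can pass the filter no matter how complete the data are. Lowering the minimums would let loci through, but it does not make statistics from one or two individuals any more meaningful.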
I ran popgen following cookbook-popgen-sumstats.ipynb, and then the error occurred.
As you suggested in https://github.com/eaton-lab/tetrad/issues/5#issuecomment-872811206, I modified the similar lines containing ".decode" in popgen.py and locus_extracter.py, because those lines raised the same AttributeError as above.
But that did not help, because I then got a new error, "ZeroDivisionError: float division by zero", with a traceback pointing to utils.py line 201.
So, is the script for this function not complete yet?