Closed roblanf closed 9 years ago
Update. I couldn't reproduce the error on my machine. However, when I re-ran it with an extra model (GTR+I+G), I did get the bug.
My suspicion is that this is a rare bug, that crops up occasionally when we try and merge subsets. Because we don't have a random number seed, it's hard to reproduce this one.
The best I can think of doing for now is this.
We fall over when we call spatial.distance.pdist
So for now I'll just run a heap of analyses and check what we pass to that function, i.e.:
centroid_array = [sub.centroid, centroid]
print centroid_array
euclid_dist = spatial.distance.pdist(centroid_array)
If I can figure out when it fails, that's a good start.
Here's a centroid array that gives us the error. Note the 'none' in the final line:
[[-0.58612193945471691], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[-0.49735338396332335], [24.799193535274632]]
[[1.5793039593201781], [24.799193535274632]]
[[2.8262365442513655], [24.799193535274632]]
[[1.3410727258731898], [24.799193535274632]]
[[-0.48584979311026438], [24.799193535274632]]
[[0.63550764497753109], [24.799193535274632]]
[[-1.289706691277938], [24.799193535274632]]
[[0.49178103329441003], [24.799193535274632]]
[[-1.6716222259469751], [24.799193535274632]]
[[0.89895835912967736], [24.799193535274632]]
[[-1.0591706520859276], [24.799193535274632]]
[[0.60825455014161212], [24.799193535274632]]
[[-1.4305875173721956], [24.799193535274632]]
[[1.2801813603340069], [24.799193535274632]]
[[-0.4435925348804895], [24.799193535274632]]
[[2.1026286153331615], [24.799193535274632]]
[[-0.99097354790359327], [24.799193535274632]]
[[-1.1640002715622064], [24.799193535274632]]
[[0.53513758876528017], [24.799193535274632]]
[[-1.497246657928202], [24.799193535274632]]
[[-1.1499609177607493], [24.799193535274632]]
[[2.3772972659886955], [24.799193535274632]]
[[-0.94962884247268708], [24.799193535274632]]
[[1.5653105589129743], [24.799193535274632]]
[[-0.81035569108688943], [24.799193535274632]]
[[0.84560888743079909], [24.799193535274632]]
[[-1.0679285949243082], [24.799193535274632]]
[[0.94612330396208377], [24.799193535274632]]
[[-0.68423624534251248], [24.799193535274632]]
[[1.0195624394520091], [24.799193535274632]]
[[-0.73580072491196113], [24.799193535274632]]
[[-0.54455821167300011], [24.799193535274632]]
[[1.1782259488924929], [24.799193535274632]]
[[0.47522895807972493], [24.799193535274632]]
[[-1.6439951768570358], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[3.7723691800736461], [24.799193535274632]]
[[-2.2604494792305467], [24.799193535274632]]
[[1.0], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[0.94706680735452753], [24.799193535274632]]
[[0.0], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[1.2376044012104674], [24.799193535274632]]
[[1.4197734869510765], [24.799193535274632]]
[[0.53162310582542915], [24.799193535274632]]
[[-1.4101723561215025], [24.799193535274632]]
[[0.62326136233518692], [24.799193535274632]]
[[-0.5523953801949284], [24.799193535274632]]
[[1.3871653956332695], [24.799193535274632]]
[[1.6310904225101077], [24.799193535274632]]
[[-0.57117606853250924], [24.799193535274632]]
[[-0.76975326820651457], [24.799193535274632]]
[[1.1375999627475424], [24.799193535274632]]
[[2.5961232616147978], [24.799193535274632]]
[[-0.3284632837080077], [24.799193535274632]]
[[-0.91963666180302683], [24.799193535274632]]
[[0.96174888665175717], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[1.0], [24.799193535274632]]
[[-0.46231912335421083], [24.799193535274632]]
[[-1.7861645722464548], [24.799193535274632]]
[[-0.72838502555423146], [24.799193535274632]]
[[3.5622677593348477], [24.799193535274632]]
[[1.0], [24.799193535274632]]
[[1.0], [24.799193535274632]]
[[-2.5873091453997241], [24.799193535274632]]
[[1.1613644725429884], [24.799193535274632]]
[[0.48438387235931279], [24.799193535274632]]
[[-1.6007122669354434], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[1.0], [24.799193535274632]]
[[5.58271522052046], [24.799193535274632]]
[[-4.5208089242972731], [24.799193535274632]]
[[-2.7619309003637462], [24.799193535274632]]
[[0.0], [24.799193535274632]]
[[-1.0], [24.799193535274632]]
[[-0.58658587817688657], [24.799193535274632]]
[[-0.51256881627450379], [24.799193535274632]]
[[1.6493551018733847], [24.799193535274632]]
[[-2.4552791368742377], [24.799193535274632]]
[[-0.87253937117727964], [24.799193535274632]]
[[1.2911918531743534], [24.799193535274632]]
[[-0.64084889771511067], [24.799193535274632]]
[None, [24.799193535274632]]
So that 'None', refers to a subset with a centroid that's None. I'm getting closer.
OK, so now I've tracked it down to one massive subset, whose centroid is 'None'.
The subset has 185K sites (!), the first few of which are: 0, 1, 2, 3, 4, 5, 9, 10, 15, 16, 17, 25, 30, 33, 34, 41, 42, 44, 45, 46, 47, 48, 49, 52
What's going on here is that these are sites with no variation. They either have all constant sites, or constant sites and some gaps, or all gaps.
So, I think a sensible workaround here is to just put in a fix where if a subset's centroid is 'None', we call it zero and move on.
Any thoughts @pbfrandsen
Hi Rob,
This looks like the right solution to me. I just encountered this error on a dataset that I was running overnight, so nice fix!
Paul
On Wednesday, August 12, 2015, roblanf notifications@github.com wrote:
OK, so now I've tracked it down to one massive subset, whose centroid is 'None'.
The subset has 185K sites (!), the first few of which are: 0, 1, 2, 3, 4, 5, 9, 10, 15, 16, 17, 25, 30, 33, 34, 41, 42, 44, 45, 46, 47, 48, 49, 52
What's going on here is that these are sites with no variation. They either have all constant sites, or constant sites and some gaps, or all gaps.
So, I think a sensible workaround here is to just put in a fix where if a subset's centroid is 'None', we call it zero and move on.
Any thoughts @pbfrandsen https://github.com/pbfrandsen
— Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/61#issuecomment-130277067 .
Sent from my iPad
Karen M found this issue with a dataset (which she provided by email so we can debug):
Something to do with fabricating subsets for the kmeans algorithm.