With Andreas's help I also tested my own matrix, to see whether stemmatology can be applied to the images. My matrix consists of 184 taxa, each 48 characters long. The spreadsheet is available here (https://docs.google.com/spreadsheets/d/19lqFXWLqEgoUZrQC_IoGIMFPfZdjddOKfPmorfIft_8/edit#gid=0), and the NEXUS file is in the Stemmatology Illustration folder.
CPU time to create a tree
10 taxa: 0.01 sec
20 taxa: 2.22 sec (retained 989 trees)
25 taxa: 1h29
30 taxa: 5.4% progress after more than 2h
40 taxa: 0.00% progress after 1h of runtime
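One possible contributor to the steep growth in these timings is the size of the search space. The sketch below assumes the software searches over unrooted bifurcating trees (this is an assumption, not something we have checked in its settings); for n labelled taxa there are (2n-5)!! such trees. The function name is made up for illustration.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees on n_taxa labelled taxa.

    Uses the standard formula (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        return 1  # with fewer than 3 taxa there is only one possible tree
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


if __name__ == "__main__":
    # Taxon counts taken from the timing list above.
    for n in (10, 20, 25, 30, 40):
        print(f"{n} taxa: {unrooted_tree_count(n):.3e} possible trees")
```

Even if the program prunes most of these trees, the count grows so quickly with each added taxon that a sharp jump in runtime between 20 and 30 taxa would not be surprising.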
To see whether the data itself was causing a problem, I removed the outlier mbs1708 (000-00---0-------------0000000------------------) and ran the analysis again with 24 taxa.
24 taxa (outlier mbs1708 removed): 0.50h
Removing the outlier significantly improved the CPU time: 0.50h with 24 taxa, compared with 1h29 with 25 taxa. We therefore concluded that taxa should be chosen to have as few “-” and “?” entries as possible. Still, there seems to be a hard boundary at around 30-40 taxa, at least on our computers.
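Choosing taxa with few “-” and “?” entries could be done by hand in the spreadsheet, or with a small script along these lines. This is only a sketch: the 50% threshold, the function names and the two example taxa other than mbs1708 are assumptions made for illustration.

```python
MISSING = {"-", "?"}  # symbols treated as gap / missing data


def missing_fraction(row: str) -> float:
    """Share of characters in one taxon row that are '-' or '?'."""
    if not row:
        return 0.0
    return sum(ch in MISSING for ch in row) / len(row)


def filter_taxa(matrix: dict[str, str], max_missing: float = 0.5) -> dict[str, str]:
    """Keep only taxa whose missing share is at most max_missing (assumed threshold)."""
    return {
        name: row
        for name, row in matrix.items()
        if missing_fraction(row) <= max_missing
    }


if __name__ == "__main__":
    matrix = {
        # Row copied from the outlier mentioned above.
        "mbs1708": "000-00---0-------------0000000------------------",
        # Hypothetical rows, for illustration only.
        "example_a": "0101" * 12,
        "example_b": "01?0" * 12,
    }
    # Print the taxa from most to least incomplete.
    for name, row in sorted(matrix.items(), key=lambda kv: -missing_fraction(kv[1])):
        print(f"{name}: {missing_fraction(row):.0%} missing")
    print("kept:", sorted(filter_taxa(matrix)))
```

Sorting by missing share makes outliers like mbs1708 show up at the top of the list before any tree is computed.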
To test this, I twice generated 30 random taxa, each 48 characters long and consisting only of 0s and 1s. I stopped the two runs after roughly 2h and 3h of runtime. I assume the data was too random for any relations to be found.
30 clean taxa: 0.00% progress after 2h runtime
30 clean taxa: 0.00% progress after 3h15 runtime
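For anyone who wants to repeat the random test, a matrix like this could be generated roughly as follows. This is a minimal sketch and not the script actually used: the taxon names (rand01, rand02, ...), the seed and the function names are assumptions, and the NEXUS block uses only the common DATA-block layout.

```python
import random


def random_matrix(n_taxa: int = 30, n_chars: int = 48, seed=None) -> dict[str, str]:
    """Build n_taxa random rows of n_chars characters, using only '0' and '1'."""
    rng = random.Random(seed)  # a fixed seed makes the matrix reproducible
    return {
        f"rand{i:02d}": "".join(rng.choice("01") for _ in range(n_chars))
        for i in range(1, n_taxa + 1)
    }


def to_nexus(matrix: dict[str, str]) -> str:
    """Format the matrix as a simple NEXUS DATA block."""
    n_chars = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={n_chars};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "  MATRIX",
    ]
    lines += [f"    {name}  {row}" for name, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the NEXUS text; redirect it to a file to load it into the tree software.
    print(to_nexus(random_matrix(30, 48, seed=1)))
```

Purely random rows carry no shared history, so a test matrix with some built-in structure (for example, rows copied from each other with a few changes) might be a fairer check of the 30-40 taxa boundary.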