brettc / partitionfinder

PartitionFinder discovers optimal partitioning schemes for DNA sequences.
--rcluster-max default #130

Closed carloliveros closed 5 years ago

carloliveros commented 5 years ago

I ran PF2 in the rclusterf search mode on a dataset with 12K subsets without specifying --rcluster-max, and the program set rcluster-max to 120K. That is correct according to the description of the command-line option in the manual (the larger of 1000 and 10 times the number of data blocks). However, the manual's discussion of rcluster-max and rcluster-percent gives me the impression that the default should be the "minimum" of 1000 and 10 times the number of data blocks. It is easy to fix my run by re-running with rcluster-max set to 1000 on the command line, but the program does not seem to behave as that discussion in the manual describes.
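To make the discrepancy concrete, here is a minimal sketch of the two readings of the default. It is not PartitionFinder's actual code: the function names are hypothetical, and the rule is taken only from the manual wording quoted above.

```python
# Hypothetical illustration only; not taken from the PartitionFinder source.

def rcluster_max_as_larger(n_data_blocks):
    """Default as the option description states it: the larger of
    1000 and 10 times the number of data blocks."""
    return max(1000, 10 * n_data_blocks)


def rcluster_max_as_smaller(n_data_blocks):
    """Default as the manual's discussion seemed to imply: the smaller of
    1000 and 10 times the number of data blocks."""
    return min(1000, 10 * n_data_blocks)


if __name__ == "__main__":
    n = 12000  # roughly the number of subsets in the run described above
    print("larger-of rule: ", rcluster_max_as_larger(n))
    print("smaller-of rule:", rcluster_max_as_smaller(n))
```

With 12K data blocks the first reading gives 120K, which is what the program reported, while the second would cap the value at 1000.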

carloliveros commented 5 years ago

The current default behavior seems to be correct. I let PF2 run without specifying rcluster-max on the command line, and it suggested schemes with 9K and 7K subsets in steps 1 and 2, respectively; it is still going. Running PF2 with rcluster-max set to 1000 or 50 on the command line ends with a final scheme of 12K subsets, which is hundreds fewer than the starting number but still too many. The program does need to be sped up for large datasets, though, for example with MPI parallelization. My PF2 run is now on its fourth day and will probably run for at least a few more, whereas an unpartitioned ML analysis with bootstrapping on the same dataset took only a little more than 2 days with ExaML.