This PR refactors the Breakpoints class. Internally, it will now use binary search (via np.searchsorted()) when outputting an array suitable for transform. I also created a new set of methods (encode() and recode()) which can be used within the Breakpoints class to encode the population labels as integers or vice versa. There is also a new write() method to help with writing breakpoints files.
In the process of doing all of this, I may have discovered a bug with the old population_array() method! I didn't catch it before because my tests were written incorrectly. This PR fixes both the bug and the tests.
In addition, I added a bunch of documentation for the Breakpoints class within the API docs for the data module.
Future work
We may want to consider storing everything in a flat numpy array instead of having a dictionary of them. The current data structure for the data property kinda complicates things and makes it harder to broadcast operations. Instead, we could consider using a single array but where sample and strand are additional fields in the array.
Note: this PR must be merged after #122
Overview
This PR refactors the Breakpoints class. Internally, it will now use binary search (via
np.searchsorted()
) when outputting an array suitable for transform. I also created a new set of methods (encode()
andrecode()
) which can be used within the Breakpoints class to encode the population labels as integers or vice versa. There is also a newwrite()
method to help with writing breakpoints files.In the process of doing all of this, I may have discovered a bug with the old
population_array()
method! I didn't catch it before because my tests were written incorrectly. This PR fixes both the bug and the tests.In addition, I added a bunch of documentation for the Breakpoints class within the API docs for the data module.
Future work
We may want to consider storing everything in a flat numpy array instead of having a dictionary of them. The current data structure for the
data
property kinda complicates things and makes it harder to broadcast operations. Instead, we could consider using a single array but wheresample
andstrand
are additional fields in the array.