RubendeBruin closed this issue.
Thanks a lot! You are absolutely right that the implementation leaves a lot to be desired in terms of speed. This is a great proof of principle; thank you so much for doing this work!
The way I was thinking of moving forward is to use vectorized overlap functions built on numpy (from https://github.com/open2c/bioframe, such as https://github.com/open2c/bioframe/blob/0508270bac18ef15bdea9535d9126c958d6c9952/bioframe/arrops.py#L269) and not to use any special objects or classes for bboxes: just extract their coordinates once, operate on them, and then create/move the text objects accordingly. I think that would be the most sustainable solution (I don't know any C/C++, so I couldn't maintain an extension), but it would require a little refactoring. I haven't tried to implement it yet.
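A minimal sketch of the coordinate-array idea, written in plain NumPy rather than with bioframe's own functions (which work on 1D intervals). It assumes the bounding boxes have been extracted once into an (n, 4) array of [xmin, ymin, xmax, ymax]; every pairwise overlap is then checked with broadcasting instead of per-pair matplotlib calls. The helper name overlap_matrix is hypothetical.

```python
import numpy as np


def overlap_matrix(boxes):
    """Return an (n, n) boolean matrix, True where box i and box j overlap.

    boxes: (n, 4) array of [xmin, ymin, xmax, ymax], extracted once from the
    text objects. Boxes that only touch along an edge do not count as overlapping.
    """
    x0, y0, x1, y1 = boxes.T
    # Broadcasting an (n, 1) column against a (1, n) row compares every pair
    # at once: no Python loop and no matplotlib Bbox objects involved.
    ox = (x0[:, None] < x1[None, :]) & (x0[None, :] < x1[:, None])
    oy = (y0[:, None] < y1[None, :]) & (y0[None, :] < y1[:, None])
    both = ox & oy
    np.fill_diagonal(both, False)  # a box trivially overlaps itself
    return both


boxes = np.array([[0, 0, 2, 1], [1, 0.5, 3, 1.5], [5, 0, 6, 1]], dtype=float)
print(overlap_matrix(boxes))  # only boxes 0 and 1 overlap
```

The (n, n) matrix costs O(n²) memory, which is fine for a few hundred labels; for much larger label sets a sort-based search avoids building it.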
Agree with @RubendeBruin that adjustText is a much-needed function for matplotlib plots. The current implementation is prohibitively slow even for modestly sized data sets, though. @Phlya, what are some ways people could help? Are you accepting pull requests, and are you looking for help on this issue? Do you have an estimated timeline for when the speed-ups you mention might be released? Thanks!!
@dpdoughe I agree a speed-up is needed. Unfortunately, I don't have the capacity to work on this myself at the moment, but I am happy to accept pull requests.
My preferred way forward is outlined above. I am pretty sure it will provide sufficient speed, but it obviously needs to be tested: move away from using bboxes for overlaps and rewrite the code using vectorized methods with numpy (and/or pandas, if needed), for example the array operations (arrops) from bioframe (the relevant functions could be copied in and modified if necessary, to avoid a dependency on an otherwise irrelevant library).
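To avoid an O(n²) pairwise comparison when there are many labels, overlaps along one axis can be found by sorting interval starts and using np.searchsorted. The sketch below is written from scratch in that spirit and is not copied from bioframe's arrops; overlapping_pairs_1d is a hypothetical name, and it assumes half-open, non-empty intervals [start, end).

```python
import numpy as np


def overlapping_pairs_1d(starts, ends):
    """Return (i, j) index arrays of all overlapping intervals [start, end).

    Assumes every interval is non-empty (end > start). Each unordered pair is
    reported once. Cost is O(n log n + number of overlapping pairs).
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    order = np.argsort(starts, kind="stable")
    s, e = starts[order], ends[order]
    n = len(s)
    # For the interval at sorted position k, every later interval whose start
    # is < e[k] overlaps it: later starts are >= s[k], and each end > its start.
    hi = np.searchsorted(s, e, side="left")
    counts = np.maximum(hi - np.arange(1, n + 1), 0)
    # Expand each position k into the run of partner positions k+1 .. hi[k]-1.
    left = np.repeat(np.arange(n), counts)
    run_start = np.cumsum(counts) - counts
    right = left + 1 + (np.arange(counts.sum()) - np.repeat(run_start, counts))
    # Map sorted positions back to the caller's original indices.
    return order[left], order[right]


i, j = overlapping_pairs_1d([0, 5, 1, 10], [2, 6, 3, 11])
print(list(zip(i.tolist(), j.tolist())))  # [(0, 2)]
```

In 2D, running this on the x extents of the boxes gives candidate pairs, which can then be filtered by checking their y extents.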
FYI, I finally made time for this and implemented a new, much faster engine that uses bioframe and pure numpy operations for all overlaps and movements, as described above. Feel free to try it and report any issues :)
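For illustration only (this is not the code of the new engine), one vectorized movement step might look like the sketch below: find all overlapping pairs at once, then push each pair apart along the axis where it overlaps least, splitting the displacement equally and accumulating it per box with np.add.at. The function name repel_step and the push rule are assumptions.

```python
import numpy as np


def repel_step(boxes):
    """Apply one vectorized repulsion step (hypothetical push rule).

    boxes: (n, 4) float array of [xmin, ymin, xmax, ymax].
    Returns (moved_boxes, number_of_overlapping_pairs_before_the_step).
    """
    x0, y0, x1, y1 = boxes.T
    i, j = np.triu_indices(len(boxes), k=1)  # every unordered pair once
    # Overlap extent along each axis (positive only where the boxes overlap).
    ox = np.minimum(x1[i], x1[j]) - np.maximum(x0[i], x0[j])
    oy = np.minimum(y1[i], y1[j]) - np.maximum(y0[i], y0[j])
    hit = (ox > 0) & (oy > 0)
    i, j, ox, oy = i[hit], j[hit], ox[hit], oy[hit]
    # Direction that pushes box i away from box j; equal centres go to +1.
    sx = np.where((x0[i] + x1[i]) >= (x0[j] + x1[j]), 1.0, -1.0)
    sy = np.where((y0[i] + y1[i]) >= (y0[j] + y1[j]), 1.0, -1.0)
    along_x = ox <= oy  # separate along the axis with the smaller overlap
    push = np.zeros((len(i), 2))
    push[along_x, 0] = (sx * ox)[along_x] / 2
    push[~along_x, 1] = (sy * oy)[~along_x] / 2
    # Accumulate per-box displacement; np.add.at handles repeated indices.
    shift = np.zeros((len(boxes), 2))
    np.add.at(shift, i, push)
    np.add.at(shift, j, -push)
    # Apply the same (dx, dy) to both corners of each box.
    return boxes + np.tile(shift, 2), int(hit.sum())


boxes = np.array([[0, 0, 2, 1], [1, 0.5, 3, 1.5], [5, 0, 6, 1]], dtype=float)
moved, n_overlaps = repel_step(boxes)
```

Calling it in a loop until the overlap count reaches zero, with an iteration cap, is one way to drive it; convergence is not guaranteed in general, because pushes from several neighbours can overshoot. The pairs from np.triu_indices are O(n²), so for many labels the candidate pairs would better come from a sort-based search.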
adjustText is a much-needed function for matplotlib plots, but unfortunately I've noticed that it is quite slow. Profiling suggests that much of the time is spent in the matplotlib functions used to determine the overlap of the boxes (the intersect function as well as xmin, xmax, ymin and ymax).
I tested decoupling the optimization algorithm from matplotlib as follows:
step 1: get the bounding boxes and positions from matplotlib and store them locally in objects
step 2: optimize using the objects
step 3: put the optimized positions back to the matplotlib text object.
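The three steps could be sketched with matplotlib's public API roughly as follows. extract_boxes, optimize and apply_boxes are hypothetical names; optimize is a deliberately naive stand-in (greedy upward stacking), not the algorithm described in this comment, and the sketch assumes the texts were created with ax.text, so their positions are in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def extract_boxes(texts, ax):
    """Step 1: read every text's bounding box once, in data coordinates."""
    ax.figure.canvas.draw()  # lay out the texts so their extents are known
    renderer = ax.figure.canvas.get_renderer()
    to_data = ax.transData.inverted()
    boxes = []
    for t in texts:
        bb = t.get_window_extent(renderer=renderer)  # display (pixel) units
        (x0, y0), (x1, y1) = to_data.transform([[bb.x0, bb.y0], [bb.x1, bb.y1]])
        boxes.append([x0, y0, x1, y1])
    return np.array(boxes)


def optimize(boxes):
    """Step 2 (naive stand-in): stack overlapping boxes upward, no matplotlib."""
    boxes = boxes.copy()
    placed = []
    for k in np.argsort(boxes[:, 1]):  # lowest boxes first
        while placed:
            p = boxes[placed]
            hit = ((boxes[k, 0] < p[:, 2]) & (p[:, 0] < boxes[k, 2])
                   & (boxes[k, 1] < p[:, 3]) & (p[:, 1] < boxes[k, 3]))
            if not hit.any():
                break
            # Lift box k so it sits on top of the highest box it overlaps.
            boxes[k, [1, 3]] += p[hit, 3].max() - boxes[k, 1]
        placed.append(k)
    return boxes


def apply_boxes(texts, before, after):
    """Step 3: move each text by the displacement its box received."""
    for t, (dx, dy) in zip(texts, after[:, :2] - before[:, :2]):
        x, y = t.get_position()
        t.set_position((x + dx, y + dy))


fig, ax = plt.subplots()
texts = [ax.text(0.5, 0.5, f"label {k}") for k in range(3)]
before = extract_boxes(texts, ax)
after = optimize(before)
apply_boxes(texts, before, after)
```

Only steps 1 and 3 touch matplotlib objects, each once per text; step 2 works on a plain array, so it can later be vectorized or moved into compiled code without changing the glue.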
I made my own implementation of an adjustment algorithm (because I wanted to experiment with that as well). The result still has some artifacts and should definitely be optimized further, but it does show the potential speed-up from separating the optimization code from matplotlib.
My next step would be to fine-tune the algorithm, then implement it in C++ and compile it into a Python module. For a faster implementation in pure Python, I think vectorizing the optimization with numpy would be the way to go.
code: