casact / FASLR

Free Actuarial System for Loss Reserving
https://faslr.com
GNU General Public License v3.0

Slow performance on heatmap parser #57

Open genedan opened 2 years ago

genedan commented 2 years ago

The FASLR heatmap currently extracts the HTML representation returned by the chainladder heatmap, then searches through the HTML text to find the colors applied to each cell. These colors are then applied to the view using one of its built-in methods.

I came across some string methods that can convert a string into something like a list, which might end up being faster, so it could be worth looking into.
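A minimal sketch of that idea, using only built-in `str` methods (`partition`, `rpartition`, `split`) to turn the CSS into a cell-to-color mapping. The sample HTML and the function name `parse_cell_colors` are hypothetical, and the sketch assumes the selectors follow the `#T_<id>_row<r>_col<c>` pattern that pandas' Styler output uses. It is not the current FASLR implementation.

```python
# Hypothetical sample, shaped like the <style> block that pandas' Styler emits.
SAMPLE_HTML = """<style type="text/css">
#T_demo_row0_col0, #T_demo_row1_col1 {
  background-color: #fff7fb;
  color: #000000;
}
#T_demo_row0_col1 {
  background-color: #023858;
  color: #f1f1f1;
}
</style>
<table id="T_demo"></table>"""


def parse_cell_colors(html: str) -> dict:
    """Map (row, col) -> background color using only str methods."""
    # Isolate the CSS text between the opening <style ...> tag and </style>.
    _, _, rest = html.partition("<style")
    css, _, _ = rest.partition("</style>")
    _, _, css = css.partition(">")

    colors = {}
    # Splitting on "}" yields one "selectors { declarations" chunk per rule.
    for rule in css.split("}"):
        selectors, brace, body = rule.partition("{")
        if not brace:
            continue  # trailing whitespace after the last rule
        background = None
        for declaration in body.split(";"):
            name, _, value = declaration.partition(":")
            if name.strip() == "background-color":
                background = value.strip()
        if background is None:
            continue
        # A rule may list several cells that share the same style.
        for selector in selectors.split(","):
            head, _, col = selector.strip().rpartition("_col")
            _, _, row = head.rpartition("_row")
            if row.isdigit() and col.isdigit():
                colors[(int(row), int(col))] = background
    return colors


cell_colors = parse_cell_colors(SAMPLE_HTML)
```

Each rule is visited once, and the resulting dictionary can be queried per cell without rescanning the text.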

genedan commented 1 year ago

Hey @odddkidout, I know you were asking for another issue. Would you like to try this one? The heatmap is a bit slow. It uses the Beautiful Soup package to extract the colors from the CSS generated by the pandas heatmap method:

image

I think for proof-of-concept type work it's fine, but improving the speed here will take things to the next level. My goal here is to get rid of the bs4 dependency, and possibly the IPython one as well, since I think both are overkill for the current use case. Just let me know if this interests you.

odddkidout commented 1 year ago

Sounds cool. Do you mean parsing the HTML with string manipulation?

genedan commented 1 year ago

Yeah, here is how it is currently done:

https://github.com/casact/FASLR/blob/f35e13b910a1b5f832ad48c70517748d7f5c482c/faslr/utilities/style_parser.py

I think my use case is too simple to warrant having an entire package like bs4 as a dependency. I think what's slowing things down the most is the nested loops I have at the bottom, which just search through the CSS looking for the colors.
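One possible way to avoid rescanning the CSS for every cell is to scan it once into a dictionary and then fill the grid by lookup. The sketch below uses the standard-library `re` module. The sample CSS, the function name `color_grid`, and the assumed `_row<r>_col<c>` selector pattern are illustrative assumptions, not code taken from `style_parser.py`.

```python
import re

# Hypothetical CSS, shaped like the rules pandas' Styler generates.
SAMPLE_CSS = """
#T_demo_row0_col0, #T_demo_row1_col1 {
  background-color: #fff7fb;
}
#T_demo_row0_col1 {
  background-color: #023858;
}
"""

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")           # selectors { declarations }
CELL = re.compile(r"_row(\d+)_col(\d+)")              # row/col indices in a selector
BACKGROUND = re.compile(r"background-color:\s*([^;]+)")


def color_grid(css: str, n_rows: int, n_cols: int, default: str = "") -> list:
    """Scan the CSS once, then fill an n_rows x n_cols grid by dict lookup."""
    lookup = {}
    for rule in RULE.finditer(css):
        found = BACKGROUND.search(rule.group(2))
        if found is None:
            continue  # rule sets no background color
        color = found.group(1).strip()
        for cell in CELL.finditer(rule.group(1)):
            lookup[(int(cell.group(1)), int(cell.group(2)))] = color
    # Each cell is now a constant-time lookup instead of a search through the CSS.
    return [
        [lookup.get((r, c), default) for c in range(n_cols)]
        for r in range(n_rows)
    ]


grid = color_grid(SAMPLE_CSS, n_rows=2, n_cols=2)
```

With this shape, the work grows with the size of the CSS plus the number of cells, rather than with their product.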