P403n1x87 / austin-vscode

Austin extension for Visual Studio Code
MIT License

Improve sample loading time #36

Al12rs opened 2 years ago

Al12rs commented 2 years ago

When profiling scripts that take more than a few minutes, austin can generate gigabytes of profiling data. I'm using austin to profile a heavy algorithm for machine-learning explainability, so there is a lot of computation and a run can take several minutes to complete.

When I try to load a 300MB sample from the flamegraph view, I get "crunching the numbers..." for over a minute. When I try to load anything over 1GB, it's a coin toss whether it will ever complete at all.

I was hoping the loading time could be improved as I believe it should be possible to parse the file in less time than that.

I find the in-editor timing annotations that this extension provides invaluable during profiling, and I haven't been able to find anything else providing them for Python, but the loading time is proving prohibitive for any longer-running script.

I tried looking at the code of the extension to see if something stood out to me, but after a while I concluded it would be best to profile the loading code instead and make informed decisions using that data.

I believe VS Code might offer something for profiling extensions, but I'm not certain; I was hoping you would know more.

P403n1x87 commented 2 years ago

@Al12rs thanks for sharing your experience. Indeed the performance of the extension is not the best, and there are a few factors at play here. There are a few things that could be attempted to mitigate this issue. One would be to use the austin-compress tool from the austin-python package (pipx install austin-python) to aggregate the collapsed stacks and reduce the file size. This should speed up the parsing done by the extension. Alternatively, you could try profiling just a window of the run, or sampling at a lower rate at the cost of some precision.
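To give an idea of what the austin-compress step does, the compression is essentially an aggregation of identical collapsed stacks. A minimal Python sketch of the idea (not the actual austin-python implementation; real Austin output also carries metadata lines, which a proper tool would preserve rather than drop):

```python
from collections import Counter

def compress_collapsed(in_path: str, out_path: str) -> None:
    """Aggregate identical collapsed stacks, summing their sample counts."""
    counts: Counter = Counter()
    with open(in_path) as f:
        for line in f:
            line = line.rstrip()
            if not line or line.startswith("#"):  # skip blank/metadata lines
                continue
            # Each sample line has the shape "frame;frame;...;frame <count>".
            stack, _, count = line.rpartition(" ")
            if stack and count.isdigit():
                counts[stack] += int(count)
    with open(out_path, "w") as f:
        for stack, total in counts.items():
            f.write(f"{stack} {total}\n")
```

Since every distinct stack ends up on a single line, the output size is bounded by the number of unique stacks rather than the number of samples, which is why the reduction can be so dramatic.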

As for the file size, the latest release of Austin provides a binary output mode which would reduce the file size considerably, but I don't think this would solve the VS Code parsing issue at this stage.

Anyway I'll see if I can improve things from this point of view in the extension! :pray:

Al12rs commented 2 years ago

Hi, thanks for the quick response!

I tested out austin-python and it appears to be doing a decent job (300MB -> 24MB, ~10s to load). Loading performance still isn't great but at least it seems manageable.

Could a compression step not be part of austin itself? Perhaps as an optional flag?

I had some incompatibility issues with the pip package, as it required different versions of the same dependencies (protobuf) as other packages I need (ray, tensorflow, etc.). I can work around it by using a separate venv for it, though the whole thing isn't the most convenient.

Regarding the other suggestions, as you guessed the algorithm is pretty diverse, so windowing is hard, and the loss of precision makes it difficult to profile particular parts of the code. For some parts I had to resort to line_profiler to profile individual functions and give up on the in-editor hints, which is overall a clunky and awkward workflow.

I would suggest either making austin generate the compressed format, whether through a flag or by default, or finding a way to improve the VS Code extension performance. The performance improvement would probably be great to have in either case.

P403n1x87 commented 2 years ago

Could a compression step not be part of austin itself? Perhaps as an optional flag?

There is nothing preventing this feature from being added, except for performance. Performing this kind of compression at runtime would require more resources, which would lower Austin's maximum sampling rate. By design, Austin does the bare minimum to dump the data out, so as to push sample throughput to the very extreme whilst using a single core.

I had some incompatibility issues with the pip package, as it required different versions of the same dependencies (protobuf) as other packages I need (ray, tensorflow, etc.). I can work around it by using a separate venv for it, though the whole thing isn't the most convenient.

I think the best way to install austin-python when you just want to use the tools it provides is perhaps with pipx, which provides the dependency isolation of a venv out of the box :slightly_smiling_face:

I would suggest either making austin generate the compressed format, whether through a flag or by default, or finding a way to improve the VS Code extension performance. The performance improvement would probably be great to have in either case.

As argued above, I think it's unlikely we'll see the compression feature baked into Austin, which tries to embrace the Unix tool philosophy. I appreciate that this means adding an extra step to compress the data afterwards, but I believe this is for the best. So the only viable option is to improve the VS Code extension. One "trivial" improvement would probably be parallelization, as it should be possible to split the sample file across multiple worker threads and aggregate at the end (effectively a map-reduce); see the sketch below.
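The extension itself is written in TypeScript, but here is the shape of the idea sketched in Python for illustration, assuming the collapsed-stack text format and a file that fits in memory:

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def aggregate_chunk(lines: list) -> Counter:
    """Map step: fold identical collapsed stacks within one chunk."""
    counts: Counter = Counter()
    for line in lines:
        stack, _, count = line.rstrip().rpartition(" ")
        if stack and count.isdigit():
            counts[stack] += int(count)
    return counts

def parallel_aggregate(path: str, workers: int = 4) -> Counter:
    """Split the sample file across workers, then merge the partial counts."""
    with open(path) as f:
        # Blank and metadata lines are skipped for simplicity.
        lines = [l for l in f if l.strip() and not l.startswith("#")]
    size = max(1, (len(lines) + workers - 1) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(aggregate_chunk, chunks)
    total: Counter = Counter()  # Reduce step: merge the per-worker counters.
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    print(len(parallel_aggregate("profile.austin")))
```

Reading the whole file up front keeps the sketch simple; a production version would more likely split on byte offsets and stream each chunk, to avoid holding gigabytes in memory.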

Al12rs commented 2 years ago

Thank you for suggesting pipx, I didn't know about it and it is likely to make my life easier, much obliged!

I understand about the choice for austin, though the way I was envisioning it was to perform the compression only after the program had completed its run, as an optional post-processing step. This way it would not compromise performance during profiling, but would instead add some processing delay at the end of the run. This behaviour would only kick in when using a command-line option like --compress or similar. It would also allow the option to be used with the VS Code integration, so that a script can be profiled, compressed and analysed by the extension, thus reducing the loading time (I'm pretty positive the compression time is lower than the loading time).
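In the meantime the same effect can be approximated with a small wrapper; this is just a hypothetical sketch (the --compress flag doesn't exist, and I'm assuming austin's -o output flag and austin-compress's positional input/output arguments):

```python
import subprocess
import sys

def profile_then_compress(script: str,
                          raw: str = "profile.austin",
                          compressed: str = "profile.compressed.austin") -> None:
    """Profile a script at full speed, then compress as a separate pass."""
    # Sampling runs with no per-sample overhead; -o writes the raw output.
    subprocess.run(["austin", "-o", raw, sys.executable, script], check=True)
    # Post-processing starts only once the run is over, so the sampling
    # rate is unaffected (austin-compress ships with austin-python).
    subprocess.run(["austin-compress", raw, compressed], check=True)

if __name__ == "__main__":
    profile_then_compress(sys.argv[1])
```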

Though if you do not intend to include this directly, could I suggest advertising the compression functionality of austin-python in the readme of austin? For the file-size advantage alone I would consider this pretty vital. I wasn't aware of the possibility of compressing before you mentioned it here.

Map-reduce sounds good, though if you are able I would first run a profiler to make sure there isn't something causing an outstanding delay, or some other issue like parsing the entire file multiple times. Depending on the situation, some application of async/await could offer big speedups as well.