kylebgorman opened 1 year ago
PTL has a doc on "Finding bottlenecks in your code".
I wonder if this one in particular could be useful here: https://pytorch-lightning.readthedocs.io/en/stable/tuning/profiler_intermediate.html
Thanks @Adamits, I used the profiler and the major bottleneck is the expert functions. Since fixing this is going to require a bit of a major overhaul, I'm going to leave this as a note to table until time permits a deep dive.
I was thinking about how edit distance could be a possible bottleneck and wondering how speech people do it, given the importance of WER in ASR codebases.
I found https://pytorch.org/audio/main/generated/torchaudio.functional.edit_distance.html -- could be useful for us?
EDIT: It looks like that code just loops in Python, so maybe not. Maybe we want a library in C or Cython though, like https://pypi.org/project/editdistance/.
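For reference, the computation all of these libraries perform is the classic Wagner-Fischer dynamic program. A minimal pure-Python sketch (not taken from any of the libraries above, just the textbook recurrence that a C/Cython implementation would speed up):

```python
def edit_distance(source, target) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program.

    Roughly the O(len(source) * len(target)) recurrence that a compiled
    library like `editdistance` runs, just in pure Python.
    """
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # i deletions.
    for j in range(n + 1):
        dp[0][j] = j  # j insertions.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = dp[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            dp[i][j] = min(sub,             # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[m][n]
```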
Yeah, short answer: write it in C++, thinking about cache locality, and wrap and expose it to Python.
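On the cache-locality point: the full (m+1) x (n+1) table is unnecessary, since each row of the DP only reads the previous row. A cache-friendly C++ version would keep a single contiguous rolling array; here is that same trick sketched in Python for illustration:

```python
def edit_distance_one_row(source, target) -> int:
    """Levenshtein distance with a single rolling row.

    A C++ implementation would do exactly this with one contiguous
    array of len(target) + 1 ints that stays hot in cache, rather
    than allocating the full DP table.
    """
    row = list(range(len(target) + 1))  # dp[0][j] = j insertions.
    for i, s in enumerate(source, start=1):
        prev_diag = row[0]  # dp[i-1][j-1]
        row[0] = i          # dp[i][0] = i deletions.
        for j, t in enumerate(target, start=1):
            prev_row = row[j]  # dp[i-1][j], saved before overwrite.
            row[j] = min(prev_diag + (s != t),  # substitution or match
                         prev_row + 1,          # deletion
                         row[j - 1] + 1)        # insertion
            prev_diag = prev_row
    return row[-1]
```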
Yeah, a nice little C++ module would make the edit distance calculation quicker. However, the main bottleneck is more the oracle itself: you have to continually update the position in the edit, which requires transferring the predicted edit action to CPU for each edit. This GPU -> CPU communication is just such a killer until most of the oracle operations can be made tensor operations.
[copied from CUNY-CL/abstractness/issues/123]
There are a lot of pure Python loops in the transducer implementation, and many can be replaced with PyTorch functions.
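As a hypothetical illustration of the pattern (not code from the transducer itself): a Python loop that calls `.item()` per element forces a device-to-host sync at every step, while the equivalent tensor expression does the work in one fused op with a single sync at the end.

```python
import torch


def matches_loop(pred: torch.Tensor, gold: torch.Tensor) -> int:
    # Anti-pattern: one .item() call (a GPU -> CPU sync on CUDA
    # tensors) per element of the loop.
    total = 0
    for i in range(pred.size(0)):
        if pred[i].item() == gold[i].item():
            total += 1
    return total


def matches_tensor(pred: torch.Tensor, gold: torch.Tensor) -> int:
    # Same result as a single tensor expression; only the final
    # .item() transfers data back to the host.
    return int((pred == gold).sum().item())
```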