CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

PyTorchification of transducer #4

Open kylebgorman opened 1 year ago

kylebgorman commented 1 year ago

[copied from CUNY-CL/abstractness/issues/123]

There are a lot of pure Python loops in the transducer implementation, and many of them could be replaced with PyTorch operations.
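For illustration (not the actual transducer code), this is the kind of rewrite in question: a per-element Python loop replaced by a single vectorized call. The function names and the padding convention are made up for the example.

```python
import torch


def count_matches_loop(preds: torch.Tensor, targets: torch.Tensor, pad_idx: int) -> int:
    # Pure Python loop over every element: slow, and each access crosses the
    # Python/tensor boundary.
    total = 0
    for pred_row, target_row in zip(preds.tolist(), targets.tolist()):
        for p, t in zip(pred_row, target_row):
            if t != pad_idx and p == t:
                total += 1
    return total


def count_matches_vectorized(preds: torch.Tensor, targets: torch.Tensor, pad_idx: int) -> int:
    # The same computation as tensor operations: no Python-level iteration,
    # and it runs on whatever device the tensors live on.
    mask = targets != pad_idx
    return int(((preds == targets) & mask).sum())
```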

Adamits commented 1 year ago

PTL has a doc on "Finding bottlenecks in your code".

I wonder if this one in particular could be useful here: https://pytorch-lightning.readthedocs.io/en/stable/tuning/profiler_intermediate.html
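If it helps, here's a minimal sketch of turning the profiler on; the import path (`pytorch_lightning.profilers` vs. the older `pytorch_lightning.profiler`) depends on the installed Lightning version, and the model/datamodule names are placeholders.

```python
import pytorch_lightning as pl
from pytorch_lightning.profilers import PyTorchProfiler

# Writes a per-operator profiling report under the given dirpath/filename
# once the run finishes.
profiler = PyTorchProfiler(dirpath=".", filename="perf_logs")

# For a quick first pass, a string also works: pl.Trainer(profiler="simple").
trainer = pl.Trainer(profiler=profiler, max_epochs=1)
# trainer.fit(model, datamodule=datamodule)  # placeholders for the real objects
```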

bonham79 commented 1 year ago

Thanks @Adamits , I used the profiler and the major bottleneck is the expert functions. Since this is gonna require a bit of a major overhaul, I'm gonna just leave this as a note to table until time permits a deep dive.

Adamits commented 1 year ago

I was thinking about how edit distance could be a bottleneck and wondering how speech people handle it, given the importance of WER in ASR codebases.

I found https://pytorch.org/audio/main/generated/torchaudio.functional.edit_distance.html -- could be useful for us?

EDIT: It looks like that code just loops in Python, so maybe not. Maybe we want a library in C or Cython, though, like https://pypi.org/project/editdistance/.
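For reference, both APIs compute a plain Levenshtein distance over arbitrary sequences; the difference is only where the loop runs. A quick sketch (assuming the editdistance package is installed):

```python
import editdistance  # C/Cython-backed implementation from PyPI
import torchaudio.functional as F

hyp = [1, 5, 3, 9]
ref = [1, 3, 9, 9]

# torchaudio's version: convenient, but the DP loop runs in Python.
print(F.edit_distance(hyp, ref))    # 2

# editdistance: same result, computed in compiled code.
print(editdistance.eval(hyp, ref))  # 2
```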

kylebgorman commented 1 year ago

Yeah, short answer: write it in C++ with cache locality in mind, then wrap it and expose it to Python.
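As a rough sketch of that idea, PyTorch's inline extension builder can compile a small C++ function and expose it through pybind11; the single-row DP below keeps memory access sequential for cache locality. The function name and the whole module are illustrative, not yoyodyne code, and building it requires a C++ toolchain available at runtime.

```python
import torch.utils.cpp_extension

cpp_source = """
#include <torch/extension.h>
#include <pybind11/stl.h>

#include <algorithm>
#include <vector>

int64_t edit_distance(const std::vector<int64_t>& a, const std::vector<int64_t>& b) {
  // Single-row dynamic program: O(|b|) memory, sequential access pattern.
  std::vector<int64_t> row(b.size() + 1);
  for (size_t j = 0; j <= b.size(); ++j) row[j] = j;
  for (size_t i = 1; i <= a.size(); ++i) {
    int64_t diag = row[0];
    row[0] = i;
    for (size_t j = 1; j <= b.size(); ++j) {
      const int64_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      const int64_t next = std::min({row[j] + 1, row[j - 1] + 1, diag + cost});
      diag = row[j];
      row[j] = next;
    }
  }
  return row[b.size()];
}
"""

# Compiles on first use and auto-generates the pybind11 binding.
ext = torch.utils.cpp_extension.load_inline(
    name="fast_edit_distance",
    cpp_sources=cpp_source,
    functions=["edit_distance"],
)

print(ext.edit_distance([1, 5, 3, 9], [1, 3, 9, 9]))  # 2
```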

bonham79 commented 1 year ago

Yeah, a nice little C++ module would make the edit distance calculation quicker. However, the main bottleneck is really the oracle itself: you have to continually update the position in the edit, which requires transferring the predicted edit action to the CPU for each edit. This GPU -> CPU communication is a killer until most of the oracle operations can be expressed as tensor operations.
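To make that concrete (purely illustrative, not the oracle's actual code, and the action id is made up): the first function below syncs once per example via `.item()`, while the second keeps the position update on the GPU as a masked tensor operation.

```python
import torch


def advance_positions_with_sync(actions: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    # Each .item() call copies a scalar back to the host and forces a device
    # synchronization, so the GPU stalls once per predicted edit action.
    new_positions = positions.clone()
    for i in range(actions.size(0)):
        if actions[i].item() == 0:  # hypothetical "advance" action id
            new_positions[i] += 1
    return new_positions


def advance_positions_tensorized(actions: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    # The same update as one masked tensor operation: no host round trip.
    return positions + (actions == 0).long()
```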