joewale closed this issue 2 years ago
Hi there,
Sorry, but I am quite busy these days and won't be able to do that soon. I think it should be quite straightforward to implement, though. You are of course welcome to create one and make a pull request.
-Yuan
I see. I am writing a demo for a single audio file and will make a pull request later. Hi Yuan, for the training and validation stages, are only target_length feature frames of each audio file input to the network, no matter how long the audio file is?
Hi there,
@JeffC0628 just submitted a pull request; maybe you can take a look?
I am not sure I understand your question. In the training and validation stage, we do not input the target_length to the network (see here). Instead, we initialize the network with a fixed target_length and cut/pad the audio to that length.
-Yuan
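The cut/pad step described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the repository's actual code: the function name `pad_or_cut`, the `pad_value` parameter, and the list-of-rows feature layout (one row per frame, one column per frequency bin) are all assumptions made for the example.

```python
def pad_or_cut(frames, target_length, pad_value=0.0):
    """Return a copy of `frames` with exactly `target_length` rows.

    `frames` is a list of rows (one row per feature frame). Hypothetical
    helper for illustration; the real project may pad and cut differently.
    """
    if target_length < 0:
        raise ValueError("target_length must be non-negative")
    n_bins = len(frames[0]) if frames else 0
    # Cut: keep at most the first `target_length` frames.
    fixed = [list(row) for row in frames[:target_length]]
    # Pad: append constant-valued frames until the length matches.
    missing = target_length - len(fixed)
    fixed.extend([pad_value] * n_bins for _ in range(missing))
    return fixed


short_clip = [[1.0, 2.0] for _ in range(3)]  # 3 frames, 2 bins
long_clip = [[1.0, 2.0] for _ in range(8)]   # 8 frames, 2 bins
print(len(pad_or_cut(short_clip, 5)), len(pad_or_cut(long_clip, 5)))
```

Under this sketch, every clip reaches the network with the same number of frames, which is why target_length is fixed when the model is created rather than passed in per file.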
Got it.
Hi Yuan, I find that the prediction latency for an audio file is high on CPU. It takes 14 seconds for a 3-minute audio file.
Hi there,
AST is not supposed to work with 3-minute audio files. You need to split the audio into smaller chunks (e.g., 10 s with some overlap). It should then be reasonably fast for CPU inference, but of course it is better on GPUs.
-Yuan
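One way to picture the chunking suggested above is to compute overlapping sample ranges first and then run the model on each range. The sketch below is an assumption-laden illustration, not part of AST: the name `chunk_bounds`, the 2-second default overlap, and the 16 kHz sample rate in the usage line are all chosen for the example only.

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds=10.0, overlap_seconds=2.0):
    """Return (start, end) sample indices of overlapping chunks.

    Hypothetical helper: consecutive chunks share `overlap_seconds` of audio,
    and the final chunk may be shorter than `chunk_seconds`.
    """
    chunk = int(round(chunk_seconds * sample_rate))
    hop = chunk - int(round(overlap_seconds * sample_rate))
    if chunk <= 0 or hop <= 0:
        raise ValueError("chunk must be positive and longer than the overlap")
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the whole file is covered
        start += hop
    return bounds


# A 3-minute file at an assumed 16 kHz sample rate.
bounds = chunk_bounds(num_samples=180 * 16000, sample_rate=16000)
print(len(bounds), bounds[0], bounds[-1])
```

Each range can then be sliced out of the waveform and processed on its own. How the per-chunk predictions are combined (for example by averaging) is left open here, since the thread does not specify it.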
I see, thanks.
Hi Yuan, is there any code or a demo for testing a single audio file with the trained model?