Closed: sarapapi closed this issue 2 years ago.
For some reason unknown to me, I solved it by copying warp_transducer/src/attent_entrypoint.cu into the same folder under the name attent_entrypoint.cpp, then recompiling with cmake; now everything works.
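A minimal sketch of that workaround, assuming the usual warp_transducer layout with an already-configured build/ directory (scripted in Python purely for illustration; the equivalent cp and cmake commands work just as well):

```python
# Sketch of the workaround described above: duplicate the .cu entry point as a
# .cpp file and rebuild with cmake. Paths and the build-directory layout are
# assumptions; adjust them to your checkout.
import shutil
import subprocess

shutil.copy("warp_transducer/src/attent_entrypoint.cu",
            "warp_transducer/src/attent_entrypoint.cpp")
subprocess.run(["cmake", "--build", "."], cwd="warp_transducer/build", check=True)
```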
I had the same problem and tried copying the warp_transducer/src/attent_entrypoint.cu file and recompiling, but the bug was not resolved. Is installing the original warprnnt library necessary? @sarapapi @danliu2
Sorry for replying so late. You must not install the raw warp-rnnt, because my code shares the same name with it, which may cause a version conflict among the Python packages. According to the error message you gave, it looks like you are actually using the original version of warp-rnnt, which does not provide the interface I added (get_delay_workspace_size etc.). Please check the location of the .so and the functions it exports (with objdump etc.) to confirm this, and try the test_delay tool. I hope it gets solved smoothly, and I apologize for my rough code.
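For example, a quick check along these lines (the module name below is an assumption; use whatever name this fork installs under):

```python
# Hedged sketch: confirm which warp-rnnt build Python actually imports, then
# inspect the exported symbols of the compiled extension it points to.
import importlib

mod = importlib.import_module("warprnnt_pytorch")  # assumed module name; adjust if different
print(mod.__file__)  # should point inside this repository's install, not the original warp-rnnt

# Then list the symbols of the .so found next to that path, e.g.:
#   objdump -T <path-to-.so> | grep -i delay
# get_delay_workspace_size should appear if the right library is being loaded.
```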
Thanks for your reply. I checked the compilation details, reinstalled the local CUDA, and solved the installation problem. But after training for 80 epochs on the MuST-C v2 en-zh dataset, the delay loss and prob_loss are still 'nan', and the PPL and loss are still high (I skip the MT distillation step and use the ASR pretraining from the example). Is there any possible solution? @danliu2
It should not be NaN at all, and you can get a meaningful nll, which means the backward loss is not NaN; otherwise one backward step would turn all the parameters to NaN. So I guess it is just some kind of exception where you get NaN and the model is not updated at that step, e.g. the sequence is shorter than one block and so on. My suggestion:
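One way to guard against such an exception, sketched here as an illustration rather than the author's original suggestion, is to skip the optimizer step whenever the loss is non-finite, assuming a standard PyTorch training loop:

```python
import torch

def guarded_step(optimizer, loss: torch.Tensor) -> bool:
    # Illustrative guard only: if the transducer loss came out NaN/Inf for this
    # batch (e.g. a sequence shorter than one block), skip the update so a single
    # bad batch does not corrupt the model parameters.
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return False  # update skipped for this batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return True
```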
Dear authors, I have installed both Fairseq from your repository and your version of warprnnt, but the following error occurs when I launch the training code:
Have you experienced something similar? I did some research online but found nothing. Thank you.
EDIT: I tried installing the original warprnnt library and everything works; I can successfully import its modules without any error. Thus, I hypothesize that it is something related to your modified version.