jiwidi closed this issue 3 years ago.
Loss 0.0 means that there's something wrong with rnnt installation 😢
Yeah, I thought so too. Do you have any tips on how to check the rnnt installation? Some sample code that I could work with to debug it would help. The installation script runs fine, though.
@jiwidi Unfortunately, the warp-transducer repo seems to be abandoned and its test script is outdated, so we would have to implement a new one to debug and test; otherwise I don't see any other options.
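For reference, a minimal sanity check could look roughly like the sketch below. It assumes the warp-transducer TensorFlow binding exposes rnnt_loss(acts, labels, input_lengths, label_lengths) as shown in that repo's README; on a healthy build the returned costs should be finite and non-zero, so an all-zero result would point at the installation.

# Rough sanity check for the warp-transducer TF binding (sketch, assumes the
# README interface). All-zero costs would point to a broken build.
import numpy as np
import tensorflow as tf
from warprnnt_tensorflow import rnnt_loss

B, T, U, V = 2, 10, 5, 8                       # batch, time steps, target length, vocab size
acts = tf.random.normal([B, T, U + 1, V])      # joint network activations
labels = tf.constant(np.random.randint(1, V, size=(B, U)), dtype=tf.int32)  # no blanks in labels
input_lengths = tf.fill([B], T)
label_lengths = tf.fill([B], U)

costs = rnnt_loss(acts, labels, input_lengths, label_lengths)
print(costs.numpy())                           # expect finite, non-zero values per batch element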
Hard to tell without seeing the CMakeLists, but I suspect you are using CUDA 11. The CMake file probably doesn't compile for the latest architecture. I would add the appropriate SM there first before starting to debug the source itself.
If you run CUDA 11.1 you can use these arch and sm flags for the RTX cards:

-arch=sm_80 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86

For CUDA 11 use:

-arch=sm_52 -gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86
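As a side note, you can confirm which compute capability the card reports (and therefore which gencode entries the CMake file needs) from Python. This is just a sketch and assumes TF 2.4+, where tf.config.experimental.get_device_details is available:

# Print the compute capability of each visible GPU (sketch, TF 2.4+).
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get("compute_capability"))  # e.g. (8, 6) for an RTX 3090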
did this suggestion fix the issue?
Hi! Sorry, I have been out during the week and couldn't try the solution; I will try it this weekend and get back to you. Thanks!
did this suggestion fix the issue?
Hi again! It got fixed and I now have a loss != 0, but GPU usage during training is very low (around 5%). Is that normal for this model? It takes 6 seconds per batch on the 3090. Maybe it is still running on the CPU?
The rnnt transducer loss is installed with CUDA found, and when running the example it reports that it is running on the GPU, so it shouldn't wrongly be running on the CPU.
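For what it's worth, one generic way to double-check placement (not specific to this repo) is to enable device placement logging before any ops are created; if the model ops and the loss kernel show up on /GPU:0, nothing is silently falling back to the CPU:

# Log where each op actually runs (sketch). Must be called before ops exist.
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

a = tf.random.normal([1024, 1024])
b = tf.matmul(a, a)   # the placement of this op is printed to stderr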
@jiwidi No, that's not normal. Can you use the profiler to log the training performance?
Yeah, I will do it later today. Any tips on using the profiler with your library?
@usimarit It is being a bit hard to find exactly where to put my profiling code; I can't easily follow your class structure. From looking at the train_conformer.py example I thought the train step being run was the function here https://github.com/TensorSpeech/TensorFlowASR/blob/e08e208f90ccc82d47751a05ce22b7d0ec78f685/tensorflow_asr/runners/transducer_runners.py#L48, but it is not. I replaced it with the following function to show the profiling:
@tf.function(experimental_relax_shapes=True)
def _train_step(self, batch):
    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            _, features, input_length, labels, label_length, pred_inp = batch
            with tf.GradientTape() as tape:
                logits = self.model([features, pred_inp], training=True)
                tape.watch(logits)
                per_train_loss = rnnt_loss(
                    logits=logits, labels=labels, label_length=label_length,
                    logit_length=(input_length // self.model.time_reduction_factor),
                    blank=self.text_featurizer.blank
                )
                train_loss = tf.nn.compute_average_loss(
                    per_train_loss, global_batch_size=self.global_batch_size
                )
            gradients = tape.gradient(train_loss, self.model.trainable_variables)
            self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
            self.train_metrics["transducer_loss"].update_state(per_train_loss)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
And I get no print. Could you point me to where I should add the profiling? Where can I profile every batch that the example runs?
Thanks!
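(As an aside: the snippet above mixes the PyTorch profiler into a tf.function, and a plain Python print() inside a @tf.function only executes while the function is being traced, which would explain the missing output. A rough alternative using TensorFlow's own profiler, assuming the TF 2.x tf.profiler.experimental API, might look like the sketch below; train_dataset and runner._train_step are placeholders for whatever handles the example script actually exposes.)

# Sketch: profile a handful of training steps with the TF profiler, then
# inspect the trace in TensorBoard's Profile tab.
import tensorflow as tf

tf.profiler.experimental.start("logdir")          # traces are written to ./logdir
for step, batch in enumerate(train_dataset):      # placeholder dataset handle
    with tf.profiler.experimental.Trace("train", step_num=step, _r=1):
        runner._train_step(batch)                 # hypothetical handle to the train step
    if step >= 10:                                # profile only a few batches
        break
tf.profiler.experimental.stop()
# Then: tensorboard --logdir logdir  ->  Profile tab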
Did you rebuild the package after your code changes?
I have tried this on an RTX 3090 myself in the meantime. There are some issues in warpRnnt that need to be addressed, as discussed before, but after that it works. I compiled under CUDA 11.1 with the latest TensorFlow that supports CUDA 11.1 and SM86. I'm seeing > 5 batches/second with a batch size of 6. No GA.
I'll close this issue here, since the rnnt loss implementation in TF drops our dependency on GPU devices and CUDA versions and leaves that to TensorFlow, so this problem is solved. Feel free to reopen or open a new issue if this problem occurs again (for the rnnt loss in TF only; the warp rnnt loss is deprecated).
Hi!
First of all, very nice repository you have here; great work.
I've been trying to run your conformer example with an RTX 3090 from the new NVIDIA series, and I was wondering whether this is something you have tried/tested or even support.
I'm running CUDA 11.1 and cuDNN 11.1-v8.0.5.39, and I tried running your installation commands with conda:
Then I installed the rnnt_loss with
and got this output:
It looked like warp-transducer wouldn't compile, so following this post I commented out the following lines of the warp-transducer CMake file:
With that I was able to compile it. After this I tried running the example
python examples/conformer/train_conformer.py
but it wouldn't start running due to a GPU error. So I upgraded TensorFlow to
pip install tf-nightly-gpu==2.5.0.dev20201028
which solved it. Now I'm able to run the code in the example script, but the loss is equal to 0, and I wonder whether this is normal or could be a bug in my installation. Is this related to the TF version or the warp-transducer version? Has anyone run examples from this repository with the new NVIDIA 3000-series cards? Could you provide me with some information about your installation?
Here is the full output from my execution of the conformer example:
Thanks