danliu2 / caat

MIT License

Speech-to-text evaluation #8

Closed sarapapi closed 2 years ago

sarapapi commented 2 years ago

Hi, I successfully ran the experiments and I am now trying to evaluate the CAAT system at various latency values. I started your agent for SimulEval, but it has been running for more than 2 days without producing a result. Is this normal? Thank you again

danliu2 commented 2 years ago

That seems much slower than in my experiments. The inference speed in my experiments was roughly 1x realtime (inferring 10 s of audio took about 10 s of computation). How many samples are in your experiment? Here are some tips about speed; I hope they help.

  1. It is much slower than Fairseq offline translation, because SimulEval evaluates simultaneous translation sample by sample, so batch processing cannot be performed.
  2. Check whether the GPU is actually being used.
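To sanity-check the realtime factor mentioned above (10 s of audio costing 10 s of computation), one can time each per-sample inference call. A minimal sketch, where `infer_fn` is a hypothetical stand-in for the actual per-sample decode call:

```python
import time

def realtime_factor(audio_seconds, infer_fn, *args, **kwargs):
    """Return compute_time / audio_duration for one sample.

    A value around 1.0 means realtime decoding; values far above 1.0
    suggest a bottleneck (e.g. the model silently running on CPU).
    """
    start = time.perf_counter()
    infer_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

# Illustrative use with a dummy "inference" call that sleeps 20 ms
# while pretending to process 100 ms of audio:
rtf = realtime_factor(0.1, time.sleep, 0.02)
```

Averaging this over a handful of samples quickly shows whether the system is near realtime or orders of magnitude slower.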
sarapapi commented 2 years ago

Hi, I am testing on MuST-C tst-COMMON and it is taking much longer than in your setup; I do not know why. I am using a single K80 GPU for inference. How many GPUs did you use for testing? I also added the --gpu flag to the SimulEval inference script to ensure the GPU is used. It is strange, because a standard Fairseq SimulST model takes less than 1 hour in the same setting.

danliu2 commented 2 years ago

I use one V100 GPU for inference, but that shouldn't make such a large difference in speed. The flag "--gpu" is not defined in my code, so it will be ignored. I use "--cpu" to indicate that the model should run on CPU; otherwise it runs on GPU via nn.Module.cuda().
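A minimal sketch of how such a flag typically behaves (illustrative only; the argument name `--cpu` matches the comment above, everything else is assumed, not the repo's actual code):

```python
import argparse

# Hypothetical sketch of a "--cpu" style flag: by default the model would
# be moved to the GPU; passing --cpu keeps it on the CPU. An undefined
# flag such as "--gpu" is simply left unrecognized by the parser.
parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true",
                    help="run the model on CPU (default: GPU via .cuda())")

args, unknown = parser.parse_known_args(["--gpu"])
print(args.cpu)   # False: the model would still go to the GPU
print(unknown)    # ['--gpu'] was not consumed by any defined argument
```

This is why adding `--gpu` had no effect: the model was already on the GPU by default, and the unknown flag changed nothing.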

The inference speed of CAAT should be comparable to wait-k or MMA; I don't know the details of your standard Fairseq SimulST setup. Maybe you need to debug a bit more and log the cost of some specific functions to find the cause of the slowdown.