Tencent / TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Warm-up needed? #185

Closed auspicious3000 closed 4 years ago

auspicious3000 commented 4 years ago

I was running a simple ONMT decoder on Turbo, similar to https://github.com/TurboNLP/Translate-Demo/.

The decoding strategy is to remove finished sentences from the batch during step-by-step decoding.

It runs fine without the strategy, and the first run with the strategy also succeeds, but the second run with the strategy always throws an error when it accesses `is_alive`:

```
original_batch_idx = original_batch_idx[is_alive]
RuntimeError: CUDA error: device-side assert triggered
```

However, if I run without the strategy before running with it, the error goes away. I therefore suspect some kind of warm-up is needed for Turbo decoding with the strategy. If so, what would be the simplest way to warm up without running through the whole input?
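One general debugging aid for this class of error (not specific to TurboTransformers): CUDA kernels launch asynchronously, so a device-side assert is often reported at a later, unrelated-looking op. Making launches synchronous pins the report to the kernel that actually failed. This uses the standard CUDA/PyTorch environment variable; the script name here is just illustrative:

```shell
# Report the assert at the faulting kernel instead of at a later op.
CUDA_LAUNCH_BLOCKING=1 python mytranslator.py
```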

Thank you very much!

feifeibear commented 4 years ago

Can you explain what the strategy does?

auspicious3000 commented 4 years ago

https://github.com/TurboNLP/Translate-Demo/blob/master/mytranslator.py#L681

https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/translate/greedy_search.py

My strategy is similar to the above, but much simpler: it removes finished sentences from the batch.
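A CPU-side sketch of that pruning step, with NumPy standing in for the GPU tensors (the function name `prune_finished` and the shapes are illustrative, not part of TurboTransformers or the demo):

```python
import numpy as np

def prune_finished(batch, original_batch_idx, finished):
    """Drop sentences that emitted EOS this step; keep a map back to original rows."""
    is_alive = ~finished
    return batch[is_alive], original_batch_idx[is_alive], is_alive

batch = np.arange(4 * 3).reshape(4, 3)        # 4 sentences, hidden size 3
original_batch_idx = np.arange(4)             # row i came from input sentence i
finished = np.array([False, True, False, True])

batch, original_batch_idx, is_alive = prune_finished(batch, original_batch_idx, finished)
print(original_batch_idx)                     # rows 1 and 3 are gone: [0 2]
```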

auspicious3000 commented 4 years ago

I found that after this map_state operation, https://github.com/TurboNLP/Translate-Demo/blob/master/mytranslator.py#L772, accessing the variable `is_alive` triggers the error.
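One plausible explanation (an assumption on my part, not confirmed in the thread): device-side asserts usually mean out-of-range indexing inside a kernel, and if `map_state` prunes or reorders the cached state while `is_alive` keeps its pre-pruning length, the next boolean index is inconsistent. On CPU the same mismatch fails loudly; on GPU it can surface as the deferred device-side assert. A NumPy analogue with made-up shapes:

```python
import numpy as np

original_batch_idx = np.arange(2)               # already pruned to 2 alive sentences
stale_is_alive = np.array([True, False, True])  # mask from before pruning: length 3

# A boolean mask must match the length of the axis it indexes; here 3 != 2.
try:
    original_batch_idx[stale_is_alive]
    caught = ""
except IndexError as err:
    caught = str(err)
```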

auspicious3000 commented 4 years ago

Problem solved.