Error:
Training will fail when using larger batch:
SystemError: (Fatal) Operator set_value raises an thrust::system::system_error exception. The exception content is :parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument. (at /paddle/paddle/fluid/imperative/tracer.cc:192)
Error: Training will fail when using larger batch:
SystemError: (Fatal) Operator set_value raises an thrust::system::system_error exception. The exception content is :parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument. (at /paddle/paddle/fluid/imperative/tracer.cc:192)
Reason: The reason is explained by the following issues from PaddlePaddle: https://github.com/PaddlePaddle/Paddle/issues/33057#issuecomment-847719249
In short, this error is raised because of cuda thrust bug, which is ignored in newer version cuda.
Solution: install paddle dev version will fix the problem. You will find the following instructions of how to install it: https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html
In detail, the problem is fixed by the following patch: https://github.com/PaddlePaddle/Paddle/pull/33748/files/617e3eda9dfcd76cb6a7ebaa1535340f1023d3f1