Hello, when running GPU inference with the above configuration, I ran into a memory blow-up: actual GPU memory usage is far higher than my theoretical estimate. How is memory usage calculated during forward inference, or is there some other issue that could cause this kind of memory explosion?
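In case it helps narrow this down, here is a minimal sketch of how I am comparing allocator stats before and after a forward pass (assuming PyTorch, since the framework isn't stated here; the `nn.Sequential` model is just a stand-in for my actual configuration). One frequent cause of unexpectedly high inference memory is running the forward pass without `torch.no_grad()` / `torch.inference_mode()`, in which case autograd keeps every intermediate activation alive for a potential backward pass:

```python
import torch
import torch.nn as nn

def report_cuda_memory(tag: str) -> None:
    # memory_allocated(): bytes held by live tensors;
    # max_memory_allocated(): peak since the last reset_peak_memory_stats()
    alloc = torch.cuda.memory_allocated() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"{tag}: allocated={alloc:.1f} MiB, peak={peak:.1f} MiB")

# Stand-in model; replace with the model from the configuration above.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).cuda().eval()
x = torch.randn(64, 4096, device="cuda")

torch.cuda.reset_peak_memory_stats()
report_cuda_memory("before forward")

# Without inference_mode()/no_grad(), autograd caches all intermediate
# activations; that overhead can far exceed the weights themselves.
with torch.inference_mode():
    y = model(x)

report_cuda_memory("after forward (inference_mode)")
```

Is this roughly the right way to account for memory here, or does the forward pass allocate other buffers (workspace, fragmentation, cached blocks) that my theoretical estimate would miss?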