-
Thanks for your code!
I am training with refinedet_train_test.py on the 0.4 branch.
My platform is Ubuntu 16.04, Python 3.6, PyTorch 0.4.1, 1080 Ti.
I get the following error log **when I set the batch size to 8 or more**:
…
-
Hello, I found that during pre-training the memory usage keeps increasing from iteration to iteration. I want to know why this happens, whether it is the same in your training process, and how much memor…
-
Setup
- Environment: PyTorch 2.3.0, composer 0.22.0, streaming 0.7.4
- GPU: 8x H100 SXM, BF16 mode
This issue is related to #643 but concerns a more subtle issue with Streaming datasets. Over the cou…
-
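Hello,
While reproducing your code I ran into some bugs that I don't know how to fix. I would appreciate it if you could take a look when you have time. Thank you.
The bug that occurs when I run Training memory bank model/train_singlenet_phase_1fc.py is shown in the image: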
![bug1](https://user-images.githubusercontent.com/71928265/142859607-c1c…
-
Training the model uses about 130 GB of memory, but saving the model fails with an OOM error (total memory is 195 GB). Is there extra memory allocation when saving the model?
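
One way to check whether a save step allocates extra host memory is to measure the peak allocation around it. The sketch below is only an illustration under stated assumptions: it uses the standard-library `pickle` and `tracemalloc` rather than the framework's own save routine, the helper name `peak_extra_bytes_while_saving` is hypothetical, and it only sees allocations made through Python's allocator (not GPU memory or native buffers). Whether the same pattern explains the OOM above is not confirmed.

```python
import pickle
import tracemalloc


def peak_extra_bytes_while_saving(obj, save_fn):
    """Return the peak bytes allocated through Python while save_fn(obj) runs.

    Hypothetical helper: `obj` is built before tracing starts, so its own
    memory is not counted; only what the save step allocates on top is.
    """
    tracemalloc.start()
    try:
        save_fn(obj)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Stand-in for a model: 8 buffers of 1 MB each (an assumption for the demo).
    model = {f"layer{i}": bytearray(1_000_000) for i in range(8)}

    # pickle.dumps serializes everything into one in-memory bytes object,
    # so the save step needs at least another copy of the data.
    extra = peak_extra_bytes_while_saving(model, pickle.dumps)
    print(f"peak extra memory during save: {extra / 1e6:.1f} MB")
```

If the measured peak is close to the size of the model itself, serializing into an in-memory buffer before writing to disk is a plausible source of the extra allocation.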
-
#CUDA error out of memory despite having 20 GB GPU
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-mmlab/mmpose/issues) and [Discussions](https://github.com/open-mmlab/mmpose/discussions) but cannot get the expected help.
- [X…
-
I couldn't find the code for generating static node memory. In the paper, it mentions that "we use learnable node embeddings pre-trained with the same task as the static node memory due to its simplic…
-
I'm planning to use RWKV in my ASR model, but once I use RWKV's module it generates this error, and only after the program has been training for some time. Is there any related solution idea or soluti…
-
Thanks for providing your code; it's much more readable than the original one.
I have observed a severe memory leak in training.
During training, previously allocated batch tensors of smaller net…