The length of `chunk_sizes` is the number of GPUs, and the batch size is the sum of the chunk sizes.
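For example (an illustrative sketch; the GPU count and sizes here are made up):

```python
# With 4 GPUs and a total batch size of 48 (both values invented for this
# example), chunk_sizes carries one entry per GPU and sums to the batch size.
batch_size = 48
chunk_sizes = [12, 12, 12, 12]  # len(chunk_sizes) == number of GPUs
assert len(chunk_sizes) == 4
assert sum(chunk_sizes) == batch_size
```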
There is a new issue.
@hsj928
Add this code above the line `dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)` in models/py_utils/kp_utils.py:

```python
if len(tag_mean.size()) < 2:
    tag_mean = tag_mean.unsqueeze(0)
```
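For context, a minimal sketch of where the guard sits (the function name is mine; the two quoted lines are from kp_utils.py):

```python
import torch

def pairwise_tag_dist(tag_mean: torch.Tensor) -> torch.Tensor:
    # When a GPU receives a chunk of size 1, tag_mean can arrive as a 1-D
    # tensor; restore the batch dimension so the broadcast below stays valid.
    if len(tag_mean.size()) < 2:
        tag_mean = tag_mean.unsqueeze(0)
    # Pairwise differences between embedding means, as in kp_utils.py:
    # (batch, 1, n) - (batch, n, 1) broadcasts to (batch, n, n).
    dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)
    return dist
```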
> The length of `chunk_sizes` is the number of GPUs, and the batch size is the sum of the chunk sizes.
So if we use one GPU, can we only use a chunk size of 1? I'm struggling to use more of my GPU's memory this way, even when I set the batch size really high.
@Ostyk If you only have one GPU, modify these lines in config/xxx.json:

- line 4: `"batch_size": xx,` (where xx denotes the batch size on a single GPU; it can be more than 1)
- line 22: `"chunk_sizes": [xx],` (`chunk_sizes` must equal `batch_size`)

I also recommend that you try CPNDet (https://github.com/Duankaiwen/CPNDet). CPNDet is version 2.0 of CenterNet.
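A quick sanity check along those lines (a sketch only: the file name and the "system" key are assumptions based on the repo's sample configs):

```python
import json

# Verify the single-GPU setting described above. "config/CenterNet-104.json"
# and the "system" key are assumptions; point this at your own config file.
with open("config/CenterNet-104.json") as f:
    cfg = json.load(f)["system"]

# With one GPU, chunk_sizes has a single entry equal to batch_size.
assert cfg["chunk_sizes"] == [cfg["batch_size"]]
```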
Thanks for the quick answer. I'll check out the new network, and since it's also anchor-free, I can use it as the backbone for FairMOT (re-identification), just like CenterNet.
Still getting the error when I set `batch_size` equal to `chunk_sizes`:

```
"batch_size": 20,
"chunk_sizes": [20],
```

```
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 20, but expected 1) (scatter at ..\torch\csrc\cuda\comm.cpp:176)
```
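The message says the scattered tensor's batch dimension is 1, so somewhere a size-1 tensor is being split into chunks summing to 20. The failure can be reproduced directly with `torch.cuda.comm.scatter` (an illustrative sketch on a CUDA machine, not your actual call site):

```python
import torch
import torch.cuda.comm

# scatter splits dim 0 according to chunk_sizes; a tensor whose batch
# dimension is 1 cannot yield chunks summing to 20, which matches the
# "sum(chunk_sizes) == 20, but expected 1" message above.
x = torch.randn(1, 8)
torch.cuda.comm.scatter(x, devices=[0], chunk_sizes=[20])  # raises RuntimeError
```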
@Ostyk Can I see your full log?
```
Traceback (most recent call last):
  File "train.py", line 210, in <module>
```
I read that it might have something to do with `nn.DataParallel`, but I couldn't figure it out. I'd be grateful for any tips :)
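For reference, vanilla `nn.DataParallel` splits the input batch evenly across `device_ids` on its own; as far as I can tell, the CenterNet training loop scatters by explicit chunk sizes instead, which is why `batch_size` and `sum(chunk_sizes)` must agree. A minimal sketch of the vanilla behavior (CUDA required; the layer and sizes are made up):

```python
import torch
import torch.nn as nn

# Vanilla DataParallel: the 20-sample batch is split evenly across the
# listed devices, run in parallel, and the outputs gathered back on device 0.
net = nn.DataParallel(nn.Linear(8, 4), device_ids=[0]).cuda()
y = net(torch.randn(20, 8).cuda())
print(y.shape)  # torch.Size([20, 4])
```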