Closed shenjiyuan123 closed 1 year ago
Hi @shenjiyuan123. Thanks for your interest in this project!
It may result from a data index mismatch between the saved configuration files and the running dataloader sampler.
Please check if `--fkd-seed` is set to the same value in the relabel and train files.
Thanks for your answer. But I keep `--fkd-seed` at 42 all the time, so I don't think that is the reason.
`self.img2batch_idx_list` consists of `[dict(), dict(), ...]` and is generated at
https://github.com/VILA-Lab/SRe2L/blob/549988a9a7062eec56d5e8aa12187a60b1a798fb/relabel/utils_fkd.py#L216
The error info `KeyError: 7542` you provided means that 7542 does not correspond to the first index of any batch in the batch lists used in the relabel phase, which is the index mismatch I mentioned above. `--batch-size` should also be set to the same value in the two phases to avoid the mismatch issue.
Did you keep the other settings the same as in the example bash script in README.md?
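To illustrate the mismatch described above, here is a minimal sketch, not the actual code in `relabel/utils_fkd.py`. It assumes that each epoch's dict maps the first image index of every shuffled batch to a batch index; `build_img2batch_idx_list` is a hypothetical stand-in, and it uses Python's `random` module rather than the repository's sampler.

```python
import random


def build_img2batch_idx_list(num_img, batch_size, seed, epochs):
    """Hypothetical stand-in: one dict per epoch, keyed by the first
    image index of each shuffled batch (an assumption, not SRe2L's code)."""
    rng = random.Random(seed)
    per_epoch = []
    for _ in range(epochs):
        order = list(range(num_img))
        rng.shuffle(order)
        first_to_batch = {}
        for batch_idx, start in enumerate(range(0, num_img, batch_size)):
            first_to_batch[order[start]] = batch_idx
        per_epoch.append(first_to_batch)
    return per_epoch


# Same num_img, batch size, seed and epochs in both phases: the maps agree.
relabel_map = build_img2batch_idx_list(num_img=50, batch_size=8, seed=42, epochs=2)
train_map = build_img2batch_idx_list(num_img=50, batch_size=8, seed=42, epochs=2)
assert relabel_map == train_map

# A different num_img in one phase: most lookups would raise KeyError.
mismatched = build_img2batch_idx_list(num_img=10, batch_size=8, seed=42, epochs=2)
missing = [idx for idx in train_map[0] if idx not in mismatched[0]]
print(f"{len(missing)} of {len(train_map[0])} first-indices have no entry")
```

Under these assumptions, any difference in seed, batch size, number of images or epochs between the two phases can produce a key that the saved map does not contain.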
Thanks for your patience. I have found my problem: I forgot to change `num_img` to match my setting, since I used ipc=10 during the recover process. Sorry for the disturbance~
One small suggestion: I think you could add an `args.ipc` argument to control the `get_img2batch_idx_list` function, rather than requiring users to change the values directly in the function.
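A rough sketch of what this suggestion could look like, assuming the image count is simply images-per-class times the number of classes. The `--ipc` and `--num-classes` flags and their defaults are hypothetical here and are not taken from the repository.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Both flags are hypothetical names used only for illustration.
    parser.add_argument("--ipc", type=int, default=50,
                        help="images per class in the synthetic dataset")
    parser.add_argument("--num-classes", type=int, default=1000,
                        help="number of classes in the dataset")
    args = parser.parse_args(argv)
    # Derive the image count instead of hard-coding it inside a function.
    args.num_img = args.ipc * args.num_classes
    return args


args = parse_args(["--ipc", "10"])
print(args.num_img)  # 10 * 1000 = 10000
```

The derived `args.num_img` could then be passed to the function that builds the index lists, so that the relabel and train phases always agree on the dataset size.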
Btw, does the code for the recover process support multiple GPUs? I see the implementation uses `DataParallel`; however, when I try to use two GPUs to synthesize the data, it says that the tensors are not on the same device, as follows:
Traceback (most recent call last):
File "/export/home2/jiyuan/SRe2L/recover/data_synthesis.py", line 219, in
Thank you again!
Thanks for your suggestions. We have updated the code for the new features:
- `num_img` will be calculated automatically and then passed into the `get_img2batch_idx_list` function
- `--batch-size` and `--epochs`, to avoid some potential mismatch issues

For the recover phase, the code works well on a single GPU. Code supporting multiple GPUs will be released soon.
If you want to make the best of your two GPUs, you can assign one task to each GPU under different `ipc_id` range settings to generate images with different IDs at the same time.
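As a rough illustration of this workaround, the sketch below splits a range of `ipc_id` values into one contiguous chunk per GPU. The helper `split_ipc_range` is hypothetical and not part of the repository; how each range is passed to the recover script depends on that script's own arguments.

```python
def split_ipc_range(total_ipc, num_gpus):
    """Split ipc ids 0..total_ipc-1 into contiguous half-open ranges,
    one per GPU (hypothetical helper for illustration)."""
    base, extra = divmod(total_ipc, num_gpus)
    ranges = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional id each.
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


for gpu_id, (lo, hi) in enumerate(split_ipc_range(total_ipc=10, num_gpus=2)):
    print(f"GPU {gpu_id}: ipc_id {lo}..{hi - 1}")
```

Each chunk can then be run as a separate single-GPU process, for example by setting the `CUDA_VISIBLE_DEVICES` environment variable to a different device for each process.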
Thanks for your patience! You really helped me a lot~ I hope everything goes well with your research.
Hi, I really think this is great work!
However, I ran into a problem when trying to reproduce your method.
I have successfully run the recover and relabel processes, which generated the syn_data and the soft labels (i.e. many files like batch_0.tar...). When I run train.sh (I have already changed the PyTorch source code following your instructions), it reports "Caught KeyError in DataLoader worker process 0". It cannot find the corresponding `img_idx` in `img2batch_idx_list` (relabel/utils_fkd.py, line 143). The error is as follows:
Epoch: 0
Traceback (most recent call last):
File "/export/home2/jiyuan/SRe2L/train/train_FKD.py", line 360, in <module>
main()
File "/export/home2/jiyuan/SRe2L/train/train_FKD.py", line 179, in main
train(model, args, epoch)
File "/export/home2/jiyuan/SRe2L/train/train_FKD.py", line 219, in train
for batch_idx, batch_data in enumerate(args.train_loader):
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
data = self._next_data()
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/export/home2/jiyuan/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 62, in fetch
mix_index, mix_lam, mix_bbox, soft_label = self.dataset.load_batch_config(possibly_batched_index[0])
File "/export/home2/jiyuan/SRe2L/train/../relabel/utils_fkd.py", line 143, in load_batch_config
batch_idx = self.img2batch_idx_list[self.epoch][img_idx]
KeyError: 7542
Could you help me figure it out? I look forward to your feedback!
Thanks.