Closed: cocoshe closed this issue 2 years ago.
And I just found that select_inds comes back empty in some of my batches. Here's the log and debug print:
I have the same issue, but it only happens when I'm training with single_gpu.yaml; it's okay when I train with adventure.yaml. It also happens very randomly: I got this error at the 6th epoch, and progress images like 'prog_000300.jpg' had already been generated successfully in previous epochs.
Yes, it happens very randomly. I also checked the frame_name and tried to track down the problem. But even when I prepare only 2 frames, training runs fine for some batches at first, and then the input data becomes empty... I found that select_inds is empty, so it can't sample rays; that's why the input data is empty, and therefore the output data is empty.
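For reference, here is a minimal sketch of the kind of guard I mean around the ray sampling step; the function name and signature are my own, not the repo's:

```python
import numpy as np

def sample_ray_inds(ray_mask, n_rays, rng=None):
    """Illustrative guard only: pick n_rays pixel indices that actually
    hit the subject, and fail loudly when none do (an empty select_inds
    is what later surfaces as the empty batch / KeyError: 'rgb')."""
    rng = rng or np.random.default_rng()
    candidate_inds = np.flatnonzero(ray_mask.reshape(-1) > 0)
    if candidate_inds.size == 0:
        raise ValueError("empty select_inds: no camera ray hits the subject mask")
    # Fall back to sampling with replacement if the subject covers
    # fewer pixels than the number of rays we want.
    replace = candidate_inds.size < n_rays
    return rng.choice(candidate_inds, size=n_rays, replace=replace)
```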
And I only have one GPU; can I run adventure.yaml? Or did you find out how to solve the single_gpu.yaml problem?
I just tried adventure.yaml, but got the same error, still the rgb key error. Maybe something is wrong with my own dataset? Can you provide your dataset (maybe via Google Drive)? I've been stuck here for days and still can't figure out the reason. I'd appreciate it!
Well, I'm just playing with the standard ZJU-Mocap dataset with subject 387. And I just found that if I'm training on my local desktop (which has two Quadro RTX 6000, 24GB, CUDA 11.8), it never fails. And if I'm training on the remote server (which has two RTX 3090, 24GB, CUDA 11.6), it will fail at some point. I really don't understand this either.
My computer's GPU is AMD, so I can't run it on my own machine. I ran the code on a cloud server (Tesla V100, 32GB, CUDA 11.0) and it failed (rgb key error). Strange... I don't get it.
I have a workaround for this issue. Edit the file humannerf/configs/config.py and change line 13 to _C.resume = True. Then run the training code like this:
for i in `seq 9999`; do python train.py --cfg XXXXX.yaml; done
This way, whenever the code breaks, it will resume from where it crashed.
I don't understand. How can the problem be solved completely and properly? Rerunning the code every time it breaks doesn't feel good and seems really strange...
I tried it on my local computer, but still got the "rgb key error"...
I got the same error when I trained the wild part. I noticed the error happened when the output images were poorly cropped. After I changed the offset from 0.3 to 1.5, the error never happened again and the output images were cropped correctly.
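I'm not sure of the exact semantics of that offset in the preprocessing script, but the idea is roughly extra padding around the subject bounding box before cropping. A minimal sketch of that idea (names are my own, not the script's):

```python
def expand_bbox(x_min, y_min, x_max, y_max, offset, img_w, img_h):
    """Illustrative only: grow the subject bounding box by `offset` times
    its width/height before cropping, so a too-tight crop does not cut
    the subject off."""
    w, h = x_max - x_min, y_max - y_min
    x_min = max(int(x_min - offset * w), 0)
    y_min = max(int(y_min - offset * h), 0)
    x_max = min(int(x_max + offset * w), img_w)
    y_max = min(int(y_max + offset * h), img_h)
    return x_min, y_min, x_max, y_max
```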
@cocoshe Did you solve the problem ("KeyError: 'rgb'")? I have the same issue when running training on a single GPU.
Something was wrong with the dataset preparation; I didn't figure out why. I remember there are two ways to prepare your own dataset:
However, I didn't figure out why, and I've honestly nearly forgotten the details. So if you are doing it like 1, just try 2, and if you are following 2, try 1.
That made the problem disappear; I hope you can figure out why.
The KeyError for rgb comes when the masked image during training does not align with the camera values, so projecting the camera rays onto the image fails and an empty ray_mask is returned at the end.
I tried to process my data with a higher T value and got the rgb error, but when I adjusted it to T / 1000, the rendering started properly.
So make sure the alpha masks are well aligned with the images, and that the image centre of the masks is consistent with the camera matrix (K) in case you resize the images.
Hope this helps.
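As a rough illustration, a sanity check along these lines could be run over a prepared dataset (the function and the exact checks are my own, not from the repo):

```python
import numpy as np

def check_mask_alignment(alpha_mask, K, img_shape):
    """Rough sanity check, not from the repo: the alpha mask should be
    non-empty, and the intrinsics K should describe the same (possibly
    resized) image; otherwise every sampled ray can miss the subject and
    the batch comes back without an 'rgb' entry."""
    H, W = img_shape[:2]
    ys, xs = np.nonzero(alpha_mask > 0)
    if xs.size == 0:
        raise ValueError("alpha mask is empty: no ray can hit the subject")
    # Subject bounding box in mask coordinates.
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())
    # If the images were resized, fx, fy, cx, cy in K must be rescaled by
    # the same factor; a principal point outside the image is a red flag.
    cx, cy = K[0, 2], K[1, 2]
    if not (0 <= cx < W and 0 <= cy < H):
        raise ValueError("principal point outside the image: "
                         "K was probably not rescaled after resizing")
    return bbox
```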
In my case, this error was caused by OpenCV. OpenCV uses BGR format, so you have to convert to RGB before saving the image, by running something like cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).
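For example (filenames are placeholders):

```python
import cv2
from PIL import Image

# cv2.imread returns pixel data in BGR order.
frame_bgr = cv2.imread("frame_000000.png")  # placeholder filename

# Convert to RGB before handing the array to anything that expects RGB
# (PIL, imageio, or a training pipeline that assumes RGB input).
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
Image.fromarray(frame_rgb).save("frame_000000_rgb.png")
```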
There are endless other possible reasons for the RGB error; one you will encounter a lot is when the camera rays miss the target and return null values.
When I train on my own data, I run
python train.py --cfg configs/human_nerf/wild/monocular/single_gpu.yaml
in my terminal. Here is the log:
I tried to print some values for debugging:
I just cloned the project and followed the README: I ran mkdir for dataset/wild/monocular and prepared dataset/wild/monocular:
But when I train, it fails.