Open amoghskanda opened 5 months ago
I encountered the same issue; the problem was solved after checking out the latest commit (faba099e0feb11ea0089490a5e87565e25bc4a2c) and re-generating the data.
By the way, if anyone encounters `TypeError: __init__() takes 1 positional argument but 2 were given`, just replace the `@torch.no_grad` decorator with a `with torch.no_grad():` block in nr3d_lib/models/fields/nerf/lotd_nerf.py:

```python
# @torch.no_grad
def query_density(self, x: torch.Tensor):
    with torch.no_grad():
        # NOTE: x must be in range [-1,1]
        ...
```
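The error comes from using a context-manager class bare as a decorator: in older PyTorch versions, `torch.no_grad.__init__` takes no arguments, so `@torch.no_grad` (without parentheses) hands it the decorated function and fails. A minimal stand-alone sketch of the mechanism, where `no_grad_like` is a hypothetical stand-in for the old class, not the real `torch.no_grad`:

```python
class no_grad_like:
    """Hypothetical stand-in for an older torch.no_grad: a plain context
    manager whose __init__ accepts no arguments."""
    def __init__(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

# Used bare as a decorator, Python evaluates no_grad_like(query_density),
# i.e. __init__ receives the function -> the TypeError from this thread.
try:
    @no_grad_like
    def query_density(x):
        return x
except TypeError:
    decorator_failed = True

# Wrapping the body in a `with` block sidesteps the problem on any version:
def query_density(x):
    with no_grad_like():
        return x * 2

print(query_density(3))  # 6
```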
@zzzxxxttt thank you for the reply. The KeyError persists. The problem is with the scenario.pt file: scenario['metas'] has no key named 'frame_timestamps'. Can you upload your scenario.pt file? This is for seg100613.
@amoghskanda sure, here it is scenario.zip
Thank you for the scenario.pt file. @zzzxxxttt did you face the error below?

`__init__() got an unexpected keyword argument 'fn_type'` at line 183 of train.py: MonoDepthLoss is called with parameters that are missing from the `__init__` of the class defined in app/loss/mono.py.

I made some changes to mono.py, used MonoSDFDepthLoss instead, and somewhat fixed it. Now I'm getting `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`. This is because the cache is loaded on the CPU and everything else on the GPU (cuda:0). Is there a fix for this? I preloaded the cache onto the GPU (RTX 3090), but then it runs out of memory. I reduced n_frames in withmask_nolidar.240219.yaml for segment-100613 from 163 to 30 and was able to load the camera cache onto the GPU, but then I run into `RuntimeError: The size of tensor a (65536) must match the size of tensor b (256) at non-singleton dimension 1`. What was the batch size when you trained? @ventusff @zzzxxxttt
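The indexing RuntimeError can be reproduced and fixed in isolation: whenever the cached tensor and the index tensor live on different devices, advanced indexing fails, and moving the indices to the cache's device resolves it. A minimal sketch, runnable on CPU (the tensor names are illustrative, not the repo's):

```python
import torch

cache = torch.arange(12.0).reshape(4, 3)   # stand-in for a CPU-side image cache
idx = torch.tensor([1, 3])                 # frame indices picked by the sampler

# If `cache` were on cuda:0 and `idx` on the CPU (or vice versa), `cache[idx]`
# raises: RuntimeError: indices should be either on cpu or on the same device
# as the indexed tensor. Moving the indices to the tensor's device is cheap
# and avoids preloading the whole cache onto the GPU:
rows = cache[idx.to(cache.device)]
print(tuple(rows.shape))  # (2, 3)
```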
No, I didn't meet this error.
I also use withmask_nolidar.240219.yaml and only modified the data location; I can train it on my 12 GB RTX 3060 without error.
So your data is loaded into the cache, right? And you did not make any changes to which device the data and the model are loaded onto? I have an RTX 3090 and the data is loaded onto the CPU, and I run into `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`. preload_on_gpu is false in withmask.yaml (by default), and I did not make any changes to device placement either.
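A middle ground between preloading the whole cache onto the GPU (which runs out of memory on long sequences) and keeping everything on the CPU is to index the CPU-side cache and move only each sampled batch to the GPU. A hedged sketch of that pattern; the function name and tensor shapes are illustrative, not the repo's dataloader API:

```python
import torch

def fetch_batch(cpu_cache: torch.Tensor, idx: torch.Tensor, device: str) -> torch.Tensor:
    """Index the CPU-side cache (indices and cache on the same device, so no
    RuntimeError), then ship only the selected batch to `device`."""
    batch = cpu_cache[idx]
    return batch.to(device, non_blocking=True)

cache = torch.zeros(163, 3, 64, 64)   # e.g. 163 cached frames kept on the CPU
batch = fetch_batch(cache, torch.tensor([0, 5, 9]), "cpu")  # "cuda:0" on a GPU box
print(tuple(batch.shape))  # (3, 3, 64, 64)
```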
Hi, I also tried to use withmask_nolidar.240219.yaml, but got an error when loading the images to make the ImagePatchDataset. Have you met this error, and how did you solve it? Thanks!
Yes, I removed **kwargs as an argument when calling get_frame_weights_uniform() at line 66 of dataloader/sampler.py, because that function, defined later in the file, takes only 2 arguments:

```python
frame_weights = get_frame_weights_uniform(scene_loader, scene_weights)
```
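The fix works because forwarding **kwargs to a function whose signature has no **kwargs parameter raises a TypeError as soon as any extra keyword is present. A stand-alone illustration, where get_frame_weights_uniform is a two-argument stub, not the repo's implementation:

```python
def get_frame_weights_uniform(scene_loader, scene_weights):
    # Stub: uniform weight per frame; the real function lives in
    # dataloader/sampler.py and likewise takes only these two arguments.
    return [1.0 / len(scene_weights)] * len(scene_weights)

kwargs = {"fn_type": "linear"}  # extra options forwarded by the caller

try:
    get_frame_weights_uniform([], [0.5, 0.5], **kwargs)
except TypeError:
    # "got an unexpected keyword argument 'fn_type'"
    got_type_error = True

# Dropping **kwargs from the call site, as described above, works:
weights = get_frame_weights_uniform([], [0.5, 0.5])
print(weights)  # [0.5, 0.5]
```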
Thank you for the reply. Now I'm meeting a new error like this. Have you met this before?
Yes. I tried caching on the GPU instead of the CPU and changed the value of n_frames in the config file from 163 to 30 for seg-10061, and encountered the above error. When I reverted to the default settings (cache on the CPU, n_frames=163), I ran into #51.
The cache is on the CPU, right? The tensors frame_ind, h, and w are on the CPU as well, and so is _ret_image_raw. Not sure why I'm facing #51.
Ok, have you solved the problem?
Not yet, I'm on it. Try training without changing n_frames in the config file. Let me know if you run into the same issue as me.
Sorry, I'm trying to run code_multi, but got an error like this. Have you met this before?
@sonnefred, I used another config (with mask, with lidar) and was able to train and render as well.
@zzzxxxttt did you try rendering NVS with different NVS paths like spherical_spiral or small_circle?
Ok, thank you, but I'd like to use monodepth supervision; still working on it...
@zzzxxxttt Hi, how did you run this experiment successfully? I still meet a CUDA error when using this yaml... Could you give any help? Thanks.
```
2024-06-11 19:16:01,146-rk0-train.py#959:=> Start loading data, for experiment: logs/streetsurf/seg100613.nomask_withlidar_exp1
2024-06-11 19:16:01,146-rk0-base.py#88:=> Caching data to device=cpu...
2024-06-11 19:16:01,146-rk0-base.py#95:=> Caching camera data...
Caching cameras...:   0%| | 0/3 [00:00<?, ?it/s]
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
```
Has anyone encountered this error before, and how can I adjust the parameters to make it run on my GTX 1660 Ti graphics card?
Firstly, great work and thanks for making it open-source. I set up everything following the README for both streetsurf and nr3d. I wanted to use the withmask_nolidar.240219.yaml config file and made the path and sequence changes to use seg100613 (quick download from the streetsurf repo). The scenario.pt file is incomplete: waymo_dataset.py accesses frame_timestamps (line 406), which is not a valid key in the scenario dictionary. There's another KeyError at line 506 of waymo_dataset.py: there is no global_timestamps key in the scenario['observers']['ego_car']['data'] dictionary. Can you share the complete scenario.pt file, or the zip file for the segment-13476374534576730229_240_000_260_000_with_camera_labels sequence?
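One quick way to tell whether a regenerated scenario.pt has the fields this issue is about is to validate the dictionary before training. A hypothetical checker sketch; the key names come from this thread, not from any documented schema:

```python
def missing_scenario_keys(scenario: dict) -> list:
    """Report the keys from this thread that waymo_dataset.py reads."""
    missing = []
    if "frame_timestamps" not in scenario.get("metas", {}):
        missing.append("metas/frame_timestamps")
    ego = (scenario.get("observers", {})
                   .get("ego_car", {})
                   .get("data", {}))
    if "global_timestamps" not in ego:
        missing.append("observers/ego_car/data/global_timestamps")
    return missing

# A scenario dict generated before commit faba099e would report both keys:
stale = {"metas": {}, "observers": {"ego_car": {"data": {}}}}
print(missing_scenario_keys(stale))
# ['metas/frame_timestamps', 'observers/ego_car/data/global_timestamps']
```

Loading the real file (e.g. with `torch.load("scenario.pt")`) and passing `scenario` through this check should print an empty list for a complete file.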