AlvinYH / Faster-VoxelPose

Official implementation of Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

Possible bugs? #16

Closed gpastal24 closed 1 year ago

gpastal24 commented 1 year ago

Hi, I have been trying to train the model. At first I tried the repo as-is and was getting this bug: [screenshot]

Then I removed the part that saves the images (note that it works fine when evaluating), but the process was killed due to running out of RAM with the default 10k samples. I reduced it to 1-5k samples and managed to load them successfully. If I understand correctly, in the synthetic-data experiments the whole dataset is loaded into RAM.

After that I was getting CUDA OOM errors; the culprit seemed to be https://github.com/AlvinYH/Faster-VoxelPose/blob/4daaedad466b9c95b1e9b35cfabd496b60e6013a/lib/core/function.py#L69-L70

so I removed that part.

The model seemed to train at first, but then I got NaN tensors and the accumulated loss was NaN as well. During validation the debug_save_imgs function actually works and I can see the predicted 3D poses and their projections. The thing is, when I print the final_fused_poses variable, each joint is always [0,0,0,-1..]. The error for each actor is stuck at 0.0.

I have tried training both on an RTX 3080 (where I can't use torch 1.4.0, since CUDA 11 is the minimum version supported by the RTX 30xx series) and on Colab, where I installed the dependencies from the requirements.txt file.

gpastal24 commented 1 year ago

The 4th element in the proposal centers is always -1 during testing.

gpastal24 commented 1 year ago

Hi, so the save_debug_images issue had to do with the gradients after all: final poses, poses and proposal centers had to be detached first. The NaN tensor error seems to be less severe when selecting 5k samples. During testing the model still can't predict any poses and the error is stuck at 0 (logically).
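For anyone hitting the same thing, a minimal sketch of the detach fix described above (variable names follow this thread; the actual call site in lib/core/function.py may differ):

```python
# Detach model outputs from the autograd graph before visualization, so the
# debug-image code does not keep gradient buffers (and GPU memory) alive.
final_poses = final_poses.detach().cpu()
poses = poses.detach().cpu()
proposal_centers = proposal_centers.detach().cpu()
```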

The training is almost over and the predictions on the TRAINING set look like the following. Do they seem normal? [images: train_00000000_2d, train_00000200_2d, train_00000500_2d, train_00000600_2d]

gpastal24 commented 1 year ago

Hello, it actually works on Google Colab with the modifications to the save_debug function now. Pretty weird if you ask me, but OK!

leonhard-yu-zhang commented 1 year ago

Hello, could I ask some questions about the Panoptic dataset? There is a TRAIN_LIST in "lib/dataset/panoptic.py" which includes the sequence "161202_haggling1", but I can't find "161202_haggling1" in the Panoptic dataset.

gpastal24 commented 1 year ago

Hello, if you are referring to me, I trained the model with the synthetic data from the Panoptic dataset on the Campus setup. I only used 2k samples though, since that was the maximum a Colab session could handle due to RAM limitations.

Taylorminer commented 1 year ago

Hi, gpastal24! Can you share your trained weights file? I tried to train this model on the Campus dataset, but the program is always killed. Thanks! @gpastal24

gpastal24 commented 1 year ago

Hi, you are most probably running out of RAM. I changed the way files and annotations are loaded into RAM and managed to train it that way. Nonetheless, you can download the Shelf weights from here.

You can try these on Campus; it increases the error a bit, but it is acceptable IMO.

Taylorminer commented 1 year ago

Thanks! Did you train the model on the Campus dataset? If yes, can you share it with me? I want to test the model in a custom scene with 3 cameras, but there are 5 cameras in the Shelf dataset.

gpastal24 commented 1 year ago

I have trained it on Campus as well, but I cannot access the files atm. This should not be an issue: just load the pretrained model from the Shelf dataset and you will be fine. Actually, I have tried cross-dataset metrics; I believe on the Campus dataset the error increases by about 20mm (from 70-90 to 90-110, if memory serves me correctly). I could upload the Campus weights later if you wish, but I don't think you actually need them for your test.

leonhard-yu-zhang commented 1 year ago

Hi, have you tried to test your own custom images on Faster-VoxelPose? I've used my own camera parameters. When I tried to test Faster-VoxelPose or VoxelPose (https://github.com/microsoft/voxelpose-pytorch), there was no output of the skeletons (no results for the 3D pose or the xy, xz, yz projections). Thanks! @gpastal24 @Taylorminer

gpastal24 commented 1 year ago

@leonhard-yu-zhang Hi. I have run the Panoptic model in a real-time video-capture scenario. How is your camera calibrated? Do you obtain Rw, Tw or Rc, Tc? In the second case, Tc would be close to the camera position relative to your origin point. You need Rw and Tw to test the model. In the _get_cam function you need to use the M matrix to take the dot product of your Rw with M. If your system is right-handed, y-up, you should change the second row to 1 instead of -1. You then obtain Tc via the -np.dot(R.T, Tw) operation in the same function. Note that T should be in millimetres. Lastly, the space_center and space_size parameters should be correctly assigned.

It is not clear to me why they use the Campus parameters that they do in their config file, though. In my case I used my origin point and the max x and y positions of my cameras to estimate the space center, and it worked. Not sure if this is correct, but the absolute 3D error appears to be low.
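For anyone trying the same thing, a minimal sketch of the transform described above (the axis-swap matrix M follows the repo's _get_cam convention as I understand it; Rw and Tw are hypothetical placeholders for your own calibration, so verify against the preprocessing code):

```python
import numpy as np

# Hypothetical calibration values: Rw (world rotation) and Tw (camera
# position from your origin point, in mm) come from your own calibration.
Rw = np.eye(3)
Tw = np.array([[0.0], [0.0], [3000.0]])

# Axis-swap matrix; for a right-handed, y-up system, change the -1 in the
# second row to +1 (as suggested above).
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

R = Rw.dot(M)           # rotation in the model's axis convention
T = -np.dot(R.T, Tw)    # translation expected by the model, in mm
```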

leonhard-yu-zhang commented 1 year ago

Hi @gpastal24. Thanks a lot.

Could you share your trained weights file for the panoptic dataset?

Have you already changed the space_size and space_center of capture_spec and individual_spec in jln64.yaml before training?

Regarding the unit of T, I notice that in the _get_cam function: our_cam['T'] = -np.dot(v['R'].T, v['t']) * 10.0. Given the coefficient of 10.0, I think the original unit of T should be centimetres instead of millimetres?

Taylorminer commented 1 year ago

I have trained the model on the Campus dataset, but the PCP is 0 during my training, like this: [screenshot] I just changed the number of samples from 10k to 5k. I want to try your pretrained weights file, if it's convenient for you. Thanks! @gpastal24

gpastal24 commented 1 year ago

@Taylorminer here are the weights for the Campus dataset; I think they are correct.

@leonhard-yu-zhang I don't think I can share them, since I have not used my own machine to train the model on Panoptic. Can't you train it yourself? I did not change the parameters in the config file during training, though; I changed them for the inference test and the error was still low (measured with a tape measure, as no GT 3D was available). The model would act weird if there were 2 persons in the scene, though; I suppose it has to do with the voxels-per-axis parameter, which I didn't change. If you want to train the model on your configuration, I would suggest following the authors' approach on the Campus and Shelf datasets tbh, and training with synthetic data using your own camera calibration file.

Taylorminer commented 1 year ago

Thanks for your reply! I tried to test the model on the Campus dataset with your weights, but I get this error: [screenshot] Have you met the same error? @gpastal24

gpastal24 commented 1 year ago

Yes, I have; it is a bug. I believe you have to rename meta[0] to meta or something. I had to make changes to the save_debug function to work with the validate script. I think it is pretty straightforward, though you might have to convert some of the tensors to NumPy arrays; I don't remember which ones.

gpastal24 commented 1 year ago

@Taylorminer I believe these are the changes I had made to the code for the visualization to work with the validate script; final_poses was also a NumPy array. [screenshots]
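Since the screenshots don't survive here, a rough reconstruction of the kind of edits described (an assumption based on the comments above, not the exact diff): pass meta instead of meta[0] in the validate script, and convert tensors to NumPy before plotting.

```python
import torch

def to_numpy(x):
    # Some outputs arrive as torch tensors during validation; the
    # matplotlib-based debug plotting expects NumPy arrays.
    return x.detach().cpu().numpy() if isinstance(x, torch.Tensor) else x

final_poses = to_numpy(final_poses)
# ...and in the visualization call, use `meta` rather than `meta[0]`.
```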

Taylorminer commented 1 year ago

@gpastal24 It works! Thanks very much! But it doesn't save the sample grid in HDN, according to the terminal:

[screenshot] And the visualized result seems a little strange: almost every person has two sets of bones in the 3D pose: [image: validation_00000012_2d]

gpastal24 commented 1 year ago

@Taylorminer It does save the HDN output; what you are seeing is a warning. The red skeletons are the GT, the colored ones are the predicted skeletons. There are some false positives in the results, indicated by the lack of a GT skeleton for some of the predicted ones.

Ruid6 commented 1 year ago

Hello, I would like to ask a question. After following the instructions in the readme.md file to set up my environment, I ran the command 'python run/validate.py --cfg configs/campus/jln64.yaml' and received an error message indicating that the 'model_best.pth.tar' file is missing. Does this mean that the author did not provide the file and that I need to train the model myself?

Taylorminer commented 1 year ago

Yes, you should train the model before you validate.

Taylorminer commented 1 year ago

@gpastal24, have you run the Campus/Shelf model in a custom scenario? I tried to test the model on custom images. I used my own camera parameters in the Campus format, but I can't get the 3D pose.

gpastal24 commented 1 year ago

@Taylorminer Hi. I have only tried the Panoptic model. The Campus/Shelf models have to be used with another 2D pose estimation model: you have to create the input heatmaps as the authors do for these datasets and feed them into the model.

Taylorminer commented 1 year ago

I have used a 2D pose estimation model (MMPose) to get the 2D keypoints. But at https://github.com/AlvinYH/Faster-VoxelPose/blob/main/lib/models/cnns_2d.py line 177, the value of hm looks strange, like this: [screenshot] When I test on the Campus dataset, the value of hm looks like this: [screenshot] How do you calibrate the camera? How do you get the 2D keypoints?

gpastal24 commented 1 year ago

Shouldn't the hm size be (1, nc, 160, 200) or something similar, though? (nc being the number of views.) Also, it is not weird to have such low values, since when you generate the heatmaps you are creating Gaussian "pulses" around the keypoints. "Pulses" is not the correct term here, but you get the idea.

The cameras have to be calibrated in such a way that -R^-1.dot(T) gives you the camera position in 3D space. You may have to use M as well to change your axes; -R^-1.dot(M).dot(T) should then satisfy the aforementioned condition (I am not sure about the latter, though; you have to check their preprocessing to make sure).
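To illustrate the Gaussian "pulses" mentioned above, here is a minimal, generic sketch of rendering per-joint heatmaps from 2D keypoints (not the repo's exact code; the resolution and sigma are assumptions):

```python
import numpy as np

def make_heatmap(keypoints, height=160, width=200, sigma=3.0):
    """Render one Gaussian peak per 2D keypoint into a (num_joints, H, W) array.

    keypoints: (num_joints, 2) array of (x, y) in heatmap coordinates.
    The peak value is 1.0 at the keypoint; most pixels stay near 0, which is
    why raw heatmap values look very small.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    hm = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for j, (x, y) in enumerate(keypoints):
        hm[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return hm

# Example: two joints at pixel positions (50, 80) and (120, 40).
hm = make_heatmap(np.array([[50.0, 80.0], [120.0, 40.0]]))
```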

FrankMMMMMM commented 1 year ago

[screenshot] Hello. I have trained the model on the CMU Panoptic dataset. Could you tell me how to fix this? Thanks. @gpastal24

gpastal24 commented 1 year ago

Hmm, I never faced this error and I don't know what's up. total_gt appears to be zero, though. You'll have to debug the following lines: https://github.com/AlvinYH/Faster-VoxelPose/blob/4daaedad466b9c95b1e9b35cfabd496b60e6013a/lib/dataset/panoptic.py#L230-L255

FrankMMMMMM commented 1 year ago

Thank you for your reply. I tried to train the model on the Campus dataset but got the following error. I tried to run it on an RTX 3090 and an RTX 2080 Ti and the error is the same. Have you faced this error? Can you tell me the solution? On what hardware are you training? [screenshot] @gpastal24 Thank you!

gpastal24 commented 1 year ago

@FrankMMMMMM "the bounding box is sufficiently large to cover poses !" That is not an error. You are getting killed because you are running out of RAM. As the code stands it loads 10k, if i remember correctly ,of heatmaps / annotations etc. into the memory. I had to change that part to load the heatmaps on demand. You can change the config to load less data as a first try though to make sure everything is fine. I have trained the campus/shelf datasets on colab with the requirements they provide. I managed to train the panoptic model on an RTX3080 but there had to be some changes on the training code. There is an issue on this repo where I believe I have said what I did.

FrankMMMMMM commented 1 year ago

@Taylorminer Hello! Did you successfully run your own custom video through the model? Can you tell me how to do it?

Taylorminer commented 1 year ago

No, I didn't. There is no output when I test the model in a custom scene, and I don't know the reason; the camera calibration and the model's generalization may both play a role. If you can run it successfully, please tell me.

912267428 commented 1 year ago

Hi, I have also encountered the same problem as in your picture. How did you solve it? Did you directly delete the save_debug_2d_images function?

AlvinYH commented 1 year ago

Hi, @gpastal24, @leonhard-yu-zhang, @Ruid6, @912267428, @FrankMMMMMM. Thanks for your interest in our work. We've modified the code and solved the aforementioned problems. You can pull the recent release. Please let me know if there are any problems.

gpastal24 commented 1 year ago

@AlvinYH Hello, I get the following error when I try to train the model. I am not sure whether I was getting this error before.

RuntimeError: Error(s) in loading state_dict for ResNet:
    size mismatch for final_layer.weight: copying a param with shape torch.Size([18, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([15, 256, 1, 1]).
    size mismatch for final_layer.bias: copying a param with shape torch.Size([18]) from checkpoint, the shape in current model is torch.Size([15]).

AlvinYH commented 1 year ago

Hi, @gpastal24. Thank you for pointing out this problem! Actually, we've modified the parameters of the pre-trained backbone model. You need to re-download it; the link is now in the README.

gpastal24 commented 1 year ago

Thank you @AlvinYH, that did the trick (just as I was ready to manually load the final layer's weights :P). Now I have some problems with the Panoptic dataset, since it was already preprocessed for the previous version of FVP. I guess there are some changes in the preprocessing (maybe in the annotation names or something?). I'll run the preprocess script here and let you know if everything is fine.
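For the record, the manual fallback hinted at above would look roughly like this (a generic PyTorch pattern for skipping shape-mismatched entries, not code from the repo; the checkpoint path is a placeholder and `model` is assumed to be the backbone instance):

```python
import torch

# Load the checkpoint and drop entries whose shapes no longer match the model,
# e.g. final_layer.weight/bias after the joint count changed from 18 to 15.
# Assumes the file holds a plain state_dict; `model` is your ResNet backbone.
checkpoint = torch.load("pose_resnet_backbone.pth.tar", map_location="cpu")
model_state = model.state_dict()
filtered = {k: v for k, v in checkpoint.items()
            if k in model_state and v.shape == model_state[k].shape}
model.load_state_dict(filtered, strict=False)  # skipped layers keep their init
```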

gpastal24 commented 1 year ago

@AlvinYH https://github.com/AlvinYH/Faster-VoxelPose/blob/bccdc4a872ba5960f714ffb1b26371ba7db7571f/lib/core/function.py#L38-L39

These lines were the culprit. I was getting CUDA OOM with the new code as well, maybe as a result of the "NaN/Inf found" warning I have been getting (hence the accumulated loss keeps growing until it kills the process). For now I will just backprop the whole loss again, but what is the intuition behind the accumulation steps?

AlvinYH commented 1 year ago

Hi, @gpastal24. Thank you again! I'm sorry that I forgot to remove these lines after debugging. However, I didn't get NaN errors when I restarted training on the Panoptic dataset, and it worked fine on my GeForce RTX 2080 GPU. As for the accumulation steps, we simply follow VoxelPose in training HDN and JLN alternately. The underlying intuition is that we pay more attention to JLN and optimize its parameters in each iteration, while HDN is only updated every four steps. I think this empirical design will not cause significant changes to the performance.
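A rough sketch of that alternating schedule (a generic PyTorch pattern, not the repo's actual training loop; `model`, `loader`, and the two optimizers over disjoint HDN/JLN parameter groups are assumed to exist):

```python
ACCUM_STEPS = 4  # HDN is stepped once every four iterations, as described above

for i, batch in enumerate(loader):
    loss_hdn, loss_jln = model(batch)              # hypothetical two-part loss

    optimizer_jln.zero_grad()
    # One backward pass: JLN gradients are fresh each iteration, while the
    # (scaled) HDN gradients accumulate across ACCUM_STEPS iterations.
    (loss_jln + loss_hdn / ACCUM_STEPS).backward()
    optimizer_jln.step()

    if (i + 1) % ACCUM_STEPS == 0:
        optimizer_hdn.step()                       # apply accumulated gradients
        optimizer_hdn.zero_grad()
```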

gpastal24 commented 1 year ago

@AlvinYH OK, I see. Truth be told, the difference for the 5-camera setup was around 0.2 mm, so it is not that significant indeed.

I have been getting another error during the evaluation of the model, though. I also noticed the assertion that the length of the db should be equal to the length of preds; was this change made by design? The length of preds is 2616, while the db length is 2580. https://github.com/AlvinYH/Faster-VoxelPose/blob/7dcada49646578e1157d5deccb8889cea9437e84/lib/dataset/panoptic.py#L216-L217 [screenshot: error_eval]

gpastal24 commented 1 year ago

Never mind, @AlvinYH, I found the issue with the validation as well: some poses are appended twice.

https://github.com/AlvinYH/Faster-VoxelPose/blob/7dcada49646578e1157d5deccb8889cea9437e84/lib/core/function.py#L146-L162

AlvinYH commented 1 year ago

Hi, @gpastal24. Yes, you're right. That was another typo :(

gpastal24 commented 1 year ago

@AlvinYH my bad about the OOM. I ran the preprocess script on already-preprocessed images, so the model had abysmal results and thus almost never backpropagated. I got another OOM before, but I have got past 200 batches now, so I think I am in the clear. Thank you for making this amazing work available! :)

AlvinYH commented 1 year ago

@gpastal24 Great! Glad to hear that the problem has been solved :) Now I'll close this issue.