QingyongHu / RandLA-Net

🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

Custom data training #24

Open KiranAkadas opened 4 years ago

KiranAkadas commented 4 years ago

Hello @QingyongHu, thanks for the great work; I've been working with it for a few days. I trained the model on my own data, which has around 3-5 million points, but when I run it in test mode I get fewer label outputs than the original number of points. Even the original_ply files have fewer points than my input txt files. What could be the reason?

QingyongHu commented 4 years ago

Hi @KiranAkadas, can you please give me more details about how you processed the data? Did you follow the scripts in utils/data_prepare** to process your own data?

KiranAkadas commented 4 years ago

Yes, I used the data_prepare_Semantic3D script to create the ply files and KD-trees for the train and test data.

QingyongHu commented 4 years ago

For the Semantic3D dataset, we use grid subsampling to reduce the number of points, but we project the sub-sampled labels back to the raw point clouds during testing. Therefore, you should get the same number of points if you correctly project the predicted labels back. Alternatively, since your dataset is not extremely large, it is not necessary to use grid_sub_sampling at all.

I hope this helps.
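To make the idea concrete: the projection back is just a nearest-neighbour lookup, where each raw point takes the prediction of its closest sub-sampled point, so the output has the original number of points again. A minimal sketch with toy data (the variable names are illustrative, not the exact ones in this repo):

```python
import numpy as np
from sklearn.neighbors import KDTree

# toy data: a "raw" cloud and a coarser sub-sampled cloud with per-point predictions
raw_points = np.random.rand(1000, 3).astype(np.float32)  # (N_raw, 3)
sub_points = raw_points[::10]                            # (N_sub, 3), stand-in for grid_sub_sampling
sub_preds = np.random.randint(0, 6, len(sub_points))     # labels predicted on the sub-cloud

# build a KD-tree on the sub-sampled cloud (done once, at data-preparation time)
tree = KDTree(sub_points, leaf_size=50)

# for every raw point, find the index of its nearest sub-sampled point
proj_idx = np.squeeze(tree.query(raw_points, return_distance=False))

# project the predictions back: the result has exactly N_raw entries again
raw_preds = sub_preds[proj_idx]
assert raw_preds.shape[0] == raw_points.shape[0]
```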

KiranAkadas commented 4 years ago

Thanks for the reply. Correct me if I'm wrong: the projection files are created from the sub_pc after grid subsampling with grid size 0.01, and those same files are used to reproject back. Won't the test results then have fewer points (i.e., the number of points after subsampling with grid size 0.01)?

QingyongHu commented 4 years ago

Yes, you are right. But here we reproject the predicted probabilities back to the raw point clouds, so the final results have the same number of points as the raw point clouds.
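Concretely (a sketch of the idea, not the exact tester code): the (N_sub, num_classes) probability matrix is gathered back to raw resolution through the stored projection indices before taking the argmax:

```python
import numpy as np

# probs_sub: class probabilities on the sub-sampled cloud; proj_idx: nearest
# sub-point index for every raw point (as stored in the *_proj.pkl files)
probs_sub = np.random.rand(100, 6)          # (N_sub, C), toy values
proj_idx = np.random.randint(0, 100, 1000)  # (N_raw,), toy values

probs_raw = probs_sub[proj_idx]             # (N_raw, C): one probability row per raw point
preds_raw = np.argmax(probs_raw, axis=1)    # final labels, same length as the raw cloud
```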

KiranAkadas commented 4 years ago

Thanks, I got the results.

KiranAkadas commented 4 years ago

Hello @QingyongHu! I have a small problem. I initially trained my model with all 6 labels in the dataset, including label 5, the "unlabeled" category. It trained well, but the test results were very biased and almost all points were categorised as label 5. Next, I tried ignoring label 5 by setting the ignored label to 5. Now the training is somewhat weird: I get an accuracy of 1.00, but the best mIoU I get after 40 epochs is 0.062. Can you please let me know if I went wrong somewhere, or whether I need to include/exclude something?
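For context, this is roughly how I declared the labels in my dataset class, following the pattern of main_Semantic3D.py (where label 0 is the ignored one); the class names below are placeholders, not my real ones:

```python
import numpy as np

class CustomDataset:
    def __init__(self):
        # six labels in total, with label 5 ("unlabeled") marked as ignored,
        # mirroring how main_Semantic3D.py ignores its label 0
        self.label_to_names = {0: 'class_a', 1: 'class_b', 2: 'class_c',
                               3: 'class_d', 4: 'class_e', 5: 'unlabeled'}
        self.num_classes = len(self.label_to_names)
        self.ignored_labels = np.sort([5])
```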

abhigoku10 commented 4 years ago

@KiranAkadas Hi, are you getting any errors when you reduce the number of classes and train from the pipeline?

KiranAkadas commented 4 years ago

@abhigoku10 I got a few errors initially and had to debug those; they were related to the input shape.

abhigoku10 commented 4 years ago

@KiranAkadas I am getting a NaN error; I have shared it in #10. Can you please look into it and say whether it's the same error you got?

bin70 commented 4 years ago

> Hello @QingyongHu! I have a small problem. I initially trained my model with all 6 labels in the dataset, including label 5, the "unlabeled" category. It trained well, but the test results were very biased and almost all points were categorised as label 5. Next, I tried ignoring label 5 by setting the ignored label to 5. Now the training is somewhat weird: I get an accuracy of 1.00, but the best mIoU I get after 40 epochs is 0.062. Can you please let me know if I went wrong somewhere, or whether I need to include/exclude something?

Hello! I get a similar training state to yours: the accuracy stays at 1.00, but the mIoU is only 6.9 when I train on my own dataset. Could you please share your solution? Thanks a lot!

JerryIndus commented 3 years ago

> Hello @QingyongHu! I have a small problem. I initially trained my model with all 6 labels in the dataset, including label 5, the "unlabeled" category. It trained well, but the test results were very biased and almost all points were categorised as label 5. Next, I tried ignoring label 5 by setting the ignored label to 5. Now the training is somewhat weird: I get an accuracy of 1.00, but the best mIoU I get after 40 epochs is 0.062. Can you please let me know if I went wrong somewhere, or whether I need to include/exclude something?

Hi, have you solved this problem yet? I ran into a similar one, but I don't know how to deal with it.

JerryIndus commented 3 years ago

> For the Semantic3D dataset, we use grid subsampling to reduce the number of points, but we project the sub-sampled labels back to the raw point clouds during testing. Therefore, you should get the same number of points if you correctly project the predicted labels back. Alternatively, since your dataset is not extremely large, it is not necessary to use grid_sub_sampling at all.
>
> I hope this helps.

Hello, you say it is not necessary to use grid_sub_sampling since the dataset is not extremely large, but if we don't use grid_sub_sampling, what should sub_grid_size in helper_tool.py be set to?
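For reference, sub_grid_size is a per-dataset attribute of the config classes in helper_tool.py, i.e. the voxel size passed to grid_sub_sampling; the values below are what I see in the released repo (please double-check against your checkout):

```python
# excerpt-style sketch of the config classes in helper_tool.py
class ConfigSemantic3D:
    sub_grid_size = 0.06  # preprocess_parameter: voxel size in metres

class ConfigS3DIS:
    sub_grid_size = 0.04  # preprocess_parameter: voxel size in metres
```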

ZhenghaoSun commented 3 years ago

> > For the Semantic3D dataset, we use grid subsampling to reduce the number of points, but we project the sub-sampled labels back to the raw point clouds during testing. Therefore, you should get the same number of points if you correctly project the predicted labels back. Alternatively, since your dataset is not extremely large, it is not necessary to use grid_sub_sampling at all. I hope this helps.
>
> Hello, you say it is not necessary to use grid_sub_sampling since the dataset is not extremely large, but if we don't use grid_sub_sampling, what should sub_grid_size in helper_tool.py be set to?

Hi, did you figure it out? I'm also wondering.

JiahaoXia commented 2 years ago

> Thanks for the reply. Correct me if I'm wrong: the projection files are created from the sub_pc after grid subsampling with grid size 0.01, and those same files are used to reproject back. Won't the test results then have fewer points (i.e., the number of points after subsampling with grid size 0.01)?

@QingyongHu Based on main_Semantic3D.py, the original point cloud is subsampled twice, first with grid size 0.01 and then with grid_size, and the script reprojects back to the point cloud after the 0.01 grid_sub_sampling. How large does a point cloud have to be before it needs the grid_sub_sampling process?

# Subsample to save space: first pass with a fixed 0.01 grid
sub_points, sub_colors, sub_labels = DP.grid_sub_sampling(pc[:, :3].astype(np.float32),
                                                          pc[:, 4:7].astype(np.uint8), labels, 0.01)
sub_labels = np.squeeze(sub_labels)

write_ply(full_ply_path, (sub_points, sub_colors, sub_labels), ['x', 'y', 'z', 'red', 'green', 'blue', 'class'])

# save sub_cloud and KDTree file: second pass with the configured grid_size
sub_xyz, sub_colors, sub_labels = DP.grid_sub_sampling(sub_points, sub_colors, sub_labels, grid_size)
sub_colors = sub_colors / 255.0
sub_labels = np.squeeze(sub_labels)
sub_ply_file = join(sub_pc_folder, file_name + '.ply')
write_ply(sub_ply_file, [sub_xyz, sub_colors, sub_labels], ['x', 'y', 'z', 'red', 'green', 'blue', 'class'])

# the KD-tree is built on the grid_size cloud (the one the network actually sees)
search_tree = KDTree(sub_xyz, leaf_size=50)
kd_tree_file = join(sub_pc_folder, file_name + '_KDTree.pkl')
with open(kd_tree_file, 'wb') as f:
    pickle.dump(search_tree, f)

# proj_idx maps each point of the 0.01 cloud (not the raw cloud) to its
# nearest neighbour in the grid_size cloud
proj_idx = np.squeeze(search_tree.query(sub_points, return_distance=False))
proj_idx = proj_idx.astype(np.int32)
proj_save = join(sub_pc_folder, file_name + '_proj.pkl')
with open(proj_save, 'wb') as f:
    pickle.dump([proj_idx, labels], f)
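In other words (my understanding, sketched with toy data below), the stored proj_idx only maps the grid_size cloud back to the 0.01 cloud; reaching the truly raw cloud would need a second index built the same way on the 0.01 cloud:

```python
import numpy as np
from sklearn.neighbors import KDTree

# toy stand-ins for the three resolutions: raw -> 0.01 cloud -> grid_size cloud
raw_points = np.random.rand(2000, 3).astype(np.float32)
points_001 = raw_points[::2]     # pretend output of the 0.01 grid_sub_sampling
points_grid = points_001[::5]    # pretend output of the grid_size grid_sub_sampling
preds_grid = np.random.randint(0, 6, len(points_grid))  # network predictions

# stage 1 (what the script above stores): grid_size cloud -> 0.01 cloud
idx_grid_to_001 = np.squeeze(KDTree(points_grid, leaf_size=50)
                             .query(points_001, return_distance=False))

# stage 2 (the extra step the script does not store): 0.01 cloud -> raw cloud
idx_001_to_raw = np.squeeze(KDTree(points_001, leaf_size=50)
                            .query(raw_points, return_distance=False))

# compose the two lookups to get a prediction for every raw point
preds_raw = preds_grid[idx_grid_to_001][idx_001_to_raw]
assert len(preds_raw) == len(raw_points)
```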