charlesq34 / pointnet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

How to prepare my own data set for the segmentation training? #75

Open ypLincode opened 6 years ago

ypLincode commented 6 years ago

Thank you for sharing your work on deep learning for point clouds. Is there a manual on how to prepare my own data for training?

charlesq34 commented 6 years ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py

for how to prepare HDF5 files.

Best, Charles

ghost commented 6 years ago

Hi @charlesq34

Thank you for sharing the above link. I just have a quick question: let's say we have n h5 files in the training directory. As long as we provide proper paths to those files in train_files.txt and the class names in shape_names.txt, the code should work as expected, right?

Edit: I was just going through the train and test h5 files downloaded during training. The data is organized as a 2048x2048x3 array, with each row representing data specific to a class (label). Do we also need to organize our data in a similar structure for the code to work optimally?

charlesq34 commented 6 years ago

Hi @karenachiketc

If it is a classification problem: yes, but you also need to change the model definition file for the size of the output layer, and train.py for the number of classes.

You can put any number of point clouds in each .h5 file. In my example, I chose to put 2048 point clouds, where each point cloud is of size 2048x3.
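
For reference, this is roughly what inspecting one of those .h5 files with h5py looks like (a minimal sketch; the `data`/`label` keys and the `ply_data_train0.h5` file name are assumptions based on the provided ModelNet40 HDF5 download):

```python
import h5py

# Peek inside one of the provided HDF5 files to see how point clouds are stored.
with h5py.File('ply_data_train0.h5', 'r') as f:   # example file name
    print(f['data'].shape)    # e.g. (2048, 2048, 3): 2048 clouds, 2048 xyz points each
    print(f['label'].shape)   # e.g. (2048, 1): one class index per cloud
```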

Hope it is clear now.

Best, Charles

ypLincode commented 6 years ago

@charlesq34 Thank you!

ghost commented 6 years ago

Hi @charlesq34, thank you for your response. Yes, I'm working on a classification problem. Although I understand what you said, this is my first time working with point cloud data and h5 files, so I have a follow-up question just to check that I've understood correctly.

I have downloaded the ModelNet40 dataset (since the paper uses the same one) and, just to better understand the pre-processing step (writing data to the h5 file), I tried to create h5 files for the dataset. Do we need to follow these steps:

  1. Mesh sampling to oversample the mesh to more than 10k points
  2. Use farthest point sampling to get 2048 points
  3. Write these 2048 points to the h5 file
  4. Use the generated h5 files to train the model

Is that correct? Please let me know.

charlesq34 commented 6 years ago

Hi @karenachiketc

The steps sound good to me. Just make sure you visualize the point clouds before you start training, to confirm they are correctly prepared. You can use MATLAB, Python's mayavi library, or a handy script here.
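
A minimal sketch of such a sanity check with matplotlib (instead of MATLAB or mayavi); the file name and the `data` key are assumptions:

```python
import h5py
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3d projection on older matplotlib

# Load one point cloud from a prepared .h5 file and scatter-plot it.
with h5py.File('my_train0.h5', 'r') as f:   # hypothetical file name
    cloud = f['data'][0]                    # (2048, 3) array of xyz coordinates

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(cloud[:, 0], cloud[:, 1], cloud[:, 2], s=1)
plt.show()
```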

ghost commented 6 years ago

@charlesq34 Thank you! I'll visualize the generated point cloud using the provided script before I start the training.

ghost commented 6 years ago

hi @charlesq34

I've compiled the PCL library as per the instructions on their GitHub page, but when I try to pass an *.off file (ModelNet40 data file) as input to the mesh sampling code, it is for some reason unable to recognize the file type. The code treats the .off file as a script and cannot process it.

Any experience with this problem? Was the ModelNet40 dataset you used in a different file format? If not, do we need to convert it to some other format before using the mesh sampling code?

abhishek-v commented 6 years ago

Hey @karenachiketc, perhaps you could try converting the OFF file to PLY format and running PCL's mesh sampling code. I observed that OFF files have the following format:

OFF
<Number of vertices> <Number of faces> <Number of edges>
<Vertex list>
<Face list>

You can convert this into PLY format by editing the OFF file and adding the following header:

ply
format ascii 1.0
element vertex <Number of vertices>
property float32 x
property float32 y
property float32 z
element face <Number of faces>
property list uint8 int32 vertex_indices
end_header

This is followed by the <Vertex list> and <Face list>

Now run PCL's sampling command and you'll get a sampled point cloud file.
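
A minimal Python sketch of that conversion (it assumes a plain ASCII OFF file whose first line is just `OFF` followed by the vertex/face/edge counts; some ModelNet OFF files reportedly fuse the counts onto the `OFF` line and need an extra fix):

```python
def off_to_ply(off_path, ply_path):
    """Convert an ASCII OFF mesh to an ASCII PLY file that pcl_mesh_sampling can read.
    Assumes the simple layout: an 'OFF' line, then 'n_vertices n_faces n_edges',
    then the vertex lines, then the face lines."""
    with open(off_path) as f:
        tokens = f.read().split()
    assert tokens[0] == 'OFF', 'unexpected header (some ModelNet files fuse OFF with the counts)'
    n_verts, n_faces = int(tokens[1]), int(tokens[2])   # tokens[3] is the (unused) edge count
    pos = 4
    verts = []
    for _ in range(n_verts):
        verts.append(tokens[pos:pos + 3])
        pos += 3
    faces = []
    for _ in range(n_faces):
        k = int(tokens[pos])                            # vertices in this face (3 for triangles)
        faces.append(tokens[pos:pos + 1 + k])
        pos += 1 + k
    with open(ply_path, 'w') as f:
        f.write('ply\nformat ascii 1.0\n')
        f.write('element vertex %d\n' % n_verts)
        f.write('property float32 x\nproperty float32 y\nproperty float32 z\n')
        f.write('element face %d\n' % n_faces)
        f.write('property list uint8 int32 vertex_indices\nend_header\n')
        for v in verts:
            f.write(' '.join(v) + '\n')
        for face in faces:
            f.write(' '.join(face) + '\n')

# usage: off_to_ply('chair_0001.off', 'chair_0001.ply'), then run pcl_mesh_sampling on the .ply
```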

ghost commented 6 years ago

Hi @abhishek-v

Thank you for your quick response. Will try this out.

Edit: So basically I read the .off file, get the data into a numpy array, and then write it out in PLY format using the above header. Is that correct?

abhishek-v commented 6 years ago

@karenachiketc Yes, that's pretty much the idea. Since PCL's mesh sampling code does not handle OFF files, you convert them to PLY format and then run the code.

ghost commented 6 years ago

Nice! Thanks @abhishek-v

evagap commented 6 years ago

Hi @charlesq34,

Is it possible to train PointNet directly on point cloud data?

Thank you

ghost commented 6 years ago

Hi @charlesq34

I have a question. How do you use the farthest point sampling code? These are the steps that I've followed:

  1. Copy all the files from the pointnet2/tf_ops/sampling/ directory into my directory
  2. Run the bash script tf_sampling_compile.sh to generate the object file tf_sampling_so.so
  3. Pass 2048 and the data read from the PCD file (as a numpy array; this file is obtained after extracting 10000 points from the PLY file) to the farthest_point_sample function in tf_sampling.py. The function returns the data to be written to the H5 file.

Is the flow correct? I'm getting a weird segmentation fault when I try to run my code. I think the way I'm passing data to the farthest_point_sample function may be the reason, but I'm not sure. I don't have any experience with C++.
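
For what it's worth, a sketch of how the pointnet2 sampling op is typically called. Note that the ops expect a batch dimension (shape (batch, N, 3)), which is one common cause of crashes when a bare (N, 3) array is fed in; the input file name here is hypothetical and the compiled op needs a GPU:

```python
import numpy as np
import tensorflow as tf
from tf_sampling import farthest_point_sample, gather_point  # from pointnet2/tf_ops/sampling

# Hypothetical input: ~10k xyz points already extracted from the mesh, one point per row.
points = np.loadtxt('cloud_10000.xyz').astype(np.float32)     # (N, 3)

inp = tf.constant(points[np.newaxis, ...])                     # add batch dim -> (1, N, 3)
idx = farthest_point_sample(2048, inp)                         # (1, 2048) int32 indices
sampled = gather_point(inp, idx)                               # (1, 2048, 3)

with tf.Session() as sess:
    cloud_2048 = sess.run(sampled)[0]                          # (2048, 3), ready for the .h5 file
```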

BangyangJin commented 5 years ago

hi @charlesq34

I've compiled the PCL library as per the instructions on their GitHub page, but when I try to pass an *.off file (ModelNet40 data file) as input to the mesh sampling code, it is for some reason unable to recognize the file type. The code treats the .off file as a script and cannot process it.

Any experience with this problem? Was the ModelNet40 dataset you used in a different file format? If not, do we need to convert it to some other format before using the mesh sampling code?

Hi, what's the mesh sampling code? Can you give a link? Best wishes!

tanismar commented 5 years ago

@BangyangJin: the mesh sampling code refers to the pcl_mesh_sampling tool, which lets you uniformly sample a mesh to a desired number of points: https://github.com/PointCloudLibrary/pcl/blob/master/tools/mesh_sampling.cpp

ParekhVivek09 commented 5 years ago

Hi @charlesq34, thank you for your response. Yes, I'm working on a classification problem. Although I understand what you said, this is my first time working with point cloud data and h5 files, so I have a follow-up question just to check that I've understood correctly.

I have downloaded the ModelNet40 dataset (since the paper uses the same one) and, just to better understand the pre-processing step (writing data to the h5 file), I tried to create h5 files for the dataset. Do we need to follow these steps:

1. Mesh sampling to oversample the mesh to more than 10k points

2. Use farthest point sampling to get 2048 points

3. Write these 2048 points to the h5 file

4. Use the generated h5 files to train the model

Is that correct? Please let me know.

Can you please give more detail on how you followed the steps you mentioned? I am new to this project. Thanks in advance.

Yashu94 commented 5 years ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py

for how to prepare HDF5 files.

Best, Charles

  1. Mesh sampling to oversample the mesh to more than 10k points
  2. Use farthest point sampling to get 2048 points
  3. Write these 2048 points to the h5 file
  4. Use the generated h5 files to train the model

Can you please provide more detailed information about the above preprocessing steps? Thank you.

ShiQiu0419 commented 5 years ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py for how to prepare HDF5 files. Best, Charles

  1. Mesh sampling to oversample the mesh to more than 10k points
  2. Use farthest point sampling to get 2048 points
  3. Write these 2048 points to the h5 file
  4. Use the generated h5 files to train the model

Can you please provide more detailed information about the above preprocessing steps? Thank you.

For my simple implementation:

  1. For a mesh, run a surface sampling algorithm on it to generate a point cloud with more points, e.g. Poisson-disk sampling or uniform surface sampling (code described in the pointnet++ project), or another algorithm implemented by yourself or taken from a relevant library. Here I simply use meshlabserver with an .mlx script to call the Poisson-disk sampling filter in MeshLab.
  2. Write a greedy FPS algorithm to downsample it to 2048 points.
  3. Repeat 1 and 2 to generate point clouds for all meshes.
  4. Write and save the data in .h5 format (referring to the save_h5 function: https://github.com/charlesq34/pointnet/blob/master/utils/data_prep_util.py).
  5. Load the new h5 data.
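
A minimal sketch of step 4 with h5py (roughly what the helper in utils/data_prep_util.py does), assuming the `data`/`label` layout that provider.py loads:

```python
import h5py
import numpy as np

def write_h5(h5_filename, data, label):
    """data: (num_clouds, 2048, 3) float point clouds; label: (num_clouds,) integer class ids."""
    with h5py.File(h5_filename, 'w') as f:
        f.create_dataset('data', data=np.asarray(data, dtype=np.float32), compression='gzip')
        f.create_dataset('label', data=np.asarray(label, dtype=np.uint8).reshape(-1, 1),
                         compression='gzip')

# e.g. write_h5('my_train0.h5', clouds, labels), then list my_train0.h5 in train_files.txt
```
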
ShiQiu0419 commented 5 years ago

Hi @charlesq34, thank you for your response. Yes, I'm working on a classification problem. Although I understand what you said, this is my first time working with point cloud data and h5 files, so I have a follow-up question just to check that I've understood correctly. I have downloaded the ModelNet40 dataset (since the paper uses the same one) and, just to better understand the pre-processing step (writing data to the h5 file), I tried to create h5 files for the dataset. Do we need to follow these steps:

1. Mesh sampling to oversample the mesh to more than 10k points

2. Use farthest point sampling to get 2048 points

3. Write these 2048 points to the h5 file

4. Use the generated h5 files to train the model

Is that correct? Please let me know.

Can you please give more detail on how you followed the steps you mentioned? I am new to this project. Thanks in advance.

Please see my reply above for some hints, thanks.

SalaheddineSTA commented 5 years ago

Hello, I'm trying to use my own data and I ran into a small problem. Basically I have a .pcd file and followed this pipeline:

  1. Downsample my point cloud to 2048 points using PCL (http://pointclouds.org/documentation/tutorials/voxel_grid.php).
  2. Convert the .pcd file to .ply using PCL (pcl_pcd2ply).
  3. Convert the .ply file to .mat using MATLAB.

Then I used the script write_hdf5.py to prepare my own data and got this error.

[error screenshot]

Any help would be appreciated, thank you.

LeeC20 commented 5 years ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py for how to prepare HDF5 files. Best, Charles

  1. Mesh sampling to oversample the mesh to more than 10k points
  2. Use farthest point sampling to get 2048 points
  3. Write these 2048 points to the h5 file
  4. Use the generated h5 files to train the model

Can you please provide more detailed information about the above preprocessing steps? Thank you.

For my simple implementation:

  1. For a mesh, run a surface sampling algorithm on it to generate a point cloud with more points, e.g. Poisson-disk sampling or uniform surface sampling (code described in the pointnet++ project), or another algorithm implemented by yourself or taken from a relevant library. Here I simply use meshlabserver with an .mlx script to call the Poisson-disk sampling filter in MeshLab.
  2. Write a greedy FPS algorithm to downsample it to 2048 points.
  3. Repeat 1 and 2 to generate point clouds for all meshes.
  4. Write and save the data in .h5 format (referring to the save_h5 function: https://github.com/charlesq34/pointnet/blob/master/utils/data_prep_util.py).
  5. Load the new h5 data.

Hi, what's the greedy FPS algorithm code? Can you give a link or some hints? Best wishes, and thanks!

ShiQiu0419 commented 5 years ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py for how to prepare HDF5 files. Best, Charles

  1. Mesh sampling to oversample the mesh to more than 10k points
  2. Use farthest point sampling to get 2048 points
  3. Write these 2048 points to the h5 file
  4. Use the generated h5 files to train the model

Can you please provide more detailed information about the above preprocessing steps? Thank you.

For my simple implementation:

  1. For a mesh, run a surface sampling algorithm on it to generate a point cloud with more points, e.g. Poisson-disk sampling or uniform surface sampling (code described in the pointnet++ project), or another algorithm implemented by yourself or taken from a relevant library. Here I simply use meshlabserver with an .mlx script to call the Poisson-disk sampling filter in MeshLab.
  2. Write a greedy FPS algorithm to downsample it to 2048 points.
  3. Repeat 1 and 2 to generate point clouds for all meshes.
  4. Write and save the data in .h5 format (referring to the save_h5 function: https://github.com/charlesq34/pointnet/blob/master/utils/data_prep_util.py).
  5. Load the new h5 data.

Hi, what's the greedy FPS algorithm code? Can you give a link or some hints? Best wishes, and thanks!

  1. Randomly choose a point from the original set as the start point of your new subset.
  2. Delete this point from the original set, and compute the distance between each remaining point of the original set and your new subset. Note that this distance is defined as the minimum distance between the point and its nearest correspondence in the subset. Find the point with the maximum distance, add it to your subset as a new point, and remember to delete it from the original set.
  3. Repeat the procedure until you have the expected number of points in your subset. (If the words confuse you, please refer to the equation in the picture.) https://github.com/charlesq34/pointnet2/issues/45#issuecomment-471526324
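
A plain numpy sketch of that greedy procedure (not the pointnet2 CUDA op):

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy FPS. points: (N, 3) array; returns indices of n_samples selected points."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    selected[0] = np.random.randint(n)          # random start point
    # dist[j] = squared distance from point j to its nearest selected point so far
    dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        last = points[selected[i - 1]]
        dist = np.minimum(dist, np.sum((points - last) ** 2, axis=1))
        selected[i] = np.argmax(dist)           # farthest point from the current subset
    return selected

# usage: cloud_2048 = cloud_10k[farthest_point_sampling(cloud_10k, 2048)]
```
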
LeeC20 commented 5 years ago

@ShiQiu0419 Thanks!!!!!!

AmmonZ commented 4 years ago

@ShiQiu0419

https://github.com/charlesq34/pointnet/issues/75#issuecomment-493845945

I have a question: why do you generate more points and then downsample to 2048, instead of directly sampling 2048 points from the source point cloud? What is the motivation here?

sachin-n-AI commented 4 years ago

The more points you use, the higher the computation cost. An epoch will take more time if you increase the number of points in each point cloud. 2048 is his default number; you can change it to whatever you want.

AmmonZ commented 4 years ago

The more points you use, the higher the computation cost. An epoch will take more time if you increase the number of points in each point cloud. 2048 is his default number; you can change it to whatever you want.

I mean in the data preprocessing phase. Why did @ShiQiu0419 choose to generate more points before sampling down to 2048, instead of directly sampling 2048 points? In this phase, we haven't fed the data into the network yet.

ShiQiu0419 commented 4 years ago

Hi,

Yes, of course you can directly sample a fixed number of points. But if you oversample first, you can then use FPS to select whatever fixed number of points you like.

In my case, I first apply a surface sampling algorithm using MeshLab but have no idea how many samples it will collect. Then I use FPS to select a fixed number (2048) as I wish. The main reason is that I am not very familiar with such surface sampling algorithms on meshes.

Cheers, Shi


zubair1502 commented 4 years ago

Hi @ShiQiu0419

I'm following the same steps and increased the number of points in each point cloud to 10k using PCL mesh sampling. But I'm facing a problem with the FPS algorithm. I have already compiled the bash files successfully from tf_ops, but when I run tf_sampling.py it just produces a .pkl file containing the binaries. I don't know how to apply it correctly to get the sampled point cloud.

Any help would be appreciated. Thank you.

ShiQiu0419 commented 4 years ago

tf_sampling operates on tensors, while creating point clouds only needs numpy processing; they are two different stories. If you want to make new datasets, you should write FPS based on numpy.


piyushsingh2k7 commented 2 years ago

@charlesq34 - Sir, I have stored the point cloud in a .csv file along with the label information. I want to feed the .csv file directly to PointNet. Is it possible to do that?
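
The training code reads HDF5 rather than CSV, so one option is a small conversion step. A sketch, assuming (hypothetically) one CSV per object with rows of "x,y,z,label"; the exact keys and shapes the loader expects depend on which training script you use (classification vs. segmentation), so check the corresponding load function in provider.py first:

```python
import numpy as np
import h5py

# Hypothetical CSV layout: one file per object, each row "x,y,z,label".
cloud = np.loadtxt('object_0001.csv', delimiter=',')
points = cloud[:, :3].astype(np.float32)       # (N, 3) xyz coordinates
labels = cloud[:, 3].astype(np.uint8)          # (N,) per-point labels

with h5py.File('object_0001.h5', 'w') as f:    # then list this file in the train file list
    f.create_dataset('data', data=points[np.newaxis, ...])   # (1, N, 3)
    f.create_dataset('label', data=labels[np.newaxis, ...])  # (1, N)
```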

quddusbusari commented 1 year ago

You are welcome to refer here: https://github.com/charlesq34/3dmodel_feature/blob/master/io/write_hdf5.py

for how to prepare HDF5 files.

Best, Charles

Hi Charles,

I am going through this script, but I have some questions. In your case, what are volume_size, pad_size and vox_size? Also, how did you arrive at the data_dim definition? In my case I have a dataset prepared in the same way as the Modelnet_normal_resampled data. Each point cloud is in txt format with dimension 2048 x 3, with 3 being the xyz coordinates. Will pad_size be equal to 3 (i.e. the xyz coordinates) in this case, and volume_size be 2048 (the number of points in each point cloud)? Then what will data_dim be? Is there any need for the vox_size calculation? I don't understand what that is. Kindly help clarify.

Thanks.