TRI-ML / dgp

ML Dataset Governance Policy for Autonomous Vehicle Datasets
https://tri-ml.github.io/dgp/
MIT License

Add support for PLY files #145

Closed DavidTorresOcana closed 11 months ago

DavidTorresOcana commented 1 year ago


Currently, lidar-related data is stored as NumPy `.npz` files.

This file format, while simple enough, is not compatible with other software such as Blender, MeshLab, etc. Read/write performance can also be improved.

This pull request proposes support for the PLY file format.

Besides making lidar data compatible with other software, this change makes DGP about 10× faster at reading/writing point clouds from/to disk, at the expense of only a ~10% increase in file size.
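For intuition, here is a minimal sketch of why binary PLY I/O can be this fast (a sketch only; see the diff for the actual `write_cloud_ply`/`read_cloud_ply`, which also handle intensities and timestamps). A binary little-endian PLY is just a short ASCII header followed by the raw vertex buffer, so NumPy can serialize it with a single `tobytes()`/`frombuffer()` and skip the compression that the `.npz` path appears to apply, judging by the file sizes below:

```python
import numpy as np

# Sketch of a binary little-endian PLY writer/reader for bare xyz clouds.
# Property names and types here are assumptions; the PR's real functions
# additionally support intensity/timestamp properties.
PLY_HEADER = (
    "ply\n"
    "format binary_little_endian 1.0\n"
    "element vertex {n}\n"
    "property double x\n"
    "property double y\n"
    "property double z\n"
    "end_header\n"
)

def write_cloud_ply_sketch(filename, points):
    """Dump an (N, 3) array as a binary little-endian PLY file."""
    vertices = np.ascontiguousarray(points, dtype="<f8")
    with open(filename, "wb") as f:
        f.write(PLY_HEADER.format(n=len(points)).encode("ascii"))
        f.write(vertices.tobytes())  # raw buffer, no compression

def read_cloud_ply_sketch(filename):
    """Read back the (N, 3) array written by the sketch above."""
    with open(filename, "rb") as f:
        line, n = "", 0
        while line != "end_header":
            line = f.readline().decode("ascii").strip()
            if line.startswith("element vertex"):
                n = int(line.split()[-1])
        return np.frombuffer(f.read(), dtype="<f8", count=3 * n).reshape(n, 3)
```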

Performance comparison

```python
import os

import numpy as np
from dgp.utils.dataset_conversion import _write_point_cloud, read_cloud_ply, write_cloud_ply

# Sample lidar scan loaded beforehand.
output[0][-1]["point_cloud"].shape
# (133632, 3)
```

DGP native

```python
%%timeit
_write_point_cloud("./test", output[0][-1]["point_cloud"])
# 114 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

```python
%%timeit
np.load("test.npz")['data']
# 13.2 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
os.stat("test.npz").st_size
# 2936480
```

PLY support

```python
%%timeit
write_cloud_ply("./test.ply", output[0][-1]["point_cloud"])
# 11.9 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```python
%%timeit
read_cloud_ply("test.ply")
# 1.09 ms ± 6.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

```python
os.stat("test.ply").st_size
# 3207291
```
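As a sanity check on the sizes: 133,632 points × 3 coordinates × 8 bytes is 3,207,168 bytes, so the 3,207,291-byte PLY is essentially the raw double-precision vertex buffer plus a small ASCII header, while the smaller `.npz` is consistent with compressed storage. A quick round-trip check (assuming `read_cloud_ply` returns just the point array when no extra properties are stored):

```python
import numpy as np

cloud = output[0][-1]["point_cloud"]
write_cloud_ply("test.ply", cloud)
# Assumed return value: the bare (N, 3) array. The actual function may also
# return intensities/timestamps when those properties are present.
assert np.allclose(read_cloud_ply("test.ply"), cloud)
```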

@rachel-lien @akira-wakatsuki-woven ping

DavidTorresOcana commented 1 year ago

dgp/utils/dataset_conversion.py line 47 at r1 (raw file):

Previously, rachel-lien wrote…

> Could you add a unit test for this function?

Done.

DavidTorresOcana commented 1 year ago

dgp/utils/dataset_conversion.py line 81 at r1 (raw file):

Previously, rachel-lien wrote…

> Could you add a unit test for this function?

Done.
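A minimal pytest-style round-trip test in the spirit of the ones added here (a sketch; the tests actually added in the diff may use different names and fixtures):

```python
import numpy as np
from dgp.utils.dataset_conversion import read_cloud_ply, write_cloud_ply

def test_ply_round_trip(tmp_path):
    # Sketch only: assumes read_cloud_ply returns the bare (N, 3) array
    # for clouds without intensity/timestamp properties.
    cloud = np.random.RandomState(0).rand(1000, 3)
    ply_file = str(tmp_path / "cloud.ply")
    write_cloud_ply(ply_file, cloud)
    assert np.allclose(read_cloud_ply(ply_file), cloud)
```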

DavidTorresOcana commented 1 year ago

dgp/utils/dataset_conversion.py line 116 at r2 (raw file):

Previously, rachel-lien wrote…

> Is there a particular reason 20 is chosen as the cutoff for the loop?

It was there for safety, but I agree it is not needed. Removed.

DavidTorresOcana commented 1 year ago

dgp/utils/dataset_conversion.py line 121 at r2 (raw file):

Previously, rachel-lien wrote…

> Out of curiosity, what is `nx` (it's not in ply files generated with `write_point_cloud`)?

The implemented functionality allows encoding intensities and timestamps in the PLY file. When using Open3D to save PLYs, there is no option for timestamps, whereas there is one for color (RGB). Hence, in the past, I used the first component of the vertices' normals (`nx`) to embed time. As this is legacy code and is not required here, I have removed it. Apologies, and if you want the feature back, please let me know.
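For concreteness, a header with those extra per-point properties could look roughly like this (an illustration; the exact property names and types emitted by this PR may differ):

```python
# Hedged illustration of a PLY header carrying per-point intensity and
# timestamp alongside xyz; not necessarily the exact header this PR emits.
PLY_HEADER_WITH_EXTRAS = """\
ply
format binary_little_endian 1.0
element vertex 133632
property double x
property double y
property double z
property float intensity
property double timestamp
end_header
"""
```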

DavidTorresOcana commented 1 year ago

save ok, thanks for that!

DavidTorresOcana commented 1 year ago

dgp/utils/dataset_conversion.py line 121 at r2 (raw file):

Previously, rachel-lien wrote…

> I see, thank you for the explanation! I think it's fine to leave off:
>
> > If using Open3D to save PLYs, there is no option for timestamps, whereas there is an option for color (RGB). Hence, in the past, I used the 1st component of the vertices' normals to embed time (nx).
>
> Not required, but maybe one workaround for this issue is to let the user specify a custom properties mapping for instances where the PLY file is generated outside of dgp. This lets a user call `read_cloud_ply` with the custom mapping and then `write_cloud_ply` to get a new PLY file with the expected default dgp property names. Ex:
>
> ```python
> def read_cloud_ply(file, remap_properties=None):
>     properties_map = {
>         # property_name: ply_property_name
>         'x': 'x',
>         'y': 'y',
>         'z': 'z',
>         'intensity': 'intensity',
>         'timestamp': 'timestamp',
>     }
>     if remap_properties:
>         properties_map.update(remap_properties)
>     .....
>     intensities_idx = [properties.index(k) for k in [properties_map["intensity"]] if k in properties]
>     .....
>
> read_cloud_ply(open3d_ply_file, remap_properties={'intensity': 'nx'})
> ```

Hello. I do not think that is a good idea, especially due to types. The case I explained was actually a misuse of the PLY format, so it is a good thing our reader does not handle it.

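As an illustration of the type concern: epoch-second timestamps do not survive a trip through a 32-bit float property such as a normal component:

```python
import numpy as np

# Near 1.65e9 the spacing between adjacent float32 values is 128 seconds,
# so two timestamps one second apart collapse to the same value.
t0, t1 = 1_650_000_000.0, 1_650_000_001.0
print(np.float32(t0) == np.float32(t1))  # True: indistinguishable in float32
print(np.float64(t0) == np.float64(t1))  # False
```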
That being said, 2 comments:

DavidTorresOcana commented 1 year ago
:lgtm:
DavidTorresOcana commented 1 year ago

@yuta-tsuzuki-woven Is there anything else we should improve in this PR to be merged?

Regards

DavidTorresOcana commented 1 year ago

OK, two approvals, so it is ready to merge, no? Is there anything I need to do? I am not used to Reviewable.

yuta-tsuzuki-woven commented 1 year ago

@DavidTorresOcana I think it's just a matter of rebasing and passing the pre-merge tests. It seems the linter is failing: https://github.com/TRI-ML/dgp/actions/runs/4666965545/jobs/8262166566?pr=145

You can have the linter autoformat your changes before pushing a commit by following these steps: https://github.com/TRI-ML/dgp/actions/runs/4666965545/jobs/8262166566?pr=145

_Reviewable_ status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @DavidTorresOcana)

DavidTorresOcana commented 1 year ago

I was never able to run that. This is what I get when running it, both inside and outside the Docker container:

```
dgp$ make link-githooks
make: *** No rule to make target 'link-githooks'.  Stop.
```

Could it be that the `link-githooks` target no longer exists and is now called `setup-linters`?

I used Flake8 to check the formatting of the files I modified; my changes look correctly formatted to me.

I am not sure how to make CI pass after the commits have been made...

DavidTorresOcana commented 1 year ago

@yuta-tsuzuki-woven Is it ok to merge then?

DavidTorresOcana commented 11 months ago

Anything else to do @rachel-lien @yuta-tsuzuki-woven ?