daavoo / pyntcloud

pyntcloud is a Python library for working with 3D point clouds.
http://pyntcloud.readthedocs.io
MIT License
1.39k stars 221 forks

perf: read_off replace pandas.read_csv engine=python with c; improve read_off header-parsing robustness #352

Open YodaEmbedding opened 10 months ago

YodaEmbedding commented 10 months ago

UPDATE: I have rebased this PR on top of the latest commit. The revised changes include:

- Improve `read_off` header-parsing robustness.

In particular, some ModelNet40 files have faulty headers, for example:

```
$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0
```

For reference, the correct format is:

```
OFF
6586 5534 0
```

Nonetheless, it is still worth parsing such faulty headers rather than rejecting the file.
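A minimal sketch of tolerant header parsing, assuming no comment lines precede the counts. `parse_off_header` is a hypothetical helper for illustration, not pyntcloud's actual `read_off` code, and it only recognizes the `OFF` and `COFF` keywords.

```python
import re
from typing import List, Tuple

# Keyword, then optional whitespace, then whatever follows on the same line.
_HEADER_RE = re.compile(r"^(C?OFF)\s*(.*)$")


def parse_off_header(lines: List[str]) -> Tuple[str, int, int, int, int]:
    """Return (keyword, n_points, n_faces, n_edges, body_start_index).

    Hypothetical helper: accepts both the well-formed layout ("OFF" on its
    own line, counts on the next) and the faulty one where the counts are
    fused onto the keyword line (e.g. "OFF6586 5534 0").
    """
    first = lines[0].strip()
    match = _HEADER_RE.match(first)
    if match is None:
        raise ValueError(f"Not an OFF file: {first!r}")
    keyword, rest = match.group(1), match.group(2).strip()
    if rest:
        # Faulty header: counts follow the keyword on the same line.
        counts_line, body_start = rest, 1
    else:
        # Well-formed header: counts are on the next line.
        counts_line, body_start = lines[1].strip(), 2
    n_points, n_faces, n_edges = (int(tok) for tok in counts_line.split()[:3])
    return keyword, n_points, n_faces, n_edges, body_start
```

Both `parse_off_header(["OFF6586 5534 0"])` and `parse_off_header(["OFF", "6586 5534 0"])` yield the same counts; only the index of the first body line differs.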


(Original text before #353 was merged) Big performance improvement by removing the need for the slow `engine="python"`: the sliced file is read from an in-memory `StringIO` buffer instead.

Also fixes a bug where, for OFF files containing more lines than `num_points + num_faces`, the reader tried to parse the extra lines (potentially edges) as faces. As [Wikipedia] says, an OFF file may contain:

- points
- faces (optional)
- edges (optional)

Of course, this still does not cover every OFF file variant described by Wikipedia, but it's an improvement.
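A minimal sketch of the slicing idea, assuming the header has already been parsed into counts plus the index of the first body line, the body has no comment or blank lines, and every face is a triangle. `read_off_body` is a hypothetical name, not the PR's actual code. It relies on the fact that pandas' C engine handles `sep=r"\s+"` natively, so no Python-engine fallback is needed.

```python
from io import StringIO
from typing import List, Tuple

import pandas as pd


def read_off_body(
    lines: List[str], n_points: int, n_faces: int, body_start: int
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Parse vertices and triangular faces from already-split OFF lines."""
    # Slice exactly n_points vertex lines and n_faces face lines; any trailing
    # lines (e.g. optional edges) are never passed to the face parser.
    point_lines = lines[body_start:body_start + n_points]
    face_start = body_start + n_points
    face_lines = lines[face_start:face_start + n_faces]

    # Whitespace-separated values parsed by the fast C engine.
    points = pd.read_csv(
        StringIO("\n".join(point_lines)), sep=r"\s+", header=None, engine="c"
    )
    points = points.iloc[:, :3].set_axis(["x", "y", "z"], axis=1)

    if n_faces == 0:
        return points, pd.DataFrame(columns=["v1", "v2", "v3"])

    faces = pd.read_csv(
        StringIO("\n".join(face_lines)), sep=r"\s+", header=None, engine="c"
    )
    # Column 0 is the per-face vertex count (3); columns 1-3 are the vertex
    # indices. Any trailing color fields are dropped.
    faces = faces.iloc[:, 1:4].set_axis(["v1", "v2", "v3"], axis=1)
    return points, faces
```

With a file that has one extra trailing line after its single face, only the declared face is parsed; the extra line is simply never sliced into the face buffer.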
YodaEmbedding commented 6 months ago

Both this PR and #353 improved pandas performance for `*.OFF` files by switching to `engine="c"`. Since #353 has been merged, I rebased this PR on top of it. This PR still contains other useful changes, listed above.


Future work:

Once this is reviewed/accepted, I can look into improving compatibility with Wikipedia's description of the [*.OFF file format][OFF]. Perfect compatibility would of course be too slow, but there are still some missing features:

- "C" in the header should not be needed to detect the presence of color (see Wikipedia's example).
- Edges and edge colors.
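For the first item, one possible approach is to infer color from the token count of each face line instead of relying on a "C" keyword. A minimal sketch, assuming a face line has the form `n v1 ... vn [r g b [a]]` with an optional trailing `#` comment; `face_color` is a hypothetical helper, not existing pyntcloud API.

```python
from typing import Optional, Tuple


def face_color(face_line: str) -> Optional[Tuple[float, ...]]:
    """Return the face's color components, or None if the line has none."""
    # Drop any trailing "#" comment before splitting into tokens.
    tokens = face_line.split("#", 1)[0].split()
    n_vertices = int(tokens[0])
    if len(tokens) < 1 + n_vertices:
        raise ValueError(f"Face line has too few indices: {face_line!r}")
    # Anything after the vertex indices is treated as a color.
    extra = tokens[1 + n_vertices:]
    if not extra:
        return None
    if len(extra) not in (3, 4):
        raise ValueError(f"Unexpected trailing fields: {extra!r}")
    return tuple(float(tok) for tok in extra)
```

Because the check is per line, a reader could sample the first face line to decide whether to allocate color columns, keeping the bulk parse on the fast path.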