UPDATE: I have rebased this PR on top of the latest commit. The revised changes are:
- perf: speed up reading of ASCII PLY files
- feat: improve robustness for OFF headers on e.g. ModelNet40
- perf: reuse already open file for reading instead of opening it twice
- style: renamed variables for clarity (e.g. `color` -> `has_color` and `count` -> `n_header`)
In particular, ModelNet40 has faulty headers:
$ head -n 1 ModelNet40/chair/train/chair_0856.off
OFF6586 5534 0
For reference, the correct format is:
OFF
6586 5534 0
Nonetheless, it is still valuable to parse the faulty header.
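A minimal sketch of how both header variants could be handled is shown here. The function name `read_off_counts` and its return shape are hypothetical and not taken from this PR's diff; it assumes the counts line always carries three integers and that `f` is an already open text file positioned at the start.

```python
import io


def read_off_counts(f):
    """Return (has_color, n_points, n_faces, n_edges) from an open text file.

    Hypothetical helper: accepts both the correct header ("OFF" on its own
    line) and the fused variant seen in ModelNet40 ("OFF6586 5534 0").
    """
    first = f.readline().strip()
    keyword, sep, rest = first.partition("OFF")
    if not sep:
        raise ValueError("first line does not contain 'OFF'")
    # Assumption: a "C" prefix (as in "COFF") signals per-vertex color.
    has_color = "C" in keyword
    rest = rest.strip()
    if not rest:
        # Correct format: counts are on the next non-empty, non-comment line.
        for line in f:
            rest = line.split("#", 1)[0].strip()
            if rest:
                break
    n_points, n_faces, n_edges = (int(x) for x in rest.split()[:3])
    return has_color, n_points, n_faces, n_edges


faulty = io.StringIO("OFF6586 5534 0\n")
correct = io.StringIO("OFF\n6586 5534 0\n")
print(read_off_counts(faulty))
print(read_off_counts(correct))
```

Because the function only consumes the header lines, the same file object can be handed on to the body reader without reopening the file.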
(Original text before #353 was merged)
Big performance improvement: reading the sliced file from an in-memory `StringIO` buffer removes the need for the slow `engine="python"`.
Also fixes a bug where OFF files containing more lines than `num_points + num_faces` caused potential edges to be read as faces.
As [Wikipedia] says, the OFF file may contain:
- points
- faces (optional)
- edges (optional)
Of course, this still does not encompass all possible OFF file variants described by Wikipedia, but it's an improvement.
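The idea can be sketched roughly as follows. This is an illustration rather than the code in this PR: `read_off_body` is a hypothetical name, and it assumes triangular faces, no color columns, no blank or comment lines in the body, and `n_points > 0`.

```python
import io
from itertools import islice

import pandas as pd


def read_off_body(f, n_points, n_faces):
    """Read points and faces from an open file positioned after the header."""
    # Slice exactly the lines we need; anything after the faces
    # (e.g. edges) is left unread and never mistaken for a face.
    points_text = "".join(islice(f, n_points))
    faces_text = "".join(islice(f, n_faces))

    # r"\s+" is the one regex separator the C engine supports.
    points = pd.read_csv(
        io.StringIO(points_text),
        sep=r"\s+",
        header=None,
        names=["x", "y", "z"],
        engine="c",
    )
    if n_faces == 0:
        # read_csv raises on empty input, so return an empty frame instead.
        faces = pd.DataFrame(columns=["v1", "v2", "v3"])
    else:
        faces = pd.read_csv(
            io.StringIO(faces_text),
            sep=r"\s+",
            header=None,
            names=["n", "v1", "v2", "v3"],
            usecols=["v1", "v2", "v3"],  # drop the leading vertex count
            engine="c",
        )
    return points, faces


body = io.StringIO("0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n0 1\n")
points, faces = read_off_body(body, n_points=3, n_faces=1)
print(points.shape, faces.shape)
```

The trailing `0 1` line stands in for an edge entry and is simply never consumed.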
Both this PR and #353 improve pandas parsing performance for *.OFF files by using `engine="c"`. Therefore, I rebased this PR on top of #353. This PR still contains some other useful changes, listed above.
Future work:
Once this is reviewed/accepted, I can look into improving compatibility with Wikipedia's description of the [*.OFF file format][OFF]. Of course, perfect compatibility would be too slow, but there are still some missing features:
- "C" in the header should not be needed to detect the presence of color (see Wikipedia's example).
- Edges, and edge colors.
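For the first item, one possible approach is to infer color from the face line itself: the leading integer gives the number of vertex indices, and any tokens after those indices can be treated as color components. The sketch below is only an illustration of that idea with a hypothetical `parse_face_line` helper; a per-line Python loop like this would likely be too slow for large files and is not part of this PR.

```python
def parse_face_line(line):
    """Split one OFF face line into (vertex_indices, color_or_None)."""
    # Drop a trailing "# comment", then tokenize on whitespace.
    tokens = line.split("#", 1)[0].split()
    n = int(tokens[0])
    indices = [int(t) for t in tokens[1:1 + n]]
    # Anything left after the n indices is assumed to be color data.
    color = [float(t) for t in tokens[1 + n:]]
    return indices, color or None


print(parse_face_line("4 0 1 2 3 255 0 0 #red"))
print(parse_face_line("3 0 1 2"))
```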