Allow reading of large files which do not fit in memory

davidcaron / pye57

Read and write e57 point clouds from Python

MIT License


Allow reading of large files which do not fit in memory #45

Open NadimDeeb opened 11 months ago

NadimDeeb commented 11 months ago

I have a huge point cloud consisting of a single scan with 227 million points. Every time I call .read_scan(), the code crashes and the point cloud is never loaded. Does this library offer a way to read the x, y, or z coordinates individually, or perhaps half a scan at a time?

GeoffNordling commented 10 months ago

Searching for the same. How is one supposed to read an .e57 file that doesn't fit in memory?

swell-d commented 6 months ago

I have the same problem. Increasing swap did not help.

swell-d commented 6 months ago

The following solution worked for me:

Use float32 instead of float64:

SUPPORTED_CARTESIAN_POINT_FIELDS = {
    "cartesianX": np.float32,  # was "d"
    "cartesianY": np.float32,  # was "d"
    "cartesianZ": np.float32,  # was "d"
}
SUPPORTED_SPHERICAL_POINT_FIELDS = {
    "sphericalRange": np.float32,  # was "d"
    "sphericalAzimuth": np.float32,  # was "d"
    "sphericalElevation": np.float32,  # was "d"
}

Set --timeout 600 for gunicorn.
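For a rough sense of what the float32 change buys, here is a back-of-the-envelope sketch using the 227 million point count from the original report. The helper name coordinate_bytes is made up for illustration, and the estimate covers only the three Cartesian coordinate arrays, not intensity, color, or any temporary buffers created during reading.

```python
import numpy as np

N_POINTS = 227_000_000  # point count reported at the top of this thread
N_COORDS = 3            # cartesianX, cartesianY, cartesianZ


def coordinate_bytes(n_points: int, dtype) -> int:
    """Bytes needed to hold the three coordinate arrays at the given dtype.

    Hypothetical helper for illustration; not part of pye57.
    """
    return n_points * N_COORDS * np.dtype(dtype).itemsize


for dtype in (np.float64, np.float32):
    gb = coordinate_bytes(N_POINTS, dtype) / 1e9
    print(f"{np.dtype(dtype).name}: {gb:.2f} GB for coordinates alone")
```

That is roughly 5.4 GB at float64 versus 2.7 GB at float32, so the switch halves the coordinate footprint but does not make the read incremental.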

dancergraham commented 6 months ago

It would be nice to add an optional keyword argument or similar for this. float64 values are often needed to avoid losing precision, e.g. for point clouds recorded in UTM coordinates, but having an option to switch to float32 would be great.
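To make the precision concern concrete, the sketch below checks how coarse float32 becomes at UTM-sized magnitudes. The easting and northing values are invented for illustration, not taken from any real scan.

```python
import numpy as np

# Hypothetical UTM coordinates in metres (illustrative values only).
easting = 500_000.123
northing = 5_000_000.123

for name, value in (("easting", easting), ("northing", northing)):
    as_f32 = np.float32(value)
    # np.spacing gives the gap between this float32 and the next representable one.
    step = np.spacing(as_f32)
    error = abs(float(as_f32) - value)
    print(f"{name}: float32 step = {step} m, rounding error = {error:.3f} m")
```

At these magnitudes the float32 grid is about 3 cm wide for the easting and 0.5 m wide for the northing, which is far coarser than typical scanner accuracy. float64 keeps sub-millimetre resolution at the same values.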

swell-d commented 6 months ago

having an option for switching to float 32 would be great

import numpy as np
from pye57 import e57

e57.SUPPORTED_CARTESIAN_POINT_FIELDS = {
    "cartesianX": np.float32,
    "cartesianY": np.float32,
    "cartesianZ": np.float32,
}

e57.SUPPORTED_SPHERICAL_POINT_FIELDS = {
    "sphericalRange": np.float32,
    "sphericalAzimuth": np.float32,
    "sphericalElevation": np.float32,
}

# your code here

dancergraham commented 6 months ago

I'm not working on this right now, but I've been thinking about what it could look like. Passing a dictionary of data types might be neat, something like:

data = e57.read_scan(
    0,
    dtypes={
        "cartesianX": np.float32,
        "cartesianY": np.float32,
        "cartesianZ": np.float32,
    },
)

We also need to think about what would happen if the contained values were out of bounds for the given data type - my first idea would be to raise an exception, for instance an OverflowError.
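One way the out-of-bounds check could look is sketched below. This is only an assumption about a possible design: checked_cast is a hypothetical helper, not an existing pye57 function, and it works on arrays that are already in memory. It catches range overflow only; it does not detect the loss of precision discussed earlier in the thread.

```python
import numpy as np


def checked_cast(values, dtype):
    """Cast to a floating dtype, raising OverflowError if a finite value does not fit.

    Hypothetical helper for illustration; not part of pye57.
    """
    values = np.asarray(values)
    limit = np.finfo(dtype).max           # largest finite value of the target dtype
    finite = values[np.isfinite(values)]  # ignore NaN/inf already present in the data
    if finite.size and np.abs(finite).max() > limit:
        raise OverflowError(f"value out of range for {np.dtype(dtype).name}")
    return values.astype(dtype)


xs = checked_cast(np.array([1.5, -2.0, 1234.5]), np.float32)
print(xs.dtype)
```

Passing something like np.array([1e39]) with np.float32 would raise OverflowError instead of silently producing inf.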

Pull requests along these lines would be very welcome!

dancergraham commented 2 weeks ago

As mentioned in #74, the underlying libe57format library does not allow partial file reading / chunked reading, so I don't see an easy way to add that option to pye57.