georust / gdal

Rust bindings for GDAL
https://crates.io/crates/gdal
MIT License
364 stars, 94 forks

concurrent read/write support #102

Closed: geohardtke closed this issue 4 years ago

geohardtke commented 4 years ago

Hi, I was wondering whether you have plans to implement concurrent reading/writing for raster datasets. As far as I understand from the documentation, that should be achieved by using GDALDatasetH GDALOpenShared(const char*, GDALAccess) instead of GDALDatasetH GDALOpen(const char*, GDALAccess). See https://gdal.org/api/raster_c_api.html#_CPPv414GDALOpenSharedPKc10GDALAccess

Thanks in advance.

rmanoka commented 4 years ago

@geohardtke Could you elaborate on what you mean by "concurrent r/w"? For instance, with multiple threads, OpenShared has no effect and is equivalent to Open (as mentioned in the doc link above). See the quoted text below.

Starting with GDAL 1.6.0, if GDALOpenShared() is called on the same pszFilename from two different threads, a different GDALDataset object will be returned as it is not safe to use the same dataset from different threads, unless the user does explicitly use mutexes in its code.

Also, this crate allows a dataset handle to be sent (Rust's `Send`) to another thread, so as I understand it, OpenShared is not compatible with our usage.
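To illustrate what "Sending a handle" means in Rust, here is a minimal sketch of the general pattern, assuming a handle type that is `Send` but not `Sync`. `Handle`, `open` and `describe` are hypothetical names, not the gdal crate's actual definitions: the handle can be moved into exactly one other thread, but it cannot be referenced from two threads at once.

```rust
use std::ffi::c_void;
use std::ptr;
use std::thread;

/// Hypothetical stand-in for an FFI dataset handle. Because it holds a raw
/// pointer, Rust treats it as neither `Send` nor `Sync` by default.
struct Handle {
    raw: *mut c_void,
    id: u32,
}

// Opt in to `Send` only. This is sound only if the C library lets a handle be
// used from a thread other than the one that opened it, provided no two
// threads use it at the same time. `Sync` is deliberately not implemented, so
// `&Handle` (and therefore `Arc<Handle>`) cannot cross thread boundaries.
unsafe impl Send for Handle {}

impl Handle {
    /// Hypothetical "open"; a real binding would call into the C library here.
    fn open(id: u32) -> Handle {
        Handle { raw: ptr::null_mut(), id }
    }

    fn describe(&self) -> String {
        format!("handle {} (null: {})", self.id, self.raw.is_null())
    }
}

fn main() {
    let handle = Handle::open(7);
    // Ownership moves into the worker thread; main can no longer use `handle`.
    let text = thread::spawn(move || handle.describe()).join().unwrap();
    println!("{text}");
}
```

With such a type, `Arc<Mutex<Handle>>` would compile (a `Mutex<T>` is `Sync` when `T: Send`), but every access is then serialized through the lock, so it does not give parallel reads.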

geohardtke commented 4 years ago

@rmanoka Thank you very much for your answer. I'm glad to hear that what I need is possible in Rust; I'm still new to Rust and have a lot to learn. I often deal with raster inputs/outputs that are much larger than memory, and I'd like to process the inputs blockwise in parallel and write the results to a compressed output raster. I've been doing that with rasterio in Python (as shown here: https://rasterio.readthedocs.io/en/latest/topics/concurrency.html), but I would really like to understand how to implement something similar in Rust. If it's not too much to ask, would you mind having a look at my code snippet?

This is what I have so far:

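```rust
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::thread;
use clap::{load_yaml, App};
// gdal paths as in 0.7-era releases (assumed); they have moved between versions.
use gdal::raster::RasterBand;
use gdal::{Dataset, Metadata};

fn main() {
    let yaml = load_yaml!("cli.yaml");
    let m = App::from(yaml).get_matches();
    let name = m.value_of("name")
        .expect("This can't be None, we said it was required");
    let path = Path::new(name);
    // Open the dataset once and share it between threads behind a mutex.
    let dataset = Dataset::open(path).unwrap();
    let ds = Arc::new(Mutex::new(dataset));
    let mut handles = Vec::new();
    for _ in 1..3 {
        let ds = Arc::clone(&ds);
        // The compiler rejects this spawn (see the error below).
        handles.push(thread::spawn(move || {
            let ds = ds.lock().unwrap();
            println!("dataset description: {:?}", ds.description());
            let rasterband: RasterBand = ds.rasterband(1).unwrap();
            // Read the 256x256 block at offset (0, 0) into a 256x256 u8 buffer.
            let _rv = rasterband.read_as::<u8>((0, 0), (256, 256), (256, 256)).unwrap();
        }));
    }
    // Wait for the workers so main does not exit before they finish.
    for h in handles {
        h.join().unwrap();
    }
}
```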

And the error:

```
|         thread::spawn(move || {
|         ^^^^^^^^^^^^^ `*mut std::ffi::c_void` cannot be shared between threads safely
```

Thanks again! Cheers.
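Processing a raster blockwise, as described in this comment, starts with splitting it into windows. The following is a minimal sketch, assuming the raster's width and height are known and a fixed block size is used; `Window` and `block_windows` are hypothetical helpers, not part of the gdal crate. The `(x, y)` offset and `(width, height)` size appear to match the order of the `(0, 0), (256, 256)` arguments passed to `read_as` in the snippet above.

```rust
/// A rectangular block of a raster: top-left offset and size in pixels,
/// both in (x, y) / (width, height) order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Window {
    offset: (usize, usize),
    size: (usize, usize),
}

/// Split a `width` x `height` raster into tiles of at most
/// `block_w` x `block_h`; edge tiles are clipped to the raster bounds.
fn block_windows(width: usize, height: usize, block_w: usize, block_h: usize) -> Vec<Window> {
    assert!(block_w > 0 && block_h > 0, "block size must be positive");
    let mut windows = Vec::new();
    for y in (0..height).step_by(block_h) {
        for x in (0..width).step_by(block_w) {
            windows.push(Window {
                offset: (x, y),
                size: (block_w.min(width - x), block_h.min(height - y)),
            });
        }
    }
    windows
}

fn main() {
    // A 600 x 300 raster in 256 x 256 blocks gives 3 columns x 2 rows of tiles.
    for w in block_windows(600, 300, 256, 256) {
        println!("{:?}", w);
    }
}
```

Because each window is independent, the list can be handed out to worker threads, each of which reads only its own blocks.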

rmanoka commented 4 years ago

@geohardtke The GDAL library doesn't let you read using the same handle from different threads. You should open the same dataset in each of the threads. For chunking, I would suggest phrasing your processing logic as a "map-reduce" pattern, and using a library such as rayon. Please see this gist for a sample.
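The gist is not reproduced here, but the advice can be sketched as follows. This is a minimal sketch, assuming a hypothetical `FakeDataset` that stands in for the gdal crate's dataset type and synthesizes pixel values so the example runs without GDAL; `parallel_sum` and `read_rows` are also hypothetical. It uses `std::thread::scope` instead of rayon: every worker opens its own handle (never sharing one across threads), sums the pixels of its row strips (the "map" step), and the partial sums are added together (the "reduce" step).

```rust
use std::thread;

/// Hypothetical stand-in for a dataset opened from a path; pixel values are
/// synthesized as (x + y) % 256 so the example runs without GDAL.
struct FakeDataset {
    width: usize,
    height: usize,
}

impl FakeDataset {
    /// Hypothetical "open": each call yields an independent handle.
    fn open(_path: &str) -> FakeDataset {
        FakeDataset { width: 600, height: 300 }
    }

    /// Read the row strip [y0, y0 + rows) as u8 values.
    fn read_rows(&self, y0: usize, rows: usize) -> Vec<u8> {
        let mut buf = Vec::with_capacity(rows * self.width);
        for y in y0..y0 + rows {
            for x in 0..self.width {
                buf.push(((x + y) % 256) as u8);
            }
        }
        buf
    }
}

/// Map-reduce over row strips of `strip_rows` rows, using `workers` threads.
/// Strips are assigned round-robin: worker w takes strips w, w + workers, ...
fn parallel_sum(path: &str, strip_rows: usize, workers: usize) -> u64 {
    let height = FakeDataset::open(path).height;
    let strips: Vec<usize> = (0..height).step_by(strip_rows).collect();
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let strips = &strips;
                s.spawn(move || {
                    // One handle per thread, as suggested above.
                    let ds = FakeDataset::open(path);
                    strips
                        .iter()
                        .skip(w)
                        .step_by(workers)
                        .map(|&y0| {
                            let rows = strip_rows.min(ds.height - y0);
                            ds.read_rows(y0, rows).iter().map(|&v| v as u64).sum::<u64>()
                        })
                        .sum::<u64>()
                })
            })
            .collect();
        // Reduce: add up the partial sums from all workers.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let total = parallel_sum("input.tif", 64, 4);
    println!("sum of all pixels: {total}");
}
```

With rayon, its work-stealing thread pool would replace the manual round-robin assignment of strips; the key point in both cases is that no dataset handle is used by more than one thread.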

geohardtke commented 4 years ago

Excellent! Thank you very much for your time @rmanoka!