georust / gdal

Rust bindings for GDAL
https://crates.io/crates/gdal
MIT License

[Feature Request] Guess the Driver Type based on the file extension #506

Closed Atreyagaurav closed 5 months ago

Atreyagaurav commented 6 months ago

Currently, you can open a dataset with just Dataset::open("filename.ext") without giving the driver, but we cannot create a file without specifying a driver. If we can, please let me know.

My attempt to call Dataset::open on a new file failed, and using Dataset::open_ex as below didn't work either:

Dataset::open_ex(
    filename,
    DatasetOptions {
        open_flags: gdal::GdalOpenFlags::GDAL_OF_UPDATE
            .union(gdal::GdalOpenFlags::GDAL_OF_VECTOR),
        ..Default::default()
    },
)

I want to be able to output to any GIS file format: GPKG, shp, json, and so on.

So far I only know how to make a new file like this:

let driver = DriverManager::get_driver_by_name(driver)?;
let mut out_data = driver.create_vector_only(filename)?;

Since we need to pass the filename as well as the driver, it feels redundant, and it is extra work for the user, who might make mistakes.

So is there a way to add DriverManager::get_driver_by_extension, which could guess the driver based on the file extension?

Considering that Dataset::open doesn't need a driver, I thought maybe there is already such a function, but it seems to call C functions and take a list of valid drivers, so I wasn't able to find out how to replicate that for writing new files.

lnicola commented 6 months ago

The GDAL tools use these two internal functions (https://github.com/OSGeo/gdal/blob/b338831ad7f51cc26f7ef23632f0da7f2fd554e5/apps/commonutils.cpp#L65C1-L178). I've thought before about implementing something similar, so if you want to file a PR, it's going to be appreciated.

Atreyagaurav commented 6 months ago

Thank you. I'll give it a try. I've never used a C API from Rust before, but the other functions should serve as examples.

lnicola commented 6 months ago

I don't know if we can get the driver metadata, so it might not be straightforward.

Atreyagaurav commented 6 months ago

I'll look into it. Worst case, we can manually write a match statement (or a hashmap) for the most used extensions, based on the drivers available in GIS software and GDAL.
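As a rough illustration of that manual fallback, here is a minimal sketch of a hand-written match on the file extension. The function name guess_driver_name is hypothetical and not part of the gdal crate, and the table is deliberately incomplete; it only maps a few common extensions to GDAL driver short names.

```rust
use std::path::Path;

/// Hypothetical helper (not part of the gdal crate): map a file
/// extension to a GDAL driver short name using a hand-written table.
fn guess_driver_name(filename: &str) -> Option<&'static str> {
    // Take the text after the last dot and compare case-insensitively.
    let ext = Path::new(filename).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "gpkg" => Some("GPKG"),
        "shp" => Some("ESRI Shapefile"),
        "json" | "geojson" => Some("GeoJSON"),
        "tif" | "tiff" => Some("GTiff"),
        // Anything not in the table is left to the caller.
        _ => None,
    }
}
```

The returned name could then be passed to DriverManager::get_driver_by_name. The obvious drawback is that the table ignores which drivers are actually installed and has to be maintained by hand.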

Atreyagaurav commented 6 months ago

So here is a crude implementation. It works.

https://github.com/Atreyagaurav/gdal/commit/22a1cf7a72c98dfc0179da354b1f42249b1ed095

We can probably put it in once_cell and save a HashMap if we are likely to call the function a lot. I think we probably won't call this function enough times to justify that, but we can do it.

I looked at the code you linked. Those functions seem to manually check for gpkg and shp and not much else. But GDAL can handle so many extensions that there must be something more general, so we can look into that if this implementation seems too crude.

If you think this is good enough, I can add documentation and other things and make a pull request.

lnicola commented 6 months ago

I think we should keep the logic a little closer to the original. We should probably pass in the filename (because .shp.zip is annoying to handle and the caller might forget) and check the driver capabilities.

I don't think we should cache the result, since the drivers can be loaded and unloaded at runtime.
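To make those points concrete, here is a minimal sketch that matches on the whole filename rather than a pre-split extension, skips drivers that cannot create files, and takes the driver list as an argument on every call instead of caching it. DriverInfo and guess_driver are hypothetical names used only for illustration; in the real crate this information would have to come from GDAL's driver metadata.

```rust
/// Hypothetical, simplified view of a driver (not a gdal crate type).
#[derive(Debug)]
struct DriverInfo {
    name: &'static str,
    extensions: &'static [&'static str],
    can_create: bool,
}

/// Pick the creation-capable driver whose extension is the longest
/// suffix of `filename`, so "roads.shp.zip" matches "shp.zip" and
/// not just "zip".
fn guess_driver<'a>(filename: &str, drivers: &'a [DriverInfo]) -> Option<&'a DriverInfo> {
    let lower = filename.to_ascii_lowercase();
    // Best candidate so far, with the length of the suffix it matched.
    let mut best: Option<(&DriverInfo, usize)> = None;
    for driver in drivers.iter().filter(|d| d.can_create) {
        for ext in driver.extensions {
            let suffix = format!(".{}", ext.to_ascii_lowercase());
            let is_longer = best.map_or(true, |(_, len)| suffix.len() > len);
            if lower.ends_with(&suffix) && is_longer {
                best = Some((driver, suffix.len()));
            }
        }
    }
    best.map(|(driver, _)| driver)
}
```

Because the driver slice is passed in each time, a caller would rebuild it from the currently registered drivers before each lookup, which sidesteps the stale-cache concern.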

Atreyagaurav commented 6 months ago

OK, this one is more or less a rewrite of the function:

https://github.com/Atreyagaurav/gdal/commit/5f44a249ee91c6f57cc8a1c34ca313302161d530

The original one only checks for DMD_EXTENSIONS but this one checks for DMD_EXTENSION as well. Other than that it should be similar.
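For readers unfamiliar with the two metadata items: DMD_EXTENSIONS holds a space-separated list of extensions, while DMD_EXTENSION holds a single one. A minimal sketch of merging the two is shown below. The function name collect_extensions is hypothetical, the raw metadata strings are passed in as plain arguments, and how they are read from GDAL is left out.

```rust
/// Hypothetical helper: merge the values of DMD_EXTENSIONS
/// (space-separated list) and DMD_EXTENSION (single value) into one
/// lowercase, de-duplicated list, preserving first-seen order.
fn collect_extensions(extensions: Option<&str>, extension: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    // A missing item is treated as an empty string, which yields no tokens.
    let listed = extensions.unwrap_or("").split_whitespace();
    let single = extension.unwrap_or("").split_whitespace();
    for ext in listed.chain(single) {
        // Tolerate a leading dot and mixed case in the metadata.
        let ext = ext.trim_start_matches('.').to_ascii_lowercase();
        if !ext.is_empty() && !out.contains(&ext) {
            out.push(ext);
        }
    }
    out
}
```

Checking both items matters for drivers that only populate one of them; the sample values used to exercise this sketch are illustrative and not taken from any particular GDAL version.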

The tests all pass on my laptop, but I don't know if they'll pass on others (if drivers are missing or something).

Since we're checking for DCAP_CREATE, should we make a Dataset::create() function to parallel Dataset::open(), and call it from there?

lnicola commented 6 months ago

Not sure when I'll be able to look at it properly, but please file a PR so we can keep track of it.

> Since we're checking for DCAP_CREATE, should we make a Dataset::create() function to parallel Dataset::open(), and call it from there?

But we already have Driver::create_xxx and.. Dataset::open, hmm dunno.

Atreyagaurav commented 6 months ago

Done. It seems to work for my use case. I don't know about other use-cases and edge cases.

Atreyagaurav commented 6 months ago

I've applied the suggestions from clippy. Please refer to the comments in the pull request for any other details. I'll correct any errors that come up in the CI once it's been run again.

Atreyagaurav commented 6 months ago

Commenting here, as I don't think you'll get a notification if I comment on the pull request.

If you're the one approving the CI run: I've updated the code, so can you approve the CI tests? It should pass this time. Feel free to review in your own time; I just wanna make sure I can update it to pass the tests if it fails again. Sorry for the trouble.

lnicola commented 6 months ago

Sorry, I do get notifications, but I haven't yet had a chance to take a proper look. And I'm a little confused about your GPKG issue. As far as I know, it should work fine on CI (I can see the failure in the Actions history, but have no explanation for it).

Anyway, I just triggered the CI in the PR.

Atreyagaurav commented 6 months ago

Yeah, I saw that issue, and saw some other tests using gpkg, so I don't understand it either. But the new test should account for that: if the GPKG driver is available and something else is wrong, it'll still fail, and I'll look into it. And if it doesn't fail this time, then maybe the GPKG driver is not available while running that test.

EDIT: Thank you. I also just figured out how to run CI on my fork, so I can use that for trial and error if the tests fail again. Tests passing locally and failing in CI has made it a bit hard to pin down the error.

EDIT2: Manually triggering the CI run on the fork didn't work; the jobs fail while compiling gdal, among other things.

lnicola commented 6 months ago

It should be available. I'd still like to figure this out because it might point to a deeper issue, not necessarily in your PR.

Atreyagaurav commented 6 months ago

It failed again, so the driver is available. It could be that the metadata doesn't have .gpkg.zip. Is there an easy way to explore this? Maybe a Docker image of the CI? I looked at the metadata from my Python gdal library to test things.

lnicola commented 6 months ago

The CI runs on the ghcr.io/osgeo/gdal:ubuntu-full-X.Y.Z images, you should be able to run Python in those to check.

Atreyagaurav commented 6 months ago

Thanks, I'll check them out.

I'll let you know whether or not I find something.

Atreyagaurav commented 6 months ago

Found the problem; refer to the comment on the pull request. In retrospect, I was thinking that if GitHub hadn't cancelled the other tests and had run all of them, we'd have known it was a version problem, because I saw that the old versions were the ones that failed. But I didn't say anything. And it turned out to be true lol.