DataverseLabs / pyinterpolate

Kriging | Poisson Kriging | Variogram Analysis
https://pyinterpolate.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Adding support for passing Parquet and Feather files #392

Closed cheginit closed 1 year ago

cheginit commented 1 year ago

Is your feature request related to a problem? Please describe.
Considering the popularity of Parquet and Feather files, it would be great to allow passing such files to Block.from_file.

Describe the solution you'd like
It only requires adding a couple of lines of code to detect the file format and call the corresponding geopandas function.

So this block of code in the Block.from_file function:

        if fpath.lower().endswith('.gpkg'):
            dataset = gpd.read_file(fpath, layer=layer_name)
        else:
            dataset = gpd.read_file(fpath)

becomes:

        if fpath.lower().endswith('.gpkg'):
            dataset = gpd.read_file(fpath, layer=layer_name)
        elif fpath.lower().endswith('.feather'):
            dataset = gpd.read_feather(fpath)
        elif fpath.lower().endswith('.parquet'):
            dataset = gpd.read_parquet(fpath)
        else:
            dataset = gpd.read_file(fpath)

We can additionally add a new kwargs input to the function and pass it on to geopandas, so users can specify arguments like bbox and columns for the geopandas functions. The only additional change is adding **kwargs to Block.from_file and then passing it to geopandas, like so:

        if fpath.lower().endswith('.gpkg'):
            dataset = gpd.read_file(fpath, layer=layer_name, **kwargs)
        elif fpath.lower().endswith('.feather'):
            dataset = gpd.read_feather(fpath, **kwargs)
        elif fpath.lower().endswith('.parquet'):
            dataset = gpd.read_parquet(fpath, **kwargs)
        else:
            dataset = gpd.read_file(fpath, **kwargs)

For the gpkg case, this has the additional benefit of allowing users to pass pyogrio as the engine, which significantly improves reading speed.
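As a rough, self-contained sketch of the same idea (not the actual pyinterpolate code), the suffix check can be separated from the read call. The names pick_reader and read_blocks are hypothetical, and geopandas is imported lazily, so the suffix logic can be checked without geopandas installed:

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the geopandas reader to call.
# Any other suffix (including .gpkg) falls back to gpd.read_file.
_READERS = {
    ".feather": "read_feather",
    ".parquet": "read_parquet",
}


def pick_reader(fpath):
    """Return the name of the geopandas reader for a path, based on its suffix."""
    suffix = Path(fpath).suffix.lower()
    return _READERS.get(suffix, "read_file")


def read_blocks(fpath, layer_name=None, **kwargs):
    """Read a spatial file, forwarding extra keyword arguments to geopandas."""
    import geopandas as gpd  # lazy import: only needed when a file is actually read

    reader = getattr(gpd, pick_reader(fpath))
    if Path(fpath).suffix.lower() == ".gpkg" and layer_name is not None:
        # GeoPackage files can hold several layers, so the layer name is passed on.
        kwargs.setdefault("layer", layer_name)
    return reader(fpath, **kwargs)
```

With this shape, a call such as read_blocks("blocks.gpkg", layer_name="areas", engine="pyogrio") would forward the engine keyword to gpd.read_file; the file and layer names here are placeholders.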

Describe alternatives you've considered
N/A

Additional context
If interested, I can open a PR.

SimonMolinsky commented 1 year ago

Hi @cheginit !

You have a great idea, I like it! I hadn't considered it myself (and by the way, I haven't used feather and parquet files yet; I should try them).

If you can open a PR and include those code changes, that would be neat :) I will update the package when you send the PR, but I need one more thing :) Do you have some sample feather or parquet files that can be used for testing? Or would you happen to know about open spatial data written in those formats?

Thanks a lot! 🙌

cheginit commented 1 year ago

Sure, I will open a PR.

Regarding examples, we can use the example file from geoparquet's official repo here. For testing purposes, we can save the same file as feather with geopandas and feed it back to the test function.

EDIT: Or even better, we can convert the gpkg file that you're using for tests to parquet and feather.
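A minimal sketch of that conversion, assuming geopandas with pyarrow is installed. The helper names sibling_paths and convert_fixture and any file paths are hypothetical, not taken from the pyinterpolate test suite:

```python
from pathlib import Path


def sibling_paths(gpkg_path):
    """Return parquet and feather paths located next to the given GeoPackage."""
    src = Path(gpkg_path)
    return src.with_suffix(".parquet"), src.with_suffix(".feather")


def convert_fixture(gpkg_path, layer_name=None):
    """Write parquet and feather copies of a GeoPackage layer for use in tests."""
    import geopandas as gpd  # lazy import; writing these formats also needs pyarrow

    if layer_name is None:
        gdf = gpd.read_file(gpkg_path)
    else:
        gdf = gpd.read_file(gpkg_path, layer=layer_name)

    parquet_path, feather_path = sibling_paths(gpkg_path)
    gdf.to_parquet(parquet_path)
    gdf.to_feather(feather_path)
    return parquet_path, feather_path
```

The resulting files can then be read back with gpd.read_parquet and gpd.read_feather and compared against the original GeoDataFrame.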

SimonMolinsky commented 1 year ago

@cheginit thanks for the PR. It is fantastic! You made me think about a few things related to the "backend" of the package, so thank you very much!

I've left a comment in the PR regarding your authorship: I need your name and GitHub handle in the changed files. When you push that to the PR, I will merge it and move the package into the build stage. Your input is substantial, so it should be released immediately.

cheginit commented 1 year ago

Sure, I am glad you find it helpful! I appreciate the credit. I am always happy to contribute to open-source projects.

SimonMolinsky commented 1 year ago

Thank you very much! 😊 I've created tests, and a few problems have come up, but it should be a matter of days before I polish it and push the changes as a new release :)