sf-dcp closed this PR 5 months ago.
What the test data used in the `test_to_parquet` fn looks like:

In .csv format only:

[screenshot]

In .csv, .shp, .gdb, and zipped .csv/.shp/.gdb formats:

[screenshot]
A couple of small notes, but other than that, this looks great.
@fvankrieken, I added 3 commits based on your recs
This is a continuation of PR #658, which converts raw data to parquet. Related to issue #631.
It's probably easiest to review this PR commit by commit. A large share of the changed files are revised test data.
Major changes:

1) Originally, the `to_parquet` function was responsible for reading a local dataset into a pandas or geopandas dataframe (a pandas df if the data isn't geospatial) and then writing it out to a parquet/geoparquet file.
2) Implement a zipped input data format in `to_parquet` in 2 steps: unzip the file, then process the unzipped file with the existing code. I added tests for zipped csv, zipped shapefile, and zipped geodatabase.
3) Implement a csv format that has longitude and latitude columns instead of a single geometry column. A test is included.
4) Name the geometry column `geom` in the output parquet file.
5) Because of the added data formats, expand the test code, which generates fake data. Note: the test code is getting messy, and we will refactor it in a separate PR, since that refactor will touch tests outside of the `to_parquet` function.

Side note:
`to_parquet` fn converts geospatial input data into the geoparquet data format, so there is no need to explicitly give the output file a `.geoparquet` extension. This is because calling `to_parquet` on a geopandas df writes geoparquet regardless of the file extension.

TODO for next PR: `config` object as a function input instead of the entire thing.
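Item 2 above describes handling zipped input in two steps: unzip, then reuse the existing code on the unzipped file. A minimal sketch of the first step, using only the standard library and assuming each archive holds exactly one dataset (a .csv, a .shp with its sidecar files, or a .gdb directory). `unzip_dataset` is a hypothetical helper name, not the function in this PR:

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_dataset(zip_path: Path, dest_dir: Path) -> Path:
    """Extract zip_path into dest_dir and return the dataset path inside it.

    Hypothetical helper; assumes the archive contains a single dataset.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)

    # A file geodatabase is a directory ending in .gdb, so look for it first.
    gdbs = [p for p in dest_dir.rglob("*.gdb") if p.is_dir()]
    if gdbs:
        return gdbs[0]

    # Otherwise fall back to a shapefile, then a csv.
    for suffix in (".shp", ".csv"):
        matches = sorted(dest_dir.rglob(f"*{suffix}"))
        if matches:
            return matches[0]

    raise ValueError(f"No .gdb, .shp, or .csv dataset found in {zip_path}")
```

The returned path can then go through the same code path as an unzipped input. If the extraction target is a `tempfile.TemporaryDirectory()`, the conversion has to run inside its `with` block, because the extracted files are deleted when the block exits.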
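For items 3 and 4, one way to turn a csv with longitude/latitude columns into a geopandas df whose geometry column is named `geom` is sketched below. It assumes the columns are literally named `longitude` and `latitude` and that the coordinates are WGS84 (EPSG:4326); `csv_points_to_gdf` is a hypothetical name, and the PR's actual column handling may differ. The original longitude/latitude columns are kept as-is.

```python
import geopandas as gpd
import pandas as pd


def csv_points_to_gdf(
    df: pd.DataFrame,
    lon_col: str = "longitude",  # assumed column name
    lat_col: str = "latitude",  # assumed column name
    crs: str = "EPSG:4326",  # assumed CRS (WGS84)
) -> gpd.GeoDataFrame:
    """Build point geometries from lon/lat columns and name the geometry column 'geom'."""
    # points_from_xy takes x (longitude) first, then y (latitude).
    geometry = gpd.points_from_xy(df[lon_col], df[lat_col], crs=crs)
    gdf = gpd.GeoDataFrame(df, geometry=geometry)
    # The new geometry column is called "geometry" by default; rename it.
    return gdf.rename_geometry("geom")
```

`rename_geometry` works on any GeoDataFrame, so the same call also covers shapefile and geodatabase inputs read with `gpd.read_file`.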