Open bilelomrani1 opened 4 years ago
Hi, I tried to reproduce the issue with some generated data but I could not. Maybe I am missing something?
# make some data
import os
import pandas as pd
import numpy as np
make_data=True
lat_lon_size=100000
csvs_num=10
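if make_data:
    # first make some data (fixed seed so the run is repeatable)
    np.random.seed(1234)
    # random coordinates: latitude in [-37, -27), longitude in [-64, -54)
    lat = np.random.random(lat_lon_size) * 10 - 37
    lon = np.random.random(lat_lon_size) * 10 - 64
    # one row per point, with a constant id column
    df_ = pd.DataFrame({'id': 1, 'Latitude': lat, 'Longitude': lon})
    # dump dir
    os.makedirs('./csv', exist_ok=True)
    # write csvs_num files, each holding a random fraction of the rows
    for i, f in enumerate(np.random.random(csvs_num)):
        df_.sample(frac=f).to_csv('./csv/test_{}.csv'.format(i), index=False)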
### Issue code
import dask.dataframe as dd
import dask_geopandas
# same as in the issue, with '*_timeseries.csv' replaced by 'test_*.csv'
df = dd.read_csv('csv/test_*.csv')
gdf = dask_geopandas.from_dask_dataframe(df)
gdf = gdf.set_geometry(
    dask_geopandas.points_from_xy(gdf, x='Longitude', y='Latitude')
).set_crs('epsg:4326').to_crs('epsg:3395')
#
gdf.compute()
This runs to completion without errors.
Sorry for the delay. The snippet you provided indeed works on my machine, and my own code now works fine on the latest version of dask-geopandas. Maybe the mistake was on my side in the first place, or the latest version fixed the issue. Anyway, thank you very much!
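Since the behaviour seems to depend on the installed release, it can help to record package versions when reporting or re-testing. A minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not something from this thread:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


# Assumed list of relevant distributions; adjust to your environment.
for name, ver in installed_versions(
    ["dask", "dask-geopandas", "geopandas", "pandas"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output next to a traceback makes it easier to tell whether an upgrade is what changed the result.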
I have multiple CSV files opened with dask as is:
When invoking gdf.compute(), the following exception is raised:
The exception disappears when df is a single CSV file:
df = dd.read_csv('csv/2019_timeseries.csv')
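When a failure shows up only with a glob of files, one way to narrow it down is to run the same pipeline on each file separately and collect whatever fails. The helper below is a hypothetical sketch (`find_failing_files` and `load` are names introduced here, not part of dask or dask-geopandas). Note that if every file passes on its own, the problem probably lies in how the files are combined rather than in any single file.

```python
import glob


def find_failing_files(pattern, load):
    """Call load(path) on every file matching pattern.

    Returns a dict mapping each failing path to the exception it raised.
    """
    failures = {}
    for path in sorted(glob.glob(pattern)):
        try:
            load(path)
        except Exception as exc:  # deliberately broad: this is a debugging aid
            failures[path] = exc
    return failures


# Possible usage with the pipeline from this thread (assumes dask and
# dask-geopandas are installed, so it is left commented out here):
#
# import dask.dataframe as dd
# import dask_geopandas
#
# def load(path):
#     df = dd.read_csv(path)
#     gdf = dask_geopandas.from_dask_dataframe(df)
#     gdf = gdf.set_geometry(
#         dask_geopandas.points_from_xy(gdf, x='Longitude', y='Latitude')
#     ).set_crs('epsg:4326').to_crs('epsg:3395')
#     return gdf.compute()
#
# print(find_failing_files('csv/*_timeseries.csv', load))
```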