MrPowers / dask-interop

Integration tests to demonstrate Dask's interoperability with other systems

Show that fastparquet is not to blame (dask is) #5

Closed: martindurant closed 3 years ago

martindurant commented 3 years ago

In the absence of a _metadata file, dask presumes that every file in the directory is a data file (https://github.com/dask/dask/blob/main/dask/dataframe/io/parquet/fastparquet.py#L169). It should do the same as fastparquet and filter for .parq, .parquet and _metadata*.
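As a rough illustration of the filtering described above, here is a minimal sketch. It is not the code from dask or fastparquet; the helper name `filter_parquet_paths` and the sample file names are hypothetical, and only the three patterns (.parq, .parquet, _metadata*) come from the comment.

```python
import posixpath

# Suffixes treated as data files, per the comment above.
DATA_SUFFIXES = (".parq", ".parquet")


def filter_parquet_paths(paths):
    """Split a directory listing into (data_files, metadata_files).

    Hypothetical helper for illustration only: anything that matches
    neither a data suffix nor the _metadata* prefix is ignored.
    """
    data_files = []
    metadata_files = []
    for path in paths:
        name = posixpath.basename(path)
        if name.startswith("_metadata"):
            metadata_files.append(path)
        elif name.endswith(DATA_SUFFIXES):
            data_files.append(path)
    return data_files, metadata_files


# Made-up listing that mixes data files with unrelated files.
listing = [
    "out/part.0.parquet",
    "out/part.1.parq",
    "out/_SUCCESS",
    "out/_metadata",
    "out/.part.0.parquet.crc",
]
data_files, metadata_files = filter_parquet_paths(listing)
```

With a filter like this, files that are not Parquet data (for example a marker or checksum file) would be skipped rather than handed to the reader as data.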

MrPowers commented 3 years ago

@martindurant - thanks for pointing me in the right direction. I was trying to figure this out in the code and didn't come across the line you pointed out. I will try again!