Closed theroggy closed 10 months ago
I wonder if we should instead raise an exception if read_geometry=False, columns=[], fid_as_index=False
? Because in that case, there is nothing meaningful to return. If fid_as_index=True
, then I suppose we should at least return an empty data frame with a non-empty index.
Alternatively, at the end of read_arrow
we could detect if the arrow table has no columns after attempting to read it, and then return a truly empty table:
if table.num_columns == 0:
# return empty pyarrow.Table if there were no columns read
return pyarrow.Table.from_pylist([])
We could probably even do that preemptively before even attempting to read the data source (which may be remote and slower), but raising an exception during parameter validation seems perhaps a bit better to me. Otherwise, if we just preemptively return an empty data frame / table, then it may out of sync with a read against the actual data source, which may fail for any number of reasons. I.e., it might be better that we don't succeed when reading no geometry / columns / FID when we'd fail otherwise if any of those are true / non-empty.
Thoughts?
Otherwise, if we just preemptively return an empty data frame / table, then it may out of sync with a read against the actual data source, which may fail for any number of reasons. I.e., it might be better that we don't succeed when reading no geometry / columns / FID when we'd fail otherwise if any of those are true / non-empty
IMO it would make sense to error instead of returning an empty table, as that's more likely to imply user error that columns
was empty
Alternatively, at the end of read_arrow we could detect if the arrow table has no columns after attempting to read it, and then return a truly empty table:
I personally wouldn't do this, as one could argue that the the current result with use_arrow=True is more correct than a "truly" empty table. Because you did read all rows, just no columns ..
But certainly fine with detecting this case and raising an error. I think it will typically indeed be a user error.
I also can't really think of use cases where reading no columns nor a geometry from a file would lead to something useful... so I agree it smells like a user error...
When fid_as_index=True
you could return some data even though no columns nor geometry is asked, but it still can't really think of a use for this either...
When reading no columns from a file, this results in an empty dataframe with no columns using read_dataframe, you get a different result depending on use_arrow:
Not sure what is the expected behaviour. For what it is worth, fiona seems to do the same as use_arrow=False.
Script to reproduce:
relevant output: