`pyarrow` dependency causing problems

USEPA / standardizedinventories

Standardized Release and Waste Inventories

MIT License


`pyarrow` dependency causing problems #88

Closed. matthewlchambers closed this issue 2 years ago

matthewlchambers commented 2 years ago

Line 67 in `NEI.py` specifies the `pyarrow` engine for `pd.read_parquet`, but the current Windows Anaconda distribution of `pyarrow` does not include support for the snappy codec, so I have to use the `fastparquet` engine instead. On the other hand, I see that @bl-young had some issues with `fastparquet` here. Could we leave the engine unspecified in the call to `pd.read_parquet`, so that users can pick whichever engine works better for them?

bl-young commented 2 years ago

I think it should be fine to remove the engine from the call. Do you want to do that and submit a pull request so we can give you credit?
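For reference, a minimal sketch of what dropping the hard-coded engine could look like. The helper name `load_parquet` is hypothetical and is not the actual code in `NEI.py`; the only pandas behavior assumed is that `pd.read_parquet` accepts `engine="auto"`, which is its documented default.

```python
def load_parquet(path, engine="auto"):
    """Hypothetical helper: read a Parquet file without pinning an engine.

    engine="auto" is the pandas default, so pandas chooses the engine
    itself. A caller who needs a specific engine can still pass
    "pyarrow" or "fastparquet" explicitly.
    """
    # Imported inside the function so that defining this sketch does not
    # itself require pandas or a Parquet engine to be installed.
    import pandas as pd

    return pd.read_parquet(path, engine=engine)
```

A user whose `pyarrow` build lacks snappy support could then call `load_parquet(path, engine="fastparquet")`, while everyone else keeps the default behavior.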

matthewlchambers commented 2 years ago

Pull request submitted, though I apparently didn't manage to link it to this issue properly.

bl-young commented 2 years ago

Did removing the engine fix the issue on your end? Because `pyarrow` is a requirement of the package, I'm wondering whether pandas will still default to it instead of `fastparquet` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html).
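On the question of which engine gets picked: per the pandas documentation, `engine="auto"` follows the `io.parquet.engine` option, and by default it tries `pyarrow` first and falls back to `fastparquet` only if `pyarrow` is unavailable. The sketch below approximates that order using only the standard library. The function names are hypothetical, and it only checks whether each package can be found, whereas pandas actually attempts the import, so results may differ in edge cases.

```python
from importlib.util import find_spec


def installed_parquet_engines():
    """Return the installed Parquet engines, in the order pandas tries them.

    This is an approximation: it checks that the package can be located,
    not that it imports cleanly or supports a given codec.
    """
    candidates = ("pyarrow", "fastparquet")  # preference order for engine="auto"
    return [name for name in candidates if find_spec(name) is not None]


def expected_auto_engine():
    """Return the engine that engine="auto" would most likely select, or None."""
    engines = installed_parquet_engines()
    return engines[0] if engines else None


if __name__ == "__main__":
    print("Installed engines:", installed_parquet_engines())
    print("Likely default:", expected_auto_engine())
```

If `pyarrow` is installed but cannot read a particular file, leaving the engine unspecified may not be enough on its own. In that case a user can set `pd.set_option("io.parquet.engine", "fastparquet")` so that calls without an explicit engine use `fastparquet`.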

matthewlchambers commented 2 years ago

It did fix the issue on my end.