data-engineering-collective / plateau

Flat files, flat land.
MIT License

Fix CI failures #145

Closed IzerOnadimQC closed 6 months ago

IzerOnadimQC commented 6 months ago

This PR should fix the current CI failures. There were several issues, listed below from most trivial to least. All but the first were related to tests that run against nightly builds of pandas, and were caused by changes that will be introduced in pandas 3.

  1. reference-data was needed for pyarrow 15.0.2.
  2. Pandas 3 will remove the infer_datetime_format argument from to_datetime. According to the docs this has been deprecated since version 2 and passing it has no effect.
  3. The _is_view property will be removed from the DataFrame class. Obviously using a private property is always a bit dubious, but it was only used for testing. My no-less-dubious solution was to pull the same information out of another private property :)
  4. In two places in the tests, numpy.array_split is used to split a DataFrame. As far as I can tell this only works because array_split calls numpy.swapaxes, which in turn calls swapaxes on the underlying object through a getattr call. This will no longer work once the (currently deprecated) DataFrame.swapaxes method is removed in pandas 3. This is known to the numpy maintainers (see this issue), and the verdict appears to be that this was never an intended usage and will not be supported going forward. Therefore, I added a workaround.
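
For illustration, here is a minimal sketch of one way to split a DataFrame without passing it to numpy.array_split. It is not necessarily the workaround used in this PR: the helper name split_frame and the sample data are hypothetical. The idea is to split the integer row positions and then select each chunk with iloc, so numpy never needs to call swapaxes on the DataFrame.

```python
import numpy as np
import pandas as pd


def split_frame(df: pd.DataFrame, n_chunks: int) -> list[pd.DataFrame]:
    """Split df row-wise into n_chunks nearly equal parts.

    Hypothetical helper for illustration; not part of plateau.
    """
    # Split the integer row positions instead of the DataFrame itself,
    # so numpy only ever sees a plain ndarray.
    position_chunks = np.array_split(np.arange(len(df)), n_chunks)
    # Select each group of rows by position.
    return [df.iloc[positions] for positions in position_chunks]


# Hypothetical sample data.
df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
parts = split_frame(df, 3)
print([len(part) for part in parts])  # chunk sizes: [4, 3, 3]
```

The chunk sizes follow the same rule numpy.array_split applies to arrays (the first chunks get the extra rows), so tests that depend on the chunk boundaries should see the same partitioning.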