ESSS / pytest-regressions

Pytest plugin for regression testing: https://pytest-regressions.readthedocs.io
MIT License

`num_regression` cannot work with non squared numerical data #170

Open 12rambau opened 5 months ago

12rambau commented 5 months ago

I wanted to test the coordinates of a geometry generated from a polygon and make sure they are always the same. The output is a GeoJSON dictionary; here is a small example in case you are not familiar with the format:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

As the only thing I want to check is the geometry, I only look at "coordinates". These coordinates always carry many digits (8 in my case), which always causes issues with the exact comparisons of data_regression. So I turned to the num_regression fixture, which seemed perfect for my use case. It worked fine until I tested more complicated geometries, specifically multipolygons. In this case the list of coordinates cannot be transformed into a NumPy ndarray, because the polygons forming the geometry are not of equal size, which raises the following error:

ValueError: setting an array element with a sequence

Again to give an example my data can look like the following sequence:

numpy.array([[1, 2], [2, 3, 4]])         # wrong!

The question is: why do you need to transform the data into an ndarray in the first place? Could you instead perform the comparison manually and thus support any sequence shape?
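To make the "manual comparison" idea concrete, here is a minimal sketch of a recursive, tolerance-based check over nested sequences of any shape. It is not part of pytest-regressions; the function name `nested_close`, its parameters, and the error message format are all hypothetical, and it relies only on the standard library's `math.isclose`.

```python
import math
from collections.abc import Sequence


def _is_number(value):
    # bool is a subclass of int; it is accepted here for simplicity.
    return isinstance(value, (int, float))


def _is_sequence(value):
    # Strings are sequences too, but they are not numeric containers.
    return isinstance(value, Sequence) and not isinstance(value, str)


def nested_close(expected, obtained, rel_tol=1e-9, abs_tol=0.0, path="root"):
    """Return a list of mismatch messages (empty when everything matches).

    Hypothetical helper: compares two arbitrarily nested sequences of
    numbers without converting them to a NumPy array first.
    """
    # Leaf case: two numbers are compared with a tolerance.
    if _is_number(expected) and _is_number(obtained):
        if math.isclose(expected, obtained, rel_tol=rel_tol, abs_tol=abs_tol):
            return []
        return [f"{path}: expected {expected!r}, obtained {obtained!r}"]

    # Branch case: two sequences must have the same length, then recurse.
    if _is_sequence(expected) and _is_sequence(obtained):
        if len(expected) != len(obtained):
            return [f"{path}: length {len(expected)} != {len(obtained)}"]
        errors = []
        for index, (exp, obt) in enumerate(zip(expected, obtained)):
            child_path = f"{path}[{index}]"
            errors.extend(nested_close(exp, obt, rel_tol, abs_tol, child_path))
        return errors

    # Anything else (number vs. sequence, unsupported type) is a mismatch.
    return [f"{path}: incompatible values {expected!r} and {obtained!r}"]
```

With the ragged example above, `nested_close([[1, 2], [2, 3, 4]], [[1, 2], [2, 3, 4]])` returns an empty list, while a changed coordinate is reported together with its position in the nesting.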

nicoddemus commented 5 months ago

Hi @12rambau,

> The question is: why do you need to transform the data into an ndarray in the first place?

That's how it is implemented today. We turn the data into an ndarray because, in the end, we dump the arrays into a DataFrame:

https://github.com/ESSS/pytest-regressions/blob/master/src/pytest_regressions/num_regression.py
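Since the current design stores 1D columns, one possible workaround (an assumption, not something the plugin does) is to flatten the ragged data into a dictionary of named 1D lists before handing it over. The sketch below uses a hypothetical helper, `flatten_to_columns`, and assumes each node holds either only numbers or only nested lists. Whether num_regression accepts columns of unequal length depends on the installed version, so check its documentation first.

```python
def flatten_to_columns(nested, prefix="ring"):
    """Map every innermost list of numbers to its own named 1D column.

    Hypothetical helper: column names encode the position in the nesting,
    e.g. "ring_0", "ring_1_2". Assumes each node contains either only
    numbers or only nested lists (no mixing).
    """
    columns = {}

    def walk(node, name):
        # A node made only of numbers is stored as one column.
        if all(isinstance(item, (int, float)) for item in node):
            columns[name] = [float(item) for item in node]
            return
        # Otherwise descend, extending the column name with the index.
        for index, child in enumerate(node):
            walk(child, f"{name}_{index}")

    walk(nested, prefix)
    return columns


columns = flatten_to_columns([[1, 2], [2, 3, 4]])
print(columns)
```

Each key then identifies one ring, so a failure can still be traced back to a position in the original geometry.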

> Could you instead perform the comparison manually and thus support any sequence shape?

Not sure. If you want to dig into the code and see whether there is a simple and backward-compatible solution, we would love to review a PR in that direction. :+1:

12rambau commented 5 months ago

It will be necessary for my use case, so I'll definitely look into it. I think the easiest way is to make num_regression independent from its dataframe counterpart, hence recoding the check mechanism. It looks like a fun challenge. I cannot commit to a speedy implementation, but I'll try to work something out. For backward compatibility, I guess the main challenge will be the file format. I was thinking of relying on a YAML file (to honor nesting), but that won't work with the existing .csv format used in the current implementation.
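As a rough illustration of a nesting-preserving baseline file, the sketch below writes and reloads ragged data while keeping its shape. It uses the standard library's `json` module as a stand-in for YAML to stay dependency-free; the helper names `dump_baseline` and `load_baseline` and the `digits` parameter are hypothetical and not part of pytest-regressions.

```python
import json


def dump_baseline(data, digits=8):
    """Serialize nested numeric data to text, rounding floats to `digits`.

    Hypothetical helper: JSON stands in for YAML here; both keep the
    nesting that a flat CSV table cannot represent directly.
    """

    def rounded(node):
        if isinstance(node, float):
            return round(node, digits)
        if isinstance(node, (list, tuple)):
            return [rounded(child) for child in node]
        return node

    return json.dumps(rounded(data), indent=2)


def load_baseline(text):
    """Read back the nested structure written by dump_baseline."""
    return json.loads(text)


ragged = [[1.0, 2.0], [2.0, 3.0, 4.0]]
# The ragged shape survives the round trip through the text format.
assert load_baseline(dump_baseline(ragged)) == ragged
```

Existing .csv baselines would still need a separate reader or a migration step, which is the compatibility question raised above.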