floodlight-sports / floodlight

Python package for streamlined analysis of sports data.
https://floodlight.readthedocs.io/en/latest/index.html
MIT License

fix: nanargmin with all nan slice #134

Open manuba95 opened 1 year ago

manuba95 commented 1 year ago

This fixes an issue in the DiscreteVoronoiModel. The model crashes when a frame is all NaN because of the np.nanargmin() function, which raises a ValueError on all-NaN slices. The underlying reason is type stability: the function is supposed to return integer indices, so it cannot return NaN for such a slice. See https://github.com/pydata/xarray/issues/4481 and https://github.com/pydata/xarray/issues/3884. One easy fix would be to apply np.nanargmin() only to slices that are not all NaN, using np.where(). However, masking the whole array before the loop and catching the all-NaN slices early might be a more elegant and better-performing option.
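For illustration, the failure and the "only non-all-NaN slices" idea can be reproduced outside of floodlight. This is a minimal sketch, not the code in this PR: the `distances` array (players as rows, mesh points as columns) and the `-1` sentinel for "no closest player" are assumptions made for the example.

```python
import numpy as np

# Hypothetical distances: rows are players, columns are mesh points.
# The middle column is all NaN, e.g. because no player is on the pitch.
distances = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, 1.0],
])

# np.nanargmin raises ValueError as soon as one slice is all NaN.
try:
    np.nanargmin(distances, axis=0)
    raised = False
except ValueError:
    raised = True

# Apply np.nanargmin only to the columns that contain a valid value.
valid = ~np.isnan(distances).all(axis=0)
closest = np.full(distances.shape[1], -1, dtype=int)  # -1 is an assumed sentinel
closest[valid] = np.nanargmin(distances[:, valid], axis=0)

print(raised)   # True
print(closest)  # [ 0 -1  1]
```

Selecting the valid columns with a boolean index means np.nanargmin never sees the all-NaN column, at the cost of one extra pass over the array.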

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.58%. Comparing base (e1de087) to head (5d10c39). Report is 1 commit behind head on develop.

:exclamation: Current head 5d10c39 differs from pull request most recent head 277fa24

Please upload reports for the commit 277fa24 to get more accurate results.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop     #134   +/-   ##
========================================
  Coverage    95.58%   95.58%
========================================
  Files           47       47
  Lines         3194     3194
========================================
  Hits          3053     3053
  Misses         141      141
```


manuba95 commented 5 months ago

This commit fixes an issue with the previous commit. Although np.where() is supposed to handle the all-NaN slices, np.nanargmin() is still called on the entire array, resulting in the same error. This is now fixed by masking the array. Although it would be possible to catch this edge case earlier (e.g. by masking the array before the initial for loop), I think this is currently the appropriate solution, as other unexpected user choices (e.g. manipulation of the mesh grid) may also result in all-NaN slices in closest_player_index. The tests all pass, but I think they should be extended to cover at least the all-NaN slice edge case. Also, there is currently no test for the plotting, which needed a fix as well. Finally, I think it is up for debate whether inactive players (or teams) should have a space control value of 0 (as currently) or np.NaN. I would appreciate some feedback on this before finalizing this PR.
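The np.where() pitfall described above comes from Python evaluating all arguments before the call, so np.nanargmin() runs on the full array regardless of the condition. The sketch below contrasts that with one possible masking approach. It is an assumption-laden illustration rather than the PR's actual change: the function names, the `distances` layout (players as rows, mesh points as columns) and the use of `np.ma.masked_array` are all hypothetical.

```python
import numpy as np

def closest_player_naive(distances):
    """Fails: np.nanargmin is evaluated before np.where can filter anything."""
    all_nan = np.isnan(distances).all(axis=0)
    return np.where(all_nan, -1, np.nanargmin(distances, axis=0))

def closest_player_masked(distances):
    """Return the closest-player index per column, masked where all values are NaN."""
    all_nan = np.isnan(distances).all(axis=0)
    # Replace NaN by +inf so plain np.argmin never raises.
    filled = np.where(np.isnan(distances), np.inf, distances)
    idx = np.argmin(filled, axis=0)
    # Indices of all-NaN columns are meaningless, so hide them behind a mask.
    return np.ma.masked_array(idx, mask=all_nan)

distances = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, 1.0],
])

result = closest_player_masked(distances)
print(result.filled(-1))  # [ 0 -1  1]
```

Keeping the indices as integers and carrying the validity in a separate mask avoids the type-stability problem, and leaves the choice between 0 and NaN for inactive players to the code that aggregates the result.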