gnocchixyz / gnocchi

Timeseries database
Apache License 2.0
299 stars 85 forks source link

Add more fill variants #1310

Closed dbalagansky closed 9 months ago

dbalagansky commented 1 year ago

I'm not an expert in numpy, so this was borrowed from SO: https://stackoverflow.com/a/54508024.

I probably should rename this from "previous" to "similar" (or maybe some other better word?), as this thing covers the case when first point in resulting series could be NaN with backwards fill in addition to the standard forward fill.

Fixes #1266

tobias-urdin commented 1 year ago

@mergifyio rebase

mergify[bot] commented 1 year ago

rebase

❌ Unable to rebase: user tobias-urdin is unknown.

Please make sure `tobias-urdin` has logged in Mergify dashboard.
tobias-urdin commented 1 year ago

@Mergifyio rebase

mergify[bot] commented 1 year ago

rebase

✅ Branch has been successfully rebased

dbalagansky commented 11 months ago
  • would be good to have forward and backward as well. i suspect forward is the most common option.

I've been thinking about these for some time and the problem with plain backward and forward fill variants is that, given this series: [NaN, NaN, 1, 2, NaN, 4, 5, NaN, NaN]:

  • doing just forward fill would result in: [NaN, NaN, 1, 2, 2, 4, 5, 5, 5]
  • doing just backward fill would result in: [1, 1, 1, 2, 4, 4, 5, NaN, NaN]

So, as you can see, in some cases it wouldn't be enough to do forward or backward fill to produce fully populated series (without NaN's) and, maybe, it would be a good idea to:

So maybe implementing one of the above (or both?) would be a better option in the end?

chungg commented 10 months ago
* add something like `full_ffill` for doing `forward` than `backward` (for overwriting first two `NaN`'s in the example above after normal `forward` fill) and same thing with `full_bfill`: `backward` than `forward` fill. This way there would be no need for `similar` or `bidirectional` fill variants as the one that is implemented in this PR is just `full_ffill` one

* add `ffill` and `bfill`, which would leave `NaN`'s in the resulting series alone

yeah, this makes sense to me. i didn't even consider the differences in order of your full fill solution. i'm ok with the naming. don't have anything better to suggest

dbalagansky commented 10 months ago

This should be ready for review, hope I didn't miss any points, but feel free to point them out :)

One more thing that came to my mind during the final tests was that this probably should interact with resampling down (e.g., doing aggregates resample mean 5 on a metric with measurements every minute) by creating additional measurements and filling them, but that sound like a separate PR.