`FDataGrid.restrict` option `with_bounds`

ego-thales commented 1 year ago

Implemented proposed enhancement from #560.

A few notes to keep in mind when (or before) merging:

with_bounds defaults to False,
With high dim_domain, domain's boundary potentially contains a lot of points, which might end up pretty slow. That said, I guess it has to happen when manipulating a functional object on a huge domain size,
No 1D optimization was made. I don't know the general spirit of your implementation, but if you generally treat 1D differently with lighter/faster code, then this should also be done here.

Waiting for your feedback!

Cheers :) Élie

ego-thales commented 1 year ago

I now notice that this option potentially modifies the result of every FDataGrid.integrate call with a domain argument. This probably explains the failed tests.

One easy solution would be to set default value to False. The other option is to go through every failed test and check whether or not the difference can indeed be due to the more precise calculus.

In turn, it might be necessary to change some doc example, I'm not sure about it.

In any case, I will wait for your feedback and insights regarding this, bcause you have a much deeper view of the whole package. :)

vnmabus commented 1 year ago

The "proper" way to do it would be:

Decide if we want that the user can include the bounds as new points. As far as I see it, the main advantage is increased accuracy in the evaluation, while the main issue is that we add points that were not directly measured. Both behaviors may be surprising.
Note that there is also a third option (keep ALL points, just change the domain_range attribute) which could in principle be implemented. I think that currently the FDataGrid constructor disallows having grid points outside the domain range, but this invariant is never used AFAIK, so it could be removed. This approach keeps the original accuracy and does not add "synthetic" points, but it is probably wasteful, as most points are no longer needed.
Thus, we need to discuss which options could be selected by the user, and if a bool is enough for that choice or we need an enum or a Literal parameter.
We would probably want to define this method in a way that can be extended in the future to other FData subclasses such as FDataBasis and the future FDataIrregular class. For the former these options make no sense, but it make sense for the latter, so it would be better to take that into account from the beginning.
Once we have decided the possible choices, we have to pick a default one, looking at the pros and cons of each.
If the default option is different from the current one, we ideally should keep the old default with a deprecation warning for a while (requesting the user to explicitly choose an option to remove the warning). If we see that this method has no users outside the package, we can skip this. We could in principle do breaking changes as we are still in 0.x version, but it is not very polite.
We have to check the internal uses of this method to choose the more appropriate option for each.

eliegoudout commented 1 year ago

Hi, Just a quick answer from my personal phone regarding other possible ways to do this.

I will argue against keeping outside points for 2 reasons:

It will make interpolation vs. extrapolation blurry. For example, how to you extrapolate at x = -.09, with domain [0, 1] and grid points -.1, .1, .3? If you use -.1 value, then you're much closer to interpolation than extrapolation, and if you don't, you lose precision.
The general idea of "restricting" IS, imo, forgetting about outside information. Imagine wanting to test out an extrapolation method, then it is important that restriction doesn't know about outside values.

I agree that using Tru as default value might not be polite. I thus think that if we end up deciding that it would be the best option, then the way to go surely is warning users during the next release (or more), before applying the change.

Is there a way to know more about FDataIrregular?

I think a good way to choose the defaul value will be to test the accuracy gains on plausible cases, as well as execution time lost.

vnmabus commented 1 year ago

* It will make interpolation vs. extrapolation blurry. For example, how to you extrapolate at `x = -.09`, with domain `[0, 1]` and grid points `-.1, .1, .3`? If you use `-.1` value, then you're much closer to interpolation than extrapolation, and if you don't, you lose precision.

That is true, although it is more a semantic issue. By default, the extrapolation strategy of FDataGrid delegates to interpolation. If you change it to something else, it would be applied to the points outside the domain, as usual.

* The general idea of "restricting" **IS**, imo, forgetting about outside information. Imagine wanting to test out an extrapolation method, then it is important that restriction doesn't know about outside values.

Extrapolation would be applied for the points outside the domain, in spite of having values measured for them. The default extrapolation, of course, uses these values.

Is there a way to know more about FDataIrregular?

Yes, it is a class to represent functions measured at different points (not a fixed grid). It was first proposed in #498 and a tentative implementation is in #536.

I think a good way to choose the defaul value will be to test the accuracy gains on plausible cases, as well as execution time lost.

I want to discuss also the issue with the other people involved in the project, as they may provide more insights. They are currently on vacation but we will meet again in September.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02% :tada:

Comparison is base (6a066f7) 86.02% compared to head (df19f39) 86.05%.

:exclamation: Current head df19f39 differs from pull request most recent head e1dfe7d. Consider uploading reports for the commit e1dfe7d to get more accurate results

Additional details and impacted files

```diff @@ Coverage Diff @@ ## develop #561 +/- ## =========================================== + Coverage 86.02% 86.05% +0.02% =========================================== Files 147 147 Lines 11660 11678 +18 =========================================== + Hits 10031 10049 +18 Misses 1629 1629 ``` | [Files Changed](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/561?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM) | Coverage Δ | | |---|---|---| | [skfda/representation/grid.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/561?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvcmVwcmVzZW50YXRpb24vZ3JpZC5weQ==) | `86.50% <100.00%> (+0.25%)` | :arrow_up: | | [skfda/tests/test\_grid.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/561?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvdGVzdHMvdGVzdF9ncmlkLnB5) | `99.36% <100.00%> (+0.04%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

vnmabus commented 1 year ago

@all-contributors please add @ego-thales for his code contributions and ideas.

allcontributors[bot] commented 1 year ago

@vnmabus

I've put up a pull request to add @ego-thales! :tada:

GAA-UAM / scikit-fda

`FDataGrid.restrict` option `with_bounds` #561

Codecov Report