equinor / vds-slice

Web API for fast access of arbitrary seismic slices from VDS data
GNU Affero General Public License v3.0

What to do with border points stolen for interpolation in attributes endpoints #252

Closed achaikou closed 6 months ago

achaikou commented 8 months ago

We calculate attributes on a trace subset which can lie very close to the trace border points. As the requested user values do not fall directly onto survey samples, we use interpolation algorithms to resample the data to the user's needs.

Near the end of the trace, however, we lack enough samples to perform good interpolation. The idea of implementing extrapolation at the borders was discarded as a less precise solution that would require more effort. Instead we went with "stealing" a couple of samples from the survey for interpolation purposes. The assumption was that users are not that interested in the data near the survey borders and would hardly ever notice. This assumption might have been wrong.

The current number of stolen samples from each border is 2. This number was set by looking closely at different interpolation algorithms. However, it was supposed to be an internal value, subject to change if we ever revisited the interpolation logic with more knowledge.

For example, if the survey has samples at 100, 105, 110, 115, 120, 125, 130 and the user requests the range [117, 127], we throw because there are not two samples beyond 127. The same happens for the input [117, 125], as 125 also doesn't have 2 samples to its right; for simplicity we ignore the fact that 125 falls exactly on a sample.
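A minimal sketch of that rule, assuming a regular sample axis (`STOLEN_SAMPLES` and `validate_range` are hypothetical names, not the actual vds-slice code):

```python
STOLEN_SAMPLES = 2  # internal margin reserved for interpolation

def validate_range(samples: list[float], lo: float, hi: float) -> None:
    """Raise unless [lo, hi] leaves STOLEN_SAMPLES samples on each side."""
    step = samples[1] - samples[0]
    min_allowed = samples[0] + STOLEN_SAMPLES * step
    max_allowed = samples[-1] - STOLEN_SAMPLES * step
    if lo < min_allowed or hi > max_allowed:
        raise ValueError(
            f"range [{lo}, {hi}] must lie within [{min_allowed}, {max_allowed}] "
            f"to leave {STOLEN_SAMPLES} samples for interpolation"
        )

samples = [100, 105, 110, 115, 120, 125, 130]
validate_range(samples, 117, 127)  # raises: 127 > 130 - 2 * 5 = 120
```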

The problem is that this information is hidden from the end user. The user is sure that a request with borders [117, 127] should be handled, when in fact it is not.

How can we deal with it?

  1. If we update the documentation with this information, user scripts may come to rely on the 2-sample margin, thus making internal information public and more difficult to change.
  2. If we update the error message with the actual boundaries, we face a similar problem: users would still need to adjust to the error somehow, only without knowing how and why. Also, by the time the error is thrown, the original context of the user request has long been stripped off, making it difficult to tell users what they did wrong.
  3. We might want to revisit the interpolation logic. Could it be that 1 sample on each side is good enough (i.e. the distortion from having just 1 sample instead of 2 is not that big)? And if the user range ends exactly on a sample, should we need any additional samples at all? In that case we could drop this problem for the user altogether. That, however, might require a deeper understanding of interpolation than we currently have.
  4. SC Day suggestion: use linear interpolation at the very end (see the sketch after this list). Pros: it is likely to make more sense than whatever cubic interpolation comes up with there. Cons: we would need to adjust for it at the stage of retrieving the samples, which might mean passing down a flag that this trace has fewer samples than expected.
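A rough sketch of what suggestion 4 could look like, under stated assumptions (a regular sample axis, Catmull-Rom as the cubic kernel; `resample` is a hypothetical name, not the project's code):

```python
import numpy as np

def resample(samples: np.ndarray, values: np.ndarray, x: float) -> float:
    """Cubic interpolation where the full 4-sample stencil exists,
    degrading to 2-point linear interpolation near the trace ends."""
    step = samples[1] - samples[0]
    i = min(int((x - samples[0]) // step), len(samples) - 2)  # left neighbour
    t = (x - samples[i]) / step                               # position in [0, 1)
    if 1 <= i <= len(samples) - 3:
        # Full stencil: Catmull-Rom cubic through the 4 surrounding samples.
        p0, p1, p2, p3 = values[i - 1 : i + 3]
        return p1 + 0.5 * t * (p2 - p0 + t * (
            2 * p0 - 5 * p1 + 4 * p2 - p3 + t * (3 * (p1 - p2) + p3 - p0)))
    # Border interval: only 2 usable samples, so fall back to linear.
    return (1 - t) * values[i] + t * values[i + 1]

samples = np.arange(100.0, 131.0, 5.0)   # 100, 105, ..., 130
values = np.sin(samples / 10.0)          # some trace values
print(resample(samples, values, 127.0))  # linear: 127 sits in the last interval
```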
achaikou commented 8 months ago

The current suggested solution is to document the 2-samples-from-the-border requirement. It is unlikely that we will ever change it to 3 samples anyway. There is usually no good solution to this problem, so throwing near the border is a good approach.

achaikou commented 8 months ago

Accepted approach: pad the data with zeroes (or see what the current algorithm does) and do not throw. If the user's surface goes near the border, it is the user's responsibility to be aware that all data near the border is not to be trusted and can spoil the calculations.
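A minimal sketch of the accepted zero-padding approach, assuming a fixed 2-sample margin (`MARGIN` and `pad_trace` are hypothetical names):

```python
import numpy as np

MARGIN = 2  # interpolation samples needed on each side of a request

def pad_trace(values: np.ndarray) -> np.ndarray:
    """Extend the trace with zeros so requests near the border keep a
    full interpolation stencil instead of raising an error."""
    return np.pad(values, MARGIN, mode="constant", constant_values=0.0)

trace = np.array([1.0, 2.0, 3.0, 4.0])
print(pad_trace(trace))  # [0. 0. 1. 2. 3. 4. 0. 0.]
```

Interpolation then runs on the padded trace; any value computed within `MARGIN` samples of the original border mixes in the zeros and, as stated above, is not to be trusted.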