Add a reliable way to join telemetry data from different functions

SCasanova commented 1 year ago

From speaking to some users, this seems to be a big deal. Having an in-package function that properly joins data (telemetry with session laps, for example) would be very useful.

pbulsink commented 1 year ago

Is this an after-CRAN update item, or would you prefer to include this when submitting?

SCasanova commented 1 year ago

I'm OK with doing this after CRAN

SCasanova commented 1 year ago

I used an interpolation solution in the past, but it's not perfect in terms of accuracy. Furthermore, it only works for a one-lap-to-one-lap comparison, as it relies on both datasets having a common variable (in this case distance, as the lap should be consistent).
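A minimal sketch of that kind of distance-based interpolation, written in Python purely for illustration (it is not the code from the mock app or from the package). The sample values and the helper names `interp_at` and `resample_on_distance` are hypothetical. Both laps are resampled onto the same distance grid by linear interpolation so they can be compared point by point:

```python
from bisect import bisect_left


def interp_at(xs, ys, x):
    """Linearly interpolate y at x. xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def resample_on_distance(distance, values, grid):
    """Resample one lap's channel onto a shared distance grid."""
    return [interp_at(distance, values, d) for d in grid]


# Hypothetical samples: distance (m) and speed (km/h), taken at different points on each lap.
lap_a_dist = [0.0, 40.0, 95.0, 150.0]
lap_a_speed = [280.0, 290.0, 300.0, 250.0]
lap_b_dist = [0.0, 55.0, 110.0, 150.0]
lap_b_speed = [278.0, 292.0, 298.0, 240.0]

grid = [0.0, 50.0, 100.0, 150.0]
a = resample_on_distance(lap_a_dist, lap_a_speed, grid)
b = resample_on_distance(lap_b_dist, lap_b_speed, grid)

# Speed difference between the two laps at each grid distance.
delta = [sa - sb for sa, sb in zip(a, b)]
```

This only makes sense for a single lap against a single lap, since it assumes distance means the same thing in both datasets.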

This works comparably to official sources, as I showed on my AWS mock app (I think you saw that, right @pbulsink?)

pbulsink commented 1 year ago

Yeah, I saw it - cool project! I think I also have a better idea of what you're looking for.

FastF1 talks about some of the challenges of time and distance and interpolation in some articles, see: https://docs.fastf1.dev/time_explanation.html and https://docs.fastf1.dev/howto_accurate_calculations.html.

Of note is the caution against using interpolated data: the data source frequency is low enough that there are potentially meaningful differences between the interpolated and actual value of something like speed in rapidly changing situations, such as when the brake is applied at the end of a straight.

I think if you're doing visual work then overlaid plots show poor enough resolution that interpolation isn't required, but if you're doing modelling or something then only real data should be included. Can you provide an example of your use case where interpolation is beneficial?

SCasanova commented 1 year ago

Yeah, I agree with that. The current problem is that measurements aren't taken at the same, or even regular, intervals, so there is no way to compare one to one without interpolating.

Aside from that, we can look to assign lap number based on distance (for every circuit) to be able to join telemetry to session lap data for example.
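One possible reading of the distance-based lap assignment, as a rough Python sketch. It assumes the telemetry carries a cumulative distance since the start of the session and that a nominal circuit length is known. The field names (`distance`, `lap_number`, `lap_time`) and the function `join_telemetry_to_laps` are hypothetical, not part of any existing API:

```python
def join_telemetry_to_laps(telemetry, laps, track_length_m):
    """Attach lap data to telemetry rows using cumulative distance.

    telemetry: list of dicts with a cumulative "distance" in metres (assumed).
    laps: list of dicts with a "lap_number" key plus per-lap fields (assumed).
    """
    laps_by_number = {lap["lap_number"]: lap for lap in laps}
    joined = []
    for row in telemetry:
        # Lap 1 covers [0, L), lap 2 covers [L, 2L), and so on.
        lap_number = 1 + int(row["distance"] // track_length_m)
        lap = laps_by_number.get(lap_number)
        joined.append({
            **row,
            "lap_number": lap_number,
            # None when the lap table has no row for this lap.
            "lap_time": lap["lap_time"] if lap else None,
        })
    return joined


# Hypothetical data for a 5000 m circuit.
telemetry = [
    {"distance": 120.0, "speed": 210.0},
    {"distance": 4980.0, "speed": 295.0},
    {"distance": 5020.0, "speed": 298.0},
    {"distance": 10050.0, "speed": 301.0},
]
laps = [
    {"lap_number": 1, "lap_time": 92.4},
    {"lap_number": 2, "lap_time": 91.8},
]
joined = join_telemetry_to_laps(telemetry, laps, track_length_m=5000.0)
```

In practice the distance covered per lap varies with the line taken, so a fixed circuit length will drift over a session. Some per-lap correction would likely be needed.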

pbulsink commented 1 year ago

I think there's a way to better interpolate based on series prediction from n points on each side of the target - and doing a left-extrapolation from data after the time point in question, and right-extrapolation from data before that time point, and finding where those signals overlap to produce an interpolated value. It's not going to work for binary data (brake or DRS on/off) or binary-like data (e.g. 100 to 0 throttle) but might be ok for RPM, speed, distance (in a time as x axis domain) or time (in a distance as x axis domain).

Unfortunately, I'm sure that this becomes an active area for mathematics research and you'll not have high quality data without extensive computation and estimation (fitting each interpolated point with n input points and fitting n-1 order polynomials from left and right then finding where they cross).
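A rough Python sketch of one way to read the cross-extrapolation idea, using the simplest case of n = 2 points per side (straight lines rather than higher-order polynomials). The function names and sample values are hypothetical. The left line is extended forward, the right line is extended backward, and the interpolated value follows whichever line applies on its side of the crossing. If the lines do not cross inside the gap, it falls back to plain linear interpolation:

```python
def line_through(p, q):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x0, y0), (x1, y1) = p, q
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


def cross_extrapolate(left, right, x):
    """Estimate y at x inside the gap between left[-1] and right[0].

    left: two (x, y) samples before the gap, ascending in x.
    right: two (x, y) samples after the gap, ascending in x.
    """
    m_l, c_l = line_through(*left)
    m_r, c_r = line_through(*right)
    x_lo, y_lo = left[-1]
    x_hi, y_hi = right[0]
    if m_l != m_r:
        x_cross = (c_r - c_l) / (m_l - m_r)
        if x_lo <= x_cross <= x_hi:
            # Follow the left trend up to the crossing, the right trend after it.
            return m_l * x + c_l if x <= x_cross else m_r * x + c_r
    # Parallel lines, or a crossing outside the gap: plain linear interpolation.
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)


# Hypothetical speed trace (time in s, speed in km/h): flat out, then braking.
left = [(0.0, 300.0), (1.0, 300.0)]
right = [(2.0, 260.0), (3.0, 180.0)]

before_brake = cross_extrapolate(left, right, 1.25)  # still on the flat left trend
after_brake = cross_extrapolate(left, right, 1.75)   # on the braking trend
```

With these values the two lines cross at 1.5 s, so the estimate stays at full speed until then, where plain linear interpolation would already have started dropping at 1.0 s. As noted above, this is not suitable for binary channels such as brake or DRS.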

For some analysis, an x = mean(x1,x2) and y = mean(y1,y2) type interpolation will work, but I agree - for more detailed things this won't. I think the frequency of the data stream is too low to be making accurate judgement calls on, for example, who was first on the throttle out of a certain corner on a given track. When AWS makes those graphics, I'm sure they have higher-frequency data to ensure more accurate images than what F1 leaks through their unofficial API.
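For reference, the mean-of-neighbours interpolation mentioned here could look like the following Python sketch. The names `midpoint` and `densify` and the sample values are hypothetical:

```python
def midpoint(p, q):
    """x = mean(x1, x2), y = mean(y1, y2) for two (x, y) samples."""
    (x1, y1), (x2, y2) = p, q
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def densify(samples):
    """Insert the midpoint between each consecutive pair of (x, y) samples."""
    if not samples:
        return []
    out = []
    for p, q in zip(samples, samples[1:]):
        out.append(p)
        out.append(midpoint(p, q))
    out.append(samples[-1])
    return out


# Hypothetical speed trace (time in s, speed in km/h): flat out, then braking.
samples = [(0.0, 300.0), (0.25, 300.0), (0.5, 220.0)]
dense = densify(samples)
```

Here the point inserted at 0.375 s reads 260 km/h, halfway between the two neighbours, whether or not the car had actually started braking by then. That is the kind of uncertainty described above.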

pbulsink commented 1 year ago

To clarify my above comment, this could be the 'worst case scenario' of (for example) speed or RPM being interpolated between a sustained maximum and a sudden decrease (at brake point, or downshift).

SCasanova commented 1 year ago

Great illustration, thank you. Perhaps polynomial interpolation would be better in those cases.

I also did a far-from-exhaustive test of the cross extrapolation you proposed above and... it worked very well. I can share the file.

SCasanova / f1dataR

Add a reliable way to join telemetry data from different functions #123