bairhys / prometheus-cake-autorate-exporter

A Prometheus exporter for CAKE Autorate
GNU General Public License v3.0

Excellent #2

Closed: lynxthecat closed this issue 7 months ago

lynxthecat commented 1 year ago

@bairhys simply excellent work in putting this together! Once I have had a chance to try it out myself I intend to link your code in and incorporate instructions for use in the cake-autorate documentation.

Out of curiosity, have you experimented much in terms of comparing 'fping' with 'tsping' on your connection and which gives better and more reliable results?

Is it easy for you to put together the graphic summary you put in your README here? If so, I'd be very curious to see one snapshot with 'fping' and another with 'tsping' (using dl_delay_thr_ms=30 and ul_delay_thr_ms=30 for 'fping', but probably dl_delay_thr_ms=10 and ul_delay_thr_ms=30 for 'tsping').

bairhys commented 1 year ago

Hey @lynxthecat, thanks for your feedback. You made it easy to get the data from the logs! After that it was pretty easy to get it into the Prometheus database. I'm keen to hear your thoughts once you get it up and running.

I haven't felt the need to experiment since cake-autorate and tsping are working really nicely, and I feel that I am getting the max from my connection with respect to ping and bandwidth. But I will run your suggested test and get back to you with a screenshot. Maybe I'll run a https://www.waveform.com/tools/bufferbloat test or a better speed test for both, just to be comparing similar loads.

You can definitely see that tsping is measuring different OWDs in the UL and DL directions. Interestingly, the UL OWD is noisier than the DL OWD. I guess the noise differences could be due to the LTE modulation and the asymmetrical bandwidth in each direction. Also, some OWDs are negative for some reason, which I haven't investigated yet. Below is the graph over 24h.

[image: OWD graph over 24h]

lynxthecat commented 1 year ago

Yes, two speed tests using speedtest.net for both 'tsping' and 'fping' would be great. With 'tsping' I likewise see much more activity in the upload OWDs than in the download OWDs. In fact, I even see a large spike in upload OWD on download saturation that is not necessarily accompanied by a spike in download OWD. That concerns me somewhat. Such a spike still shows up as a spike in RTT (download OWD plus upload OWD), but since cake-autorate only looks at the download OWD when controlling the download CAKE rate, RTT spikes caused by an increase in upload OWD but not download OWD could end up being tolerated, which does not seem desirable. For this reason I am beginning to think that 'fping' might be a better choice for my connection. But this could reflect an irregularity with my connection and not yours. Hence I am curious to see your data for both 'tsping' and 'fping' with a couple of speedtest runs on each.
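The concern above can be illustrated with a minimal sketch. This is not cake-autorate's code: the function names, the sample values, and the use of a single 30 ms threshold for both checks are assumptions made for illustration only. It shows how a download-side check based only on the download OWD delta can miss a spike that an RTT-based check would catch.

```python
# Hypothetical illustration, not taken from cake-autorate.

def dl_check_owd(dl_owd_delta_ms: float, thr_ms: float) -> bool:
    """Download-side check that looks only at the download OWD delta."""
    return dl_owd_delta_ms > thr_ms


def dl_check_rtt(dl_owd_delta_ms: float, ul_owd_delta_ms: float, thr_ms: float) -> bool:
    """Check based on the RTT delta, i.e. download OWD delta plus upload OWD delta."""
    return (dl_owd_delta_ms + ul_owd_delta_ms) > thr_ms


# Assumed sample during download saturation: the upload OWD spikes,
# while the download OWD barely moves.
dl_delta_ms = 5.0
ul_delta_ms = 40.0
threshold_ms = 30.0  # assumed value, same for both checks for simplicity

owd_flag = dl_check_owd(dl_delta_ms, threshold_ms)
rtt_flag = dl_check_rtt(dl_delta_ms, ul_delta_ms, threshold_ms)

print(f"OWD-only download check trips: {owd_flag}")
print(f"RTT-based check trips:         {rtt_flag}")
```

With these assumed numbers the OWD-only check stays quiet while the RTT-based check trips, which is the situation described as undesirable.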

bairhys commented 1 year ago

Hi Lynx, I finally had a chance to do this test. Here are the results:

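tsping 30 30 (dl_delay_thr_ms=30, ul_delay_thr_ms=30)

test 1 [image: 01]

test 2 [image: 02]

tsping 10 30 (dl_delay_thr_ms=10, ul_delay_thr_ms=30)

test 3 [image: 03]

test 4 [image: 04]

fping 30 30 (dl_delay_thr_ms=30, ul_delay_thr_ms=30)

test 5 [image: 05]

test 6 [image: 06]

test 7 [image: 07]

tsping 30 30 again

test 8 [image: 08]

test 9 [image: 09]

test 10 [image: 10]

and a Grafana screenshot with all the tests:

[image: annotated]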

lynxthecat commented 1 year ago

The results look fantastic. Actually tsping seems to look OK on your connection.

Which out of tsping and fping do you prefer, and with which delay thresholds? Since tsping offers OWDs it is technically superior. But I'm not entirely sure yet which is better in practice. Whereas fping has been thoroughly tested, tsping hasn't been tested as much yet.

Where tsping might really shine in a way not captured by your tests is during mixed download and upload.

Negative baselines for tsping are OK. It just means there's a clock difference between your machine and the remote. What matters is the OWD deltas between the raw OWD values and the baselines, since rate decisions are based on those.
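A small sketch of why the clock difference does not matter: a constant offset between the two clocks shifts the raw OWD and the baseline by the same amount, so it cancels in the delta. The function name and all numbers here are made up for illustration.

```python
# Hypothetical illustration: a constant clock offset cancels in the OWD delta.

def owd_delta_ms(raw_owd_ms: float, baseline_ms: float) -> float:
    """Delta between a raw OWD sample and the baseline OWD."""
    return raw_owd_ms - baseline_ms


true_idle_owd_ms = 20.0    # assumed real one-way delay when idle
true_loaded_owd_ms = 55.0  # assumed real one-way delay under load

deltas = []
baselines = []
for clock_offset_ms in (0.0, -150.0, 300.0):
    # The measured values include the (unknown) clock offset.
    baseline = true_idle_owd_ms + clock_offset_ms
    raw = true_loaded_owd_ms + clock_offset_ms
    baselines.append(baseline)
    deltas.append(owd_delta_ms(raw, baseline))
    print(f"offset={clock_offset_ms:7.1f}  baseline={baseline:7.1f}  delta={deltas[-1]:5.1f}")
```

The second case produces a negative baseline, yet the delta is the same in all three cases.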

One thing: the delta EWMAs should not change under load. They are intended to capture an EWMA of the delta in the unloaded state, and this information is used to rotate out poorer-performing reflectors to help converge on an optimal reflector set.

Are you using a commit prior to:

https://github.com/lynxthecat/cake-autorate/commit/8d1cde034b25800ad6f4a787f1226a903f174ca5

If so that explains it because the loads were not properly getting through to the process that alters the delta ewmas.
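The intended behaviour of the delta EWMAs described above can be sketched roughly as follows. This is an assumption-laden illustration, not cake-autorate's implementation: the `Reflector` class, the `alpha` smoothing factor, the `link_loaded` flag, and the reflector addresses are all hypothetical.

```python
# Hypothetical illustration: update the delta EWMA only when the link is
# unloaded, then pick the reflector with the highest EWMA as the candidate
# to rotate out.
from dataclasses import dataclass


@dataclass
class Reflector:
    address: str
    delta_ewma_ms: float = 0.0


def update_delta_ewma(reflector: Reflector, owd_delta_ms: float,
                      link_loaded: bool, alpha: float = 0.1) -> float:
    """Fold a new delta into the EWMA, but only in the unloaded state."""
    if not link_loaded:
        reflector.delta_ewma_ms = (
            (1 - alpha) * reflector.delta_ewma_ms + alpha * owd_delta_ms
        )
    return reflector.delta_ewma_ms


def worst_reflector(reflectors: list[Reflector]) -> Reflector:
    """Reflector with the highest unloaded delta EWMA."""
    return max(reflectors, key=lambda r: r.delta_ewma_ms)


a = Reflector("192.0.2.1")  # documentation-range addresses, not real reflectors
b = Reflector("192.0.2.2")

# Idle samples: reflector b is consistently slower than a.
update_delta_ewma(a, 2.0, link_loaded=False)
update_delta_ewma(b, 12.0, link_loaded=False)

# A large delta seen under load must not move the EWMA.
before = a.delta_ewma_ms
update_delta_ewma(a, 80.0, link_loaded=True)
after = a.delta_ewma_ms

print(f"EWMA for a unchanged under load: {before == after}")
print(f"Candidate to rotate out: {worst_reflector([a, b]).address}")
```

If load information never reaches the code that updates the EWMAs, the `link_loaded` guard is effectively always false and the EWMAs drift with load, which matches the symptom discussed here.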

bairhys commented 1 year ago

tsping 30 30 works great, I haven't felt the need to change it at all. It is a nice compromise between speed and ping. tsping 10 30 gives somewhat low throughput on DL. I haven't considered going back to fping.

Ah that's why that measurement is negative.

Yeah, from memory I was using a cake-autorate version from about 2 weeks ago, just after the random crashes were fixed. Haven't had a crash since. Great job! Just updated to the latest commit now.

lynxthecat commented 1 year ago

Hey @bairhys please can you merge your improve-controller branch into main since I pulled these in on cake-autorate?

I myself have switched to tsping now and am also finding the performance excellent. Thanks again for heavily testing tsping and the new improve-controller commits.

bairhys commented 1 year ago

@lynxthecat Done. Have you had a chance to test this exporter with Grafana?

cake-autorate is working great, cannot use the internet without it!

lynxthecat commented 1 year ago

Excellent. And not yet - so far I have focused on tweaking the fundamentals in cake-autorate. I think things are looking pretty good at the moment though.