bairhys / prometheus-cake-autorate-exporter

A Prometheus exporter for CAKE Autorate
GNU General Public License v3.0
3 stars 1 forks source link

BUG: Starlink satellite compensation crashes the exporter with: `ValueError: 'dl_idle_sss' is not in list` #4

Closed Wabuo closed 7 months ago

Wabuo commented 7 months ago

I get the following error:

python prometheus_cake_autorate_exporter.py
2024-01-12 19:56:23,584 - Log file /var/log/cake-autorate.primary.log, metrics at: http://localhost:9101/metrics
Traceback (most recent call last):
  File "/opt/prometheus-cake-autorate-exporter/prometheus_cake_autorate_exporter.py", line 155, in <module>
    readLineData(data)
  File "/opt/prometheus-cake-autorate-exporter/prometheus_cake_autorate_exporter.py", line 88, in readLineData
    DL_LOAD_CONDITION.state(data[27].replace(' ', ''))
  File "/usr/lib/python3.11/site-packages/prometheus_client/metrics.py", line 745, in state
    self._value = self._states.index(state)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'dl_idle_sss' is not in list

I'm pretty sure it's part of the Starlink satellite code as in if I turn the Starlink satellite compensation code of it's not crashing and the name fits to.

bairhys commented 7 months ago

Hey thanks for letting me know.

Could you post some of your cake-autorate logs from the /var/log/cake-autorate.primary.log file? A couple of each DATA, LOAD, SHAPER and SUMMARY lines would be helpful. Then I can get the order right

Wabuo commented 7 months ago

Sure thing, here you go, and cheers for caring.

Note the dl_idle_sss; ul_idle_sss outputs. (I turned more Starlink stuff on but didn't see more things pop up, but will obviously test the fix as soon as you throw it at me)

SUMMARY; 2024-01-13-16:44:11; 1705117451.616657; 23; 21; 0; 0; 2772; 2772; dl_idle; ul_idle; 100000; 2293
DATA; 2024-01-13-16:44:11; 1705117451.662338; 1705117451.659854; 23; 21; 0; 0; 1705117451.638010; 156.154.70.3; 36; 21416; 23950; 3742; 2534; 40120; 21416; 23950; 3742; 2534; 45233; 0; 2672; 80120; 0; 2672; 85233; dl_idle; ul_idle; 100000; 2293
SUMMARY; 2024-01-13-16:44:11; 1705117451.664895; 23; 21; 0; 0; 2672; 2672; dl_idle; ul_idle; 100000; 2293
SHAPER; 2024-01-13-16:44:11; 1705117451.707474; tc qdisc change root dev eth0 cake bandwidth 2000Kbit
DEBUG; 2024-01-13-16:44:11; 1705117451.709512; adjust_ul_shaper_rate set to 0 in config, so skipping the corresponding tc qdisc change call.
DATA; 2024-01-13-16:44:11; 1705117451.733805; 1705117451.713504; 23; 21; 0; 0; 1705117451.685860; 8.8.4.4; 37; 22868; 22850; 1635; -18; 40120; 22868; 22850; 1635; -18; 46000; 0; 2617; 80120; 0; 2617; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.737762; 23; 21; 0; 0; 2617; 2617; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:11; 1705117451.760560; 1705117451.758031; 23; 21; 0; 1; 1705117451.737790; 94.140.15.16; 37; 21439; 23400; 3145; 1961; 40120; 21439; 23400; 3145; 1961; 46000; 0; 2458; 80120; 0; 2458; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
LOAD; 2024-01-13-16:44:11; 1705117451.760567; 1705117451.759295; 24; 22; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.763395; 23; 21; 0; 0; 2458; 2458; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:11; 1705117451.788502; 1705117451.786108; 24; 22; 0; 1; 1705117451.769750; 9.9.9.9; 37; 9447; 14250; 3740; 4803; 40120; 9447; 14250; 3740; 4803; 46000; 0; 2624; 80120; 0; 2624; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.790722; 24; 22; 0; 0; 2624; 2624; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:11; 1705117451.861287; 1705117451.858799; 24; 22; 0; 1; 1705117451.837900; 8.8.8.8; 37; 21543; 23200; 3057; 1657; 40120; 21543; 23200; 3057; 1657; 46000; 0; 2457; 80120; 0; 2457; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.863845; 24; 22; 0; 0; 2457; 2457; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:11; 1705117451.913313; 1705117451.910912; 24; 22; 0; 1; 1705117451.889880; 185.228.168.9; 37; 21343; 24100; 3557; 2757; 40120; 21343; 24100; 3557; 2757; 46000; 0; 2282; 80120; 0; 2282; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.915872; 24; 22; 0; 0; 2282; 2282; dl_idle_sss; ul_idle_sss; 100000; 2000
LOAD; 2024-01-13-16:44:11; 1705117451.960912; 1705117451.959946; 18; 22; 100000; 2000
DATA; 2024-01-13-16:44:11; 1705117451.961523; 1705117451.958938; 24; 22; 0; 1; 1705117451.937900; 156.154.70.3; 37; 21417; 23000; 3536; 1583; 40120; 21417; 23000; 3536; 1583; 46000; 0; 2123; 80120; 0; 2123; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:11; 1705117451.964003; 24; 22; 0; 0; 2123; 2123; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:12; 1705117452.005534; 1705117452.004511; 18; 22; 0; 1; 1705117451.989820; 8.8.4.4; 38; 22868; 23850; 1572; 982; 40120; 22868; 23850; 1572; 982; 46000; 0; 2290; 80120; 0; 2290; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:12; 1705117452.006681; 18; 22; 0; 0; 2290; 2290; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:12; 1705117452.095849; 1705117452.093410; 18; 22; 0; 1; 1705117452.072290; 94.140.15.16; 38; 21457; 40000; 4607; 18543; 40120; 21457; 40000; 4607; 18543; 46000; 0; 5054; 80120; 0; 5054; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:12; 1705117452.098423; 18; 22; 0; 0; 5054; 5054; dl_idle_sss; ul_idle_sss; 100000; 2000
DATA; 2024-01-13-16:44:12; 1705117452.114436; 1705117452.112033; 18; 22; 0; 1; 1705117452.082840; 9.9.9.9; 38; 9457; 20000; 4386; 10543; 40120; 9457; 20000; 4386; 10543; 46000; 0; 6010; 80120; 0; 6010; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:12; 1705117452.116964; 18; 22; 0; 0; 6010; 6010; dl_idle_sss; ul_idle_sss; 100000; 2000
LOAD; 2024-01-13-16:44:12; 1705117452.163022; 1705117452.162213; 20; 19; 100000; 2000
DATA; 2024-01-13-16:44:12; 1705117452.176078; 1705117452.174405; 20; 19; 0; 0; 1705117452.157700; 8.8.8.8; 38; 21553; 32300; 3787; 10747; 40120; 21553; 32300; 3787; 10747; 46000; 0; 7525; 80120; 0; 7525; 86000; dl_idle_sss; ul_idle_sss; 100000; 2000
SUMMARY; 2024-01-13-16:44:12; 1705117452.178259; 20; 19; 0; 0; 7525; 7525; dl_idle_sss; ul_idle_sss; 100000; 2000
SHAPER; 2024-01-13-16:44:12; 1705117452.213582; tc qdisc change root dev eth0 cake bandwidth 2020Kbit
DEBUG; 2024-01-13-16:44:12; 1705117452.215528; adjust_ul_shaper_rate set to 0 in config, so skipping the corresponding tc qdisc change call.
DATA; 2024-01-13-16:44:12; 1705117452.221515; 1705117452.218941; 20; 19; 0; 0; 1705117452.189680; 185.228.168.9; 38; 21344; 23000; 3376; 1656; 40120; 21344; 23000; 3376; 1656; 45940; 0; 7342; 80120; 0; 7342; 85940; dl_idle; ul_idle; 100000; 2020
SUMMARY; 2024-01-13-16:44:12; 1705117452.224207; 20; 19; 0; 0; 7342; 7342; dl_idle; ul_idle; 100000; 2020
DATA; 2024-01-13-16:44:12; 1705117452.262596; 1705117452.260935; 20; 19; 0; 0; 1705117452.242960; 156.154.70.3; 38; 21420; 24500; 3492; 3080; 40120; 21420; 24500; 3492; 3080; 45940; 0; 7591; 80120; 0; 7591; 85940; dl_idle; ul_idle; 100000; 2020
SUMMARY; 2024-01-13-16:44:12; 1705117452.264378; 20; 19; 0; 0; 7591; 7591; dl_idle; ul_idle; 100000; 2020
DATA; 2024-01-13-16:44:12; 1705117452.332350; 1705117452.329895; 20; 19; 0; 0; 1705117452.306950; 8.8.4.4; 39; 22876; 31450; 2237; 8574; 40120; 22876; 31450; 2237; 8574; 45940; 0; 8857; 80120; 0; 8857; 85940; dl_idle; ul_idle; 100000; 2020
SUMMARY; 2024-01-13-16:44:12; 1705117452.334923; 20; 19; 0; 0; 8857; 8857; dl_idle; ul_idle; 100000; 2020
LOAD; 2024-01-13-16:44:12; 1705117452.365904; 1705117452.364950; 33; 76; 100000; 2020
Wabuo commented 7 months ago

@lynxthecat are there any more flags that could pop up because of Starlink stuff that just haven't shown themselves yet?

lynxthecat commented 7 months ago

I think the only affected output is the load_condition. Here is the classification routine:

https://github.com/lynxthecat/cake-autorate/blob/afd87907aea5592f58d78c571631a710b6e2c417/cake-autorate.sh#L534

So possible values follow the format:

[d,u]l_[high,low,idle][_bb][_sss]

bairhys commented 7 months ago

Thanks @lynxthecat, that made the change easy!

@Wabuo I added the 3 starlink _sss states to both dl and ul, hopefully this fixes the issue for you

lynxthecat commented 7 months ago

@Wabuo I would very much value if you could post graphics from your Starlink setup using this exporter and we can help optimise cake-autorate for you. I am keen to ensure cake-autorate is working well for Starlink users. You can post on the cake-autorate OpenWrt thread here:

https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379.

@bairhys are you still using cake-autorate? I'm still using it 24/7 - it works well for my 4G connection.

bairhys commented 7 months ago

@lynxthecat still using it all the time. Haven't touched it or anything, really happy with performance that I don't think about trying different things.

I used to use Gargoyle for years as it has a similar feature to cake-autorate just in the download direction and I was satisfied at the time. I tried Gargoyle first time in a while on my RT3200 few months ago since they had a new release and while it has a nice out of the box experience compared to OpenWRT, (it is easier to setup various features - I had troubles setting up wireguard server on openwrt), the dealbreaker was that it uses a different QoS instead of cake, therefore cannot use cake-autorate, so came back to OpenWRT. I remember reading that a Gargoyle dev said it would be quite a process to use cake on Gargoyle.

lynxthecat commented 7 months ago

OK cool - great to hear! And I'm curious: are you still using 'tsping' rather than 'fping'? I myself use 'fping', but that is in part owing to some particular oddity with my connection. 'tsping' is in principal superior given true OWDs, but performance with 'fping' is still surprisingly good.

bairhys commented 7 months ago

@lynxthecat Yeah still use your compiled version of tsping, no issues for me. I havent used fping since I started using cake-autorate.

lynxthecat commented 7 months ago

Good to know. One interesting consideration with 'tsping' was dealing with timestamp rollovers. It's dealt with in the cake-autorate code here:

https://github.com/lynxthecat/cake-autorate/blob/afd87907aea5592f58d78c571631a710b6e2c417/cake-autorate.sh#L690

I think I got it right, but please let me know if you ever see any oddities that you think could be attributable to this nuance.

bairhys commented 7 months ago

Ok I just updated to 3.2.0-PRERELEASE

Wabuo commented 7 months ago

@Wabuo I would very much value if you could post graphics from your Starlink setup using this exporter and we can help optimise cake-autorate for you. I am keen to ensure cake-autorate is working well for Starlink users. You can post on the cake-autorate OpenWrt thread here:

https://forum.openwrt.org/t/cake-w-adaptive-bandwidth/135379.

@bairhys are you still using cake-autorate? I'm still using it 24/7 - it works well for my 4G connection.

It's gonna happen, but I thought I'll wait till there is a fix for this issue because I can't turn on the satellite optimization without losing the plotting through Grafana.

But if you say it doesn't matter for fine-tuning, then I'll write up a forum post right now.

PS: a little funn-fact Starlink blocks icmp timestamp messages, therefore tsping sadly doesn't work. (google Starlink tsping, they said they want to fix that in the "near" future, but that statement is over 2 years old by now ... might tickle the support again and see ...)

Wabuo commented 7 months ago

Oh just seen there's a fix ... of testing I go ...

Thanks mate!

Wabuo commented 7 months ago

Initial test:

Everything seems to work fine! Awesome work!

bairhys commented 7 months ago

@Wabuo That's great! Easy fix

Wabuo commented 7 months ago

Again cheers! Would be cool to have Grafana panel for it to see the satellite gaps aka the quick interruption when the connection jumps from one satellite to another, but that's on the nice to have aka when ur bored list.