DoESLiverpool / somebody-should

A place to document practices on the wiki and collect issues/suggestions/to-do items for the physical space at DoES Liverpool

Broadband doesn't seem to be working as well as it was #1664

Open ajlennon opened 2 years ago

ajlennon commented 2 years ago

I've had on and off problems with the broadband for the past few weeks and the BeeCam project is no longer working very well as the A/V keeps buffering. I replaced the live feeds with a test stream to YouTube and I still see the problem.

I am trying to dig out some packet loss, latency and jitter measurement tools and my initial non-scientific results are that we seem to be getting a fair bit of bursty packet loss.
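
For anyone wanting to put numbers on this, here is a minimal Python sketch of how loss, latency, jitter and burstiness could be summarised from a list of probe results. The function name `summarise_probes`, the input format (one round-trip time in milliseconds per probe, `None` for a lost probe) and the sample values are assumptions for illustration, not output from any of the tools mentioned above. Jitter is taken here as the mean absolute difference between consecutive received round-trip times, which is only one of several definitions in use.

```python
from statistics import mean


def summarise_probes(rtts_ms):
    """Summarise probe results.

    rtts_ms holds one entry per probe: the round-trip time in
    milliseconds, or None when the probe was lost.
    """
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0

    # Jitter: mean absolute difference between consecutive received RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0

    # Longest run of consecutive losses, a rough indicator of burstiness.
    longest = run = 0
    for r in rtts_ms:
        run = run + 1 if r is None else 0
        longest = max(longest, run)

    return {
        "loss_pct": loss_pct,
        "avg_rtt_ms": mean(received) if received else None,
        "jitter_ms": jitter,
        "longest_loss_burst": longest,
    }


# Hypothetical sample: ten probes, four of them lost.
sample = [20.0, 22.0, None, None, None, 21.0, 25.0, None, 20.0, 20.0]
print(summarise_probes(sample))
```

A high `longest_loss_burst` alongside a moderate overall loss percentage would point towards bursty loss rather than evenly spread loss.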

IANANA (I am not a network administrator) so can anybody advise? @johnmckerrell ?

image

ajlennon commented 2 years ago

OK I have cut the Bee stuff completely out of the loop and am running this FFmpeg test to YouTube on my laptop. This works for a bit, then data is no longer received at the YouTube end although it's still sending from my laptop. Pretty sure this all used to work fine.

ffmpeg -f lavfi -i testsrc=size=1280x720:rate=25 -f lavfi -i sine=f=440:b=4 -shortest -codec:v libx264 -b:v 2048k -c:a aac -b:a 128k -f flv rtmp://a.rtmp.youtube.com/live2/$STREAM_KEY

MatthewCroughan commented 2 years ago

It is my understanding that Baltic Broadband have switched over to radio/wireless infrastructure fully. I think that we are no longer receiving a gigabit fiber link to the point of presence downstairs and should expect a higher contention ratio with other customers, as well as more packet loss when their radio equipment gets jostled in storms.

Is it true that everything is radio based now? @johnmckerrell

MatthewCroughan commented 2 years ago

From https://github.com/DoESLiverpool/somebody-should/issues/1653:

I believe they’ve made an improvement if someone in the space could try it. From the sounds of things we are no longer on a hard fibre but are served by “radio” so that might be part of the reason we’re seeing a change. I’ll ask more about that when I confirm whether we’re seeing an improvement or not.

ajlennon commented 2 years ago

Yes that's my understanding too. That doesn't help me address the issues I am seeing though.

MatthewCroughan commented 2 years ago

@ajlennon You and I could both foot the bill to get gigabit fiber back in; I'm willing to go half on that. Adding gigabit would cost £100/month extra on top of what we have, so each of us could pay £50/month towards it.

DoES currently pays £150/month for the current plan, so in total the contract would be £250/month from DoES' perspective.

ajlennon commented 2 years ago

What is DoES paying for?

johnmckerrell commented 2 years ago

We have the £150 Aurora package on this page: https://www.balticbroadband.com/broadband-wireless/

johnmckerrell commented 2 years ago

I can get in touch with them again to report continuing problems but having as much information as possible would be useful. What timespan is covered by the first image on this issue?

ajlennon commented 2 years ago

Thanks John - I'll try to do some more testing tomorrow and be a bit more specific.

amcewen commented 2 years ago

We've had no Internet since 9:46 this morning. It seems to just be us (Aeternum are fine, and also on Baltic).

Baltic noticed before we told them, and phoned up to check. They didn't seem to be able to see their router, but I've confirmed with them that it's on and has some flashing lights, so seems like it's up at this end. They're looking into it further and going to give us a call back when they've got more info.

MatthewCroughan commented 2 years ago

I noticed @amcewen's response 10 minutes ago, and have been periodically pinging my box. Just now the network was accessible, then it went offline again. So it seems the network connectivity is intermittent, rather than completely offline.
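
To tell an intermittent connection from a complete outage, periodic probe results can be collapsed into outage intervals. The following is a minimal Python sketch under stated assumptions: the function name `outage_intervals`, the `(timestamp, reachable)` sample format and the example log are hypothetical, and it says nothing about how the probes themselves are made.

```python
def outage_intervals(samples):
    """Collapse (timestamp, reachable) samples into outage intervals.

    samples must be sorted by timestamp. Returns a list of
    (first_failed_timestamp, first_recovered_timestamp) tuples; the
    second element is None if the link was still down at the end.
    """
    intervals = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            down_since = ts  # start of a new outage
        elif reachable and down_since is not None:
            intervals.append((down_since, ts))  # outage ended
            down_since = None
    if down_since is not None:
        intervals.append((down_since, None))  # still down at end of log
    return intervals


# Hypothetical log: seconds since monitoring started, one probe per minute.
log = [(0, True), (60, False), (120, False), (180, True), (240, False)]
print(outage_intervals(log))
```

Several short intervals separated by successful probes would suggest a flapping link; a single interval with no recovery would suggest it is simply down.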

amcewen commented 2 years ago

Baltic have been out and replaced their router, and once we'd restarted the Ethernet switch at the edge of our network things seem back up and running okay.

Not quite sure why a dumb Ethernet switch should need a reboot to fix things, but that's what did the trick.

ajlennon commented 2 years ago

How strange!

MatthewCroughan commented 2 years ago

@amcewen Dumb Ethernet switches still store state like the MAC address table. Who knows whether the switch disabled some ports due to spanning tree, or learned the wrong info when MAC learning during the switchover? State causes all sorts of problems. My pfSense router also required a reboot in the last downtime when Baltic had a routing error, for who knows what reason.
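
As a rough illustration of the kind of state involved in MAC learning, here is a toy Python model of a switch forwarding table. The class name `LearningSwitch`, the MAC address and the port numbers are hypothetical, and the 300-second ageing time is an assumed default rather than a known setting of the switch at DoES. It is a sketch of the general mechanism, not a diagnosis of what actually happened.

```python
class LearningSwitch:
    """Toy model of the forwarding table in an unmanaged Ethernet switch."""

    def __init__(self, ageing_seconds=300):
        self.ageing_seconds = ageing_seconds  # assumed default, may differ
        self.table = {}  # MAC address -> (port, time last seen)

    def learn(self, src_mac, port, now):
        # Every received frame refreshes the entry for its source address.
        self.table[src_mac] = (port, now)

    def lookup(self, dst_mac, now):
        # Returns the learned port, or None meaning "flood to all ports".
        entry = self.table.get(dst_mac)
        if entry is None:
            return None
        port, seen = entry
        if now - seen > self.ageing_seconds:
            del self.table[dst_mac]  # entry has aged out
            return None
        return port

    def reboot(self):
        # A power cycle discards all learned state.
        self.table.clear()


sw = LearningSwitch()
gateway = "aa:bb:cc:00:00:01"           # hypothetical gateway MAC
sw.learn(gateway, port=3, now=0)        # learned on the wrong port
print(sw.lookup(gateway, now=10))       # stale entry is still used
sw.reboot()
print(sw.lookup(gateway, now=11))       # unknown again, so frames flood
```

In this model a wrong entry persists until the real device sends a frame, the entry ages out, or the switch is restarted.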

For reasons like this, I'm going to use https://scion-architecture.net/ at a datacenter in Wales where I'm going to start hosting boxes; I believe it will get around these sorts of networking problems. It would be good to try to get a SCION node installed at Edge5 and maybe the POP below DoES; then we could also see sub-millisecond latency. Worth looking into if you dig networking.

ajlennon commented 2 years ago

Haven't had a chance to do more testing at DoES I'm afraid @johnmckerrell but I will when I'm in next.

For comparison I ran the same thing here (home, Virgin Media via Wi-Fi) and nothing seems to get dropped.

@see https://packetlosstest.com/

image

ajlennon commented 2 years ago

I've been footling around. I found this thing Prometheus that monitors all sorts of things, and there's a node_exporter application that generates a load of metrics. So I have that running on my Pi in the BeeHive now and I am graphing up the metrics with Grafana. It's all quite nice actually.

So will leave this running and see if anything leaps out
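
For anyone who wants to look at the raw numbers rather than the graphs, node_exporter serves its metrics as plain text, and the network drop and error counters can be pulled out with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `parse_counters` and the sample text are made up for illustration, the metric names are ones I believe node_exporter exposes on Linux, and the label parsing is simplified (it would break on label values containing commas).

```python
def parse_counters(metrics_text, wanted_prefix="node_network_"):
    """Parse Prometheus text-format lines into {(name, device): value}."""
    counters = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if not line.startswith(wanted_prefix):
            continue
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        device = None
        for pair in labels.rstrip("}").split(","):
            key, _, val = pair.partition("=")
            if key == "device":
                device = val.strip('"')
        counters[(name, device)] = float(value)
    return counters


# Hypothetical excerpt of a metrics scrape; the values are invented.
sample = """
# TYPE node_network_receive_drop_total counter
node_network_receive_drop_total{device="eth0"} 12
node_network_receive_errs_total{device="eth0"} 0
node_network_transmit_drop_total{device="eth0"} 3
node_load1 0.5
"""
for key, value in sorted(parse_counters(sample).items()):
    print(key, value)
```

These are cumulative counters, so the useful figure is the difference between two scrapes taken some time apart.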

image

https://grafana.dynamicdevices.co.uk:3000/d/rYdddlPWk/node-exporter-full?orgId=1&refresh=1m

ajlennon commented 2 years ago

So I just did a test now 28/02/22 13:38 and it is looking good... Need to work out how it changes over time

image

cameronswift commented 2 years ago

I've been footling around. I found this thing Prometheus that monitors all sorts of things, and there's a node_exporter application that generates a load of metrics. So I have that running on my Pi in the BeeHive now and I am graphing up the metrics with Grafana. It's all quite nice actually.

So will leave this running and see if anything leaps out

image

https://grafana.dynamicdevices.co.uk:3000/d/rYdddlPWk/node-exporter-full?orgId=1&refresh=1m

graph

After being left for a few days to run, we can now see this graph showing a problem with the connection.

MatthewCroughan commented 2 years ago

I've just set up SmokePing to monitor Cloudflare and Google, and there's already some evidence of packet loss. So we'll see how it goes.

image

MatthewCroughan commented 2 years ago

The following is some data from SmokePing over a 10-day period. I believe the anomaly on the 13th March is because the machine also has a WWAN/4G connection, which was preferred while Baltic had a routing error. Shortly after their routing error was sorted out, my machine resumed using Baltic.

Pinging Google

image

Pinging Cloudflare

image

ajlennon commented 2 years ago

Quick update on this. I've been chatting to Matt Wilson @ Baltic Broadband who has been really helpful and put me onto Syd to talk about the issues we've been seeing.

Syd explained that the current thinking is that the problem is down to the construction of the bracket holding the radio antenna that provides connectivity for DoES. In particular, the recent bad storms have shifted its position, and the chaps have had to come out to fix it.

So I'm told the plan is to re-engineer the bracket so it doesn't move about, which should address the issues we are seeing.

Great news and I will keep you posted when I know more...