Closed: @mounika-alavala closed this issue 1 year ago.
Hi @mounika-alavala, thanks for opening an issue. Would you mind providing the output of `juju debug-log --replay`, and since when have you had it deployed?
Hi, thanks for the reply. We have had the setup for 5 days. Attached are the debug logs of the orchestrator. juju_replay.txt
There may be multiple issues here:
- The fact that the orchestrator applications go from active to maintenance
- Connectivity between the AGW and the Orchestrator / status of the AGW

@mounika-alavala Do you mind connecting to one of the Kubernetes pods associated with a Maintenance status application and telling me whether the workload service is running? Example for orc8r-device:
kubectl exec -ti orc8r-device-0 -c magma-orc8r-device -n <your model name> -- bash
ps -ef
I don't think that your bug is related to this, but I also observed a bunch of error logs that shouldn't be there. Here's the PR to fix this.
Also, I see that the output of the `debug-log` command is cut short. Is there a way to get all the logs from the deployment? We may want to filter on errors: `juju debug-log --replay --level ERROR`.
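If the output keeps getting cut short, one option is to save the full replay to a file once and filter it locally. A minimal sketch, where the sample log lines are made up for illustration (real log content will differ):

```shell
# In a real deployment you would run:
#   juju debug-log --replay > full.log 2>&1
# Here we use fabricated sample lines instead:
cat > full.log <<'EOF'
unit-orc8r-device-0: 12:00:01 INFO juju.worker.uniter resolver loop resumed
unit-orc8r-device-0: 12:00:02 ERROR juju.worker.uniter hook "update-status" failed
EOF

# Filter locally instead of re-running juju each time:
grep 'ERROR' full.log
# prints only the ERROR line
```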
> There may be multiple issues here:
> - The fact that the orchestrator applications go from active to maintenance
> - Connectivity between AGW and Orchestrator / Status of AGW
>
> @mounika-alavala Do you mind connecting to one of the Kubernetes pods that is associated to a Maintenance status application and telling me if the workload service is running? Example for orc8r-device:
> kubectl exec -ti orc8r-device-0 -c magma-orc8r-device -n <your model name> -- bash
> ps -ef
Hi @gruyaume, I am a colleague of @mounika-alavala. Please find attached a screenshot showing connections to the pods in Maintenance state.
Hi, thanks for the reply. Attached is the error log. error.txt
Hi @mounika-alavala, first of all, thank you for providing the logs. I have analyzed them and did not find any clues as to what is happening with the AGW. Can you please share the configuration that was used to deploy the AGW, and screenshots of its configuration inside the NMS?
Also, can you run this command and post its output while on the Juju model for the AGW: `juju run magma-access-gateway-operator/0 post-install-checks`.
Thanks!
Hi, thanks for the reply. Attached are the screenshots of the information you requested. Please note that we deployed all of this on top of OpenStack VMs behind a proxy.
I do not think the SGi addresses should be in the same range as the IP Block. The IP Block is used to give out addresses to the UEs.
I think in this version of Magma, the UI to configure the EPC is a bit difficult to follow. Here is what I think you should configure:
- IP Block: private range of IPs that will be used by the UEs
- DNS Primary & Secondary: DNS servers that can be reached by the AGW through eth0
- SGi network Gateway IP address: the default gateway that the AGW will use on eth0
- SGi management interface IP address: IP and netmask configured on eth0

I will look into improving the documentation for this, and maybe see if we can change this page in charmed-magma.
Hi, thanks for the reply. Even when the IP Block and the SGi address are on different ranges, it still gives the same error. The DNS server can be reached through eth0. We did set the SGi network gateway and management IP correctly. We are sure about this because we have set up Magma on bare-metal servers before. Now we are trying on top of OpenStack VMs with proxies.
OK, in that case, are you able to take a network capture of the traffic between the AGW and ORC8R? Nothing in the logs you provided indicates any issues. Ideally, we would require a capture taken on AGW and another on the ORC8R, but if you are only able to get one, let us start with the AGW.
Attached are the network captures. No traffic via eth0 in AGW. tcpdump_agw_eth1_16feb_23.txt tcpdump_orc8r_16fec23.txt
Thank you very much for providing those. While looking into the AGW capture, I found that the issue is related to the proxy. The proxy is returning errors like this one:
The following error was encountered while trying to retrieve the URL: https://bootstrapper-controller.5gmagmatest.com/*
My guess is that the proxy does not know how to resolve the domain name 5gmagmatest.com. I am not sure how much control you have over that proxy, but a solution could be to add the domains to its hosts file. If this is not possible, I would suggest using a domain name backed by whatever DNS server the proxy is using.
Let me know if this helps.
We already added the domains to the hosts file. Using telnet we also verified the access and it was able to resolve the domain name.
When you use telnet, it bypasses the proxy. You could try something like this to test through the proxy:
export https_proxy="<proxy url>"
curl https://bootstrapper-controller.5gmagmatest.com/
In this particular case, you could skip the proxy entirely and it should work. However, if your goal is to test through the proxy, the proxy server itself needs to be able to resolve those domain names and connect to those ports.
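To illustrate the distinction, here is a rough sketch (not curl's actual implementation) of how clients typically match a host against a comma-separated no_proxy list; `bypass_proxy` is a hypothetical helper written for this example:

```shell
# Sketch: decide whether a host should bypass the proxy, matching either
# the exact domain or any of its subdomains against the no_proxy list.
bypass_proxy() {
  host=$1; list=$2
  old_ifs=$IFS; IFS=','
  for entry in $list; do
    case "$host" in
      "$entry"|*".$entry") IFS=$old_ifs; return 0 ;;  # host is covered by no_proxy
    esac
  done
  IFS=$old_ifs; return 1  # host must go through the proxy
}

bypass_proxy bootstrapper-controller.5gmagmatest.com "localhost,5gmagmatest.com" \
  && echo "bypass proxy" || echo "via proxy"
# prints: bypass proxy
```

So a well-behaved client with `no_proxy=5gmagmatest.com` would connect directly; the capture shows the AGW is not doing that.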
Hi, we have tested using curl too. We actually added the domain (5gmagmatest.com) to no_proxy.
If you add the domain to no_proxy before testing with curl, it will only test the direct connection. I am curious how the AGW ends up connecting through the proxy, as the capture clearly shows it does. Did you configure the proxy anywhere on the VM?
Yes we have configured proxy on the VM. Proxies are set in : .bashrc, /etc/environment, /etc/wgetrc
OK, so the issue is that the proxy server does not know 5gmagmatest.com and cannot forward traffic to it. You can either disable the proxy globally on the VM, or go into the proxy server itself and configure its hosts file and network so that it can resolve and connect to 5gmagmatest.com.
We tested using curl too.
Right now, your tests with curl set the no_proxy variable, so they only show direct connectivity. The Magma AGW, however, takes the proxy settings that are configured globally and sends its traffic through the proxy server.
You have 2 options to fix the issue: disable the proxy globally on the VM, or configure the proxy server itself so that it can resolve and reach 5gmagmatest.com.
Hi, we set the proxies and no_proxy on the VM as shown below. We added these proxies in .bashrc, /etc/environment, and /etc/wgetrc. We also made a host entry in /etc/hosts. As mentioned above, requests from the AGW to the orchestrator services are not going via the proxy; they are connecting directly.
Unfortunately, it seems that the AGW is not taking no_proxy into account, because the capture that you provided shows traffic going through the proxy.
Hi, can you guide us on how to set the no_proxy variable so that the AGW picks it up?
Hi, can you try running:
sudo snap unset system proxy.http
sudo snap unset system proxy.https
sudo service magma@* stop
sudo service magma@magmad restart
I have looked at the snap settings, and there does not seem to be an option for setting no_proxy there. If this works, I can raise an issue to add this feature.
If it does not work, please share the exact content of /etc/environment.
Hi, thanks. I tried running the commands you mentioned, but the AGW is still not picking up the no_proxy settings. Please find attached the exact content of /etc/environment.
Where is the variable $PROXY_NO defined? It is not in /etc/environment, so that might be the issue. I see from previous comments that it is generated dynamically in a subshell, so my guess would be that it is only in .bashrc or similar, and that will not apply to the AGW.
Please define no_proxy directly in /etc/environment.
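For reference, a hypothetical /etc/environment might look like the following. Note that pam_env reads this file verbatim: no `export` keyword and no variable expansion (the proxy URL and domain below are placeholders, not values from this thread):

```
http_proxy="http://proxy.example:3128"
https_proxy="http://proxy.example:3128"
no_proxy="localhost,127.0.0.1,5gmagmatest.com"
```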
Oh okay, you are right. I have defined no_proxy in .bashrc. Let me try adding it directly to /etc/environment.
Thanks. After setting no_proxy in /etc/environment and restarting the magmad services on the AGW, I can see that the gateway.crt and gateway.key files have been successfully generated under the /var/opt/magma/certs directory.
But I am still seeing magmad errors in the logs, and the AGW has still not checked in with the orc8r.
I ran checkin_cli.py to debug the issue; I have attached the screenshot below.
Can you help me with what I am missing here?
Unfortunately, those logs are not really telling much. Can you provide a network capture on the AGW like what was provided before?
Sorry. Please find attached tcpdump from agw: tcpdump_agw_eth1_17feb_night.log
Hope this helps.
I can still see the AGW trying to connect to 5gmagmatest.com via the proxy in the tcpdump from the AGW.
There is no indication in the documentation for upstream Magma 1.6 that installing the AGW behind an HTTP proxy is supported. I would like to better understand what you are trying to achieve with this setup.
I think removing the global configuration and testing that way would be the best way forward. Afterwards, if UEs going through the AGW need to go through an HTTP proxy, I think the Header Enrichment feature can be used for similar reasons.
Since the AGW behaves like a router, traffic from the UE would not go directly through the proxy with this setup even if we were able to make it check in to the orc8r. The proxy configuration would need to be done on the UE.
Hi, we are trying to set up a private 5G cloud on MAAS/OpenStack VMs. The OpenStack VMs do not have direct Internet access for security reasons, so they have to use the MAAS proxy.
For the AGW to communicate with the orc8r services, we have to make sure that all the orc8r services/nodes are part of the no_proxy list in the /etc/environment and .bashrc files. We are still trying to figure out why the AGW is not picking up the no_proxy settings from /etc/environment.
If the AGW gets its proxy settings from /etc/environment, why is it not picking up the no_proxy settings? That seems to be the root cause of this problem. If the AGW could pick up the no_proxy settings, it would skip the proxy and connect directly to the orc8r services.
Also, magma-access-gateway.configure ran successfully and generated the gateway.crt and gateway.key files. For this, I assume the AGW should be able to access bootstrapper-controller.5gmagmatest.com, bypassing the proxy server. So if the AGW is able to access one service, namely the bootstrapper, why is it not doing the same with the other services (controller, fluentd)?
Hi, magma-access-gateway.configure does not require access to bootstrapper-controller; it basically creates the configuration files and places the certificates in the right place.
One thing we can do to validate is try to see what Magma sees for proxy configuration. You can find the PID of the MME service:
systemctl status magma@mme.service
The main PID will be in the output. You can then use the PID in this command:
cat /proc/<PID>/environ
With this, we will at least be able to see the environment from the process's point of view.
Hi, it does have the no_proxy values.
I will try to replicate the setup locally to see if I can reproduce the issue and debug further.
Ok Thank you.
Hi @ghislainbourgeois, thanks a lot! As you suggested, we removed all the proxy variables from /etc/environment after the install and it worked! The AGW is now able to check in to the orc8r.
I am glad that it worked. On my side, I recreated a similar setup and was able to reproduce the behaviour. The AGW is able to bootstrap, but does not check in properly afterwards and never shows up as Good in the Orchestrator.
I will use this setup to dig a bit deeper and understand why it only partly works.
In your setup, do you connect an eNodeB and some UEs for testing? Does the current setup let you do anything useful with the AGW?
Hi, we set up srsRAN on another OpenStack VM. It did set up the GTP tunnel, but it is not functional: uplink and downlink are not working. We did a few checks like:
That is weird; I would expect ping to work in this case, but not much else. Can you provide the output of the following commands on the AGW, and also a network capture on the AGW:
ip -br -c a
ip route get 192.168.128.18
tcpdump -i any -s0 -w agw.pcap icmp
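For context on the second command: `ip route get <ip>` asks the kernel which interface and source address it would actually use for that destination; for UE traffic it should resolve to gtp_br0. A harmless local example of what the output looks like:

```shell
# Loopback is routed via the "lo" device on any Linux box:
ip route get 127.0.0.1
# e.g. "local 127.0.0.1 dev lo src 127.0.0.1 ..."
```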
On my side, I have made some progress regarding the proxy setup. It turns out that magmad takes the settings properly, but control-proxy, which runs nghttpx, is configured with the proxy and unfortunately does not support no_proxy. Its configuration gets created automatically on each magmad startup, so there is no easy fix there.
I thus think that the official recommendation when behind a proxy is to ensure that no proxy is configured in /etc/environment, and to configure the proxy directly in the other applications that require it (e.g. apt for package upgrades).
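For example, keeping apt working through the proxy while /etc/environment stays clean could look like this hypothetical drop-in at /etc/apt/apt.conf.d/95proxy (the proxy URL is a placeholder):

```
Acquire::http::Proxy "http://proxy.example:3128";
Acquire::https::Proxy "http://proxy.example:3128";
```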
Attached are the details you asked for: a trace from srsenb while running ping in both the UL and DL directions. UE attach was successful, and the UE IP was assigned in namespace ue1.
Note: agw.pcap was renamed to agw.log because *.pcap is not supported. agw.log
I think in this case the problem is in the networking setup of the AGW. The traffic from the UE should come and go through the virtual network interface gtp_br0, and not directly out on the S1 interface (eth1).
I think the problem is that the IP Block configured in the orchestrator for this AGW is wrong. It should probably be 192.168.128.0/24. Can you try changing this setting, restarting the AGW, and retrying with srsRAN?
Hi, even when the IP Block is 192.168.128.0/24, the issue remains the same.
I think you mentioned that you already set block_agw_local_ips=false in /etc/magma/pipelined.yml, right? If not, please set this and restart the magma@pipelined service.
Can you also provide a network capture directly on the gtp_br0 interface while the IP Block is set to 192.168.128.0/24? It will help narrow down the problem.
Yes we did set it to false.
We captured only for a little while. Now it is assigning addresses only in the 192.168.30 subnet.
1 0.000000 192.168.30.128 → 10.250.110.36 ICMP 100 Echo (ping) request id=0x2a2c, seq=1/256, ttl=64
2 0.003793 10.250.110.36 → 192.168.30.128 ICMP 100 Echo (ping) reply id=0x2a2c, seq=1/256, ttl=63 (request in 1)
3 59.794959 192.168.128.1 → 192.168.30.22 ICMP 100 Echo (ping) request id=0x2a2d, seq=1/256, ttl=64
4 60.823803 192.168.128.1 → 192.168.30.22 ICMP 100 Echo (ping) request id=0x2a2d, seq=2/512, ttl=64
5 61.043090 192.168.30.128 → 10.250.110.36 ICMP 100 Echo (ping) request id=0x2a2e, seq=1/256, ttl=64
6 61.047283 10.250.110.36 → 192.168.30.128 ICMP 100 Echo (ping) reply id=0x2a2e, seq=1/256, ttl=63 (request in 5)
7 61.847800 192.168.128.1 → 192.168.30.22 ICMP 100 Echo (ping) request id=0x2a2d, seq=3/768, ttl=64
8 62.871741 192.168.128.1 → 192.168.30.22 ICMP 100 Echo (ping) request id=0x2a2d, seq=4/1024, ttl=64
9 63.899727 192.168.128.1 → 192.168.30.22 ICMP 100 Echo (ping) request id=0x2a2d, seq=5/1280, ttl=64
10 122.085721 192.168.30.128 → 10.250.110.36 ICMP 100 Echo (ping) request id=0x2a2f, seq=1/256, ttl=64
11 122.089748 10.250.110.36 → 192.168.30.128 ICMP 100 Echo (ping) reply id=0x2a2f, seq=1/256, ttl=63 (request in 10)
12 183.125904 192.168.30.128 → 10.250.110.36 ICMP 100 Echo (ping) request id=0x2a30, seq=1/256, ttl=64
13 183.130500 10.250.110.36 → 192.168.30.128 ICMP 100 Echo (ping) reply id=0x2a30, seq=1/256, ttl=63 (request in 12)
14 244.166527 192.168.30.128 → 10.250.110.36 ICMP 100 Echo (ping) request id=0x2a31, seq=1/256, ttl=64
15 244.171539 10.250.110.36 → 192.168.30.128 ICMP 100 Echo (ping) reply id=0x2a31, seq=1/256, ttl=63 (request in 14)
16 254.801842 10.250.110.104 → 192.168.30.128 ICMP 104 Destination unreachable (Network unreachable)
17 254.803360 10.250.110.104 → 192.168.30.128 ICMP 104 Destination unreachable (Network unreachable)
18 257.848888 10.250.110.104 → 192.168.30.128 ICMP 104 Destination unreachable (Network unreachable)
19 260.888801 10.250.110.104 → 192.168.30.128 ICMP 104 Destination unreachable (Network unreachable)
I still think the network setup is not correct. You should have 192.168.128.0/24 set in the IP Block setting, then:
# Stop UE and enodeB
systemctl stop magma@*
systemctl start magma@magmad
# Start enodeB
# Start UE
Then, the UE should get an IP in the range 192.168.128.0/24. You should be able to ping the AGW from the UE with this command: ping 192.168.128.1. And you should be able to ping the UE from the AGW using this command: ping 192.168.128.x (replace x with the right number from the UE attach message).
In your last messages, you seem to be pinging between different networks.
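A trivial way to sanity-check the assigned UE address against the IP Block from the shell (a sketch using a string match, valid only for a /24; the UE address below is a stand-in):

```shell
UE_IP=192.168.128.18   # hypothetical; take the real one from the UE attach message
case "$UE_IP" in
  192.168.128.*) echo "UE address is inside the IP Block" ;;
  *)             echo "UE address is OUTSIDE the IP Block: check the EPC config" ;;
esac
# prints: UE address is inside the IP Block
```

An address like 192.168.30.128 from the capture above would land in the OUTSIDE branch, which matches the symptom of pinging between different networks.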
Thanks @ghislainbourgeois. We are able to ping the AGW from the UE and vice versa :-)
Hi, we have installed the charmed Magma orchestrator and AGW services on OpenStack VMs with Ubuntu 20.04, running behind proxies.
Orchestrator: microk8s version 1.23. AGW: version 1.6.1.
We are able to access the NMS UI, and all the AGW services are in the active state. There are error logs in the AGW services; attached are screenshots of the same.
Even though the orchestrator services come to the "Active" and "Idle" states after installation, they tend to go to the "maintenance" state after a day or so and remain there. Even though the same proxy values are used, it is not always the same services that go to "maintenance".
The endpoints of the orchestrator are accessible from the AGW; we used telnet to check this.
As part of the debugging section of the documentation, we ran a few Python scripts to confirm that every prerequisite is satisfied. When we executed the checkin_cli.py script, we found out that the gateway certificate and gateway key are missing. Restarting the Magma services did not help regenerate the certificate and key.
We tried checking in the AGW to the orchestrator with the correct hardware details, but it is not checking in, and the status in the NMS UI is "Bad".
Any help will be appreciated. Thanks in advance.