OpenFogStack / celestial-videoconferencing-evaluation

GNU General Public License v3.0

no route to host #1

Closed by ykxian 3 months ago

ykxian commented 4 months ago

Hi @pfandzelter, I tried running this program on the latest version of Celestial. I made some changes to the code and configuration files to adapt them to the new version.

videoconference-satellite.toml

bbox = [-6.8596, -11.0020, 19.7872, 22.2767]

resolution = 5

duration = 600

[network_params]
bandwidth_kbits = 10_000
min_elevation = 40
ground_station_connection_type = "all"

[compute_params]
vcpu_count = 2
mem_size_mib = 512
kernel = "server-linux.bin"
rootfs = "server.img"
disk_size_mib = 50

[[shell]]
planes = 72
sats = 22
altitude_km = 550
inclination = 53.0
arc_of_ascending_nodes = 360.0
eccentricity = 0.0

[[shell]]
planes = 32
sats = 50
altitude_km = 1110
inclination = 53.8
arc_of_ascending_nodes = 360.0
eccentricity = 0.0

[[shell]]
planes = 8
sats = 50
altitude_km = 1130
inclination = 74.0
arc_of_ascending_nodes = 360.0
eccentricity = 0.0

[[shell]]
planes = 5
sats = 75
altitude_km = 1275
inclination = 81.0
arc_of_ascending_nodes = 360.0
eccentricity = 0.0

[[shell]]
planes = 6
sats = 75
altitude_km = 1325
inclination = 70.0
arc_of_ascending_nodes = 360.0
eccentricity = 0.0

[[ground_station]]
name = "tracker"
# Azure South Africa North (Johannesburg)
lat = -26.189948
long = 28.031616

[ground_station.compute_params]
vcpu_count = 4
mem_size_mib = 4096
kernel = "client-linux.bin"
rootfs = "tracker.img"

[[ground_station]]
#name = "Accra"
name = "1"
lat = 5.548854
long = -0.220214

[ground_station.compute_params]
vcpu_count = 4
mem_size_mib = 4096
kernel = "client-linux.bin"
rootfs = "client.img"
disk_size_mib = 500

[[ground_station]]
#name = "Abuja"
name = "2"
lat = 9.054770
long = 7.483895

[ground_station.compute_params]
vcpu_count = 4
mem_size_mib = 4096
kernel = "client-linux.bin"
rootfs = "client.img"
disk_size_mib = 500

[[ground_station]]
#name = "Yaounde"
name = "3"
lat = 3.872887
long = 11.520264

[ground_station.compute_params]
vcpu_count = 4
mem_size_mib = 4096
kernel = "client-linux.bin"
rootfs = "client.img"
disk_size_mib = 500

client.sh

echo "client"

echo "STARTING CLIENT"

IP=$(/sbin/ip route | awk '/default/ { print $3 }')

NAME=$(curl -s "$IP"/self | python3 -c 'import sys, json; print(json.load(sys.stdin)["identifier"]["name"])')

echo "$NAME"

cd ultra_ping || exit

./quack.py --listen_port 3000 --http_port 8000 --n_packets 210000 --send_rate_kBps 200 --id "$NAME" --workload_file /workload.csv --client &
./quack.py --listen_port 3000 --output_filename udp_packetn_latency_pairs --n_packets 420000 --timeout 120 --server

sleep 20
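
As a side note on the `NAME=...` line in client.sh above: the inline `python3 -c` call reads `["identifier"]["name"]` from the `/self` response. A minimal sketch of the same lookup with an explicit error for an unexpected JSON layout is shown below. The sample payload is hypothetical: only the `identifier` and `name` keys come from the script, the `shell` and `id` fields are assumptions, and `extract_name` is an illustrative helper rather than part of Celestial.

```python
import json


def extract_name(self_response: str) -> str:
    """Return identifier.name from a /self response body (a JSON string)."""
    info = json.loads(self_response)
    try:
        return info["identifier"]["name"]
    except (KeyError, TypeError) as exc:
        # Fail loudly if the JSON layout differs from what the script expects.
        raise ValueError(f"unexpected /self response: {self_response!r}") from exc


# Hypothetical payload: "shell" and "id" are assumed fields, not confirmed here.
sample = '{"identifier": {"shell": 0, "id": 1, "name": "1"}}'
print(extract_name(sample))
```

If the HTTP server changes its JSON structure again, a check like this fails at the lookup instead of passing an empty name on to `quack.py`.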

server.sh

GATEWAY_IP=$(/sbin/ip route | awk '/default/ { print $3 }')
MY_IP=$(/sbin/ip route | sed -n '2 p' | awk '{print $9}')

echo "server"
echo "$GATEWAY_IP"
echo "$MY_IP"

sed -i -e "s/%%%HOST%%%/$MY_IP/g" multiply.nft

ip link add name vethinj up type veth peer name vethgw
ip link set vethgw up

sysctl -w net.ipv4.conf.vethgw.forwarding=1
sysctl -w net.ipv4.conf.vethgw.accept_local=1
sysctl -w net.ipv4.conf.vethgw.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0

ip route add "$MY_IP/32" dev vethinj

nft -f multiply.nft

sleep 10

while true ; do
   sleep 10
done

tracker.sh

#!/bin/sh

echo "tracker"

IP=$(/sbin/ip route | awk '/default/ { print $3 }')

echo "STARTING TRACKER"

patchelf --set-interpreter /lib/ld-linux-x86-64.so.2 /tracker.bin
# glibc
/tracker.bin --update-interval=5 --gateway="$IP"

The main modification I made to the code was adjusting the JSON structure to accommodate changes in the HTTP server, e.g.: [image]

But when I began experimenting on 2 hosts, issues arose when the tracker ground station tried to inform the client ground stations. This is part of the output from the tracker ground station: [image]

There are some suspicious circumstances. When the experiment starts, there's a long wait, and then suddenly a flood of messages appears, with negative waiting times. [image]

I tried pinging other ground stations from one ground station and found that it could never get through. Querying the path over HTTP showed that the path was consistently blocked.

When I began experimenting on 3 hosts (the new one had Docker installed), the error messages were not quite the same.

DEBU[0586] Informing clients of satellite 1 42          
ERRO[0591] Post "http://1.gst.celestial:8000": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
ERRO[0591] Post "http://1.gst.celestial:8000": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
ERRO[0596] Post "http://2.gst.celestial:8000": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
ERRO[0596] Post "http://2.gst.celestial:8000": context deadline exceeded (Client.Timeout exceeded while awaiting headers) 
ERRO[0601] Post "http://3.gst.celestial:8000": dial tcp 10.0.0.14:8000: i/o timeout (Client.Timeout exceeded while awaiting headers) 
ERRO[0601] Post "http://3.gst.celestial:8000": dial tcp 10.0.0.14:8000: i/o timeout (Client.Timeout exceeded while awaiting headers) 
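
The log above shows client-side timeouts only, which by themselves do not say whether the peer is unreachable, refusing connections, or simply slow. A minimal sketch of a probe that separates these cases at the TCP level is given below. The helper name `probe` and its return labels are made up for illustration and are not part of Celestial or the tracker; the host names and port 8000 are taken from the log.

```python
import errno
import socket


def probe(host, port, timeout_s=3.0):
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except socket.timeout:
        return "timeout"       # packets are silently dropped or the path is too slow
    except ConnectionRefusedError:
        return "refused"       # host reachable, but nothing listens on the port
    except socket.gaierror:
        return "dns-failure"   # the name could not be resolved
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no-route"  # the kernel reports no route to host/network
        return "error: " + str(exc)


# Example use from inside a ground station machine:
#   for name in ("1", "2", "3"):
#       print(name, probe(name + ".gst.celestial", 8000))
```

A result of `refused` would point at the client process not listening, while `timeout` or `no-route` would point at the emulated network path.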

When the experiment ends, pressing CTRL+C on the host does not return to the normal terminal prompt; I have to open a new terminal. [image]

I am sorry to disturb you again. Could you give me some advice? Thanks for your help.

ykxian commented 4 months ago

gst-1 -> sat 1-109: [image]

sat 1-109 -> gst-tracker: [image]

Of course, I can also ping gst-tracker from 1-109: [image]

But gst-1 -> gst-tracker: [image]

ip route on gst-1: [image]

A->B √, B->C √, A->C ×. I've been troubled by this problem for a long time.

pfandzelter commented 4 months ago

Hi @ykxian, I have seen this issue and will look into it when I have time. In the meantime, can you confirm that none of the hosts A, B, and C have Docker installed? If Docker was ever installed on the host system, our network settings unfortunately cannot be applied correctly (Docker changes something in the host system that remains even after uninstallation). I also have not updated the videoconferencing example for Celestial v2 yet, so I will have to check it out.

ykxian commented 4 months ago

Hi @pfandzelter, I have 3 PCs and a virtual machine, y. [image] PC7 and PC8 have never had Docker installed. I use PC7, PC8, and y (coordinator) for experiments.

pfandzelter commented 3 months ago

Hi @ykxian, I have not completely replicated the problem on my end, but I have two comments already:

  1. The issue of Ctrl + C not ending the Celestial process completely should now be fixed -- there was a deadlock that has now been removed. Celestial will also close automatically at the end of the experiment (in the newest commit on the main celestial repository).
  2. "When the experiment starts, there's a long wait, and then suddenly a flood of messages appears, with negative waiting times." --> This is caused by the initial setup, which takes a while. You will see in the log messages that the first update takes 16 seconds, which delays the second and third updates. That is unfortunately expected. Until the first update has completed, the machines also cannot communicate.
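
The negative waiting times in the second point can be illustrated with a toy model. This is only a sketch, not Celestial's actual scheduling code. It assumes that updates are scheduled on a fixed grid given by `resolution = 5` from the configuration, that the first update takes the 16 seconds mentioned above, and that later updates take 1 second each (a made-up value).

```python
def waiting_times(interval_s, durations_s):
    """Return the wait before each update on a fixed schedule.

    Update k is scheduled at k * interval_s. The wait is the scheduled time
    minus the time at which the previous update finished, so it becomes
    negative whenever the schedule has already been missed.
    """
    now = 0.0
    waits = []
    for k, duration in enumerate(durations_s):
        scheduled = k * interval_s
        waits.append(scheduled - now)
        # A late update starts immediately; an early one starts on schedule.
        now = max(now, scheduled) + duration
    return waits


# First update: 16 s (from the log); later updates: assumed 1 s each.
print(waiting_times(5, [16, 1, 1, 1, 1]))
```

In this model, every update whose slot falls inside the 16-second setup phase starts late and runs back to back, which matches the observed burst of messages once the first update completes.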

I will look into the other bugs soon.

ykxian commented 3 months ago

Thanks for your help!

ykxian commented 3 months ago

After updating the code and running it again, it now terminates correctly. [image]

pfandzelter commented 3 months ago

Hi @ykxian, first of all: sorry for the delay in responding to this! Second, thank you for raising this issue. I updated the celestial-videoconferencing-evaluation for Celestial v2 and tried to replicate this. I found two issues in Celestial:

  1. That Celestial would not quit correctly (described above)
  2. There was a small but significant bug in the simulation of communication between ground stations. That led to the issue you saw (and that I was able to replicate) where the ground stations showed no route to host. There were even small bugs in the tests that prevented me from seeing this earlier.

I have fixed the second issue in https://github.com/OpenFogStack/celestial/commit/e2f8746340164b7934f851b3e9648ef013e5d4df. Now everything works as it should. I have also updated this repository. Please check whether it fits with your modifications as well and whether the issue is resolved.

Best regards, Tobias

ykxian commented 3 months ago

Hi @pfandzelter, thank you very much for your help all this time! Communication between ground stations now works correctly. However, there seem to be some issues with your modifications to the repository. Below are my modifications.

apiutils.go: [image] [image] [image]

tracker.go: [image] [image]

The problem seems to be related to the JSON and the shell ID.

pfandzelter commented 3 months ago

You're right, I have adapted this in 2d7247b. Thank you!