Open Cosmos-Break opened 2 years ago
I have the same problem. I have tested both TCP and UDP connections and the problem persists.
Can you check if umdns is working correctly on all aps?
@PolynomialDivision Should each AP see all the other APs? My case:
AP1-------AP2----internet->
AP2 sees AP1, but AP1 can't find AP2.
I also notice that AP1 makes a TCP connection to AP2 on port 1026, but AP2 makes no connection to AP1.
The DAWN config on all my APs is:
config network
    option broadcast_port '1025'
    option tcp_port '1026'
    option network_option '2'
    option shared_key 'Niiiiiiiiiiiiick'
    option iv 'Niiiiiiiiiiiiick'
    option use_symm_enc '0'
    option collision_domain '-1'
    option bandwidth '-1'
    option broadcast_ip '192.168.16.255'
I think my case is the same issue as this one.
@PolynomialDivision would each AP device see all APs ?
Yes. Sounds like a umdns issue. Can you check:
ubus call umdns browse
on AP1: ubus call umdns browse
{
..........
    "_dawn._tcp": {
        "X-WRT": {
            "port": 1026
        }
    },
......
}
on AP2
.........
    "_dawn._tcp": {
        "X-WRT": {
            "ipv4": "192.168.16.118",
            "ipv6": "fe80::e667:1eff:fe29:68fe",
            "port": 1026
        }
    },
.........
The entry on ap1 is wrong. It should contain an IP like on AP2. DAWN does not know where to connect to. Can you restart umdns on both?
true
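To make the difference between the two browse results concrete, here is a minimal sketch of the check being discussed. The struct and function names are hypothetical and only model one "_dawn._tcp" entry; this is not DAWN's actual data structure or parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one "_dawn._tcp" entry as returned by
 * `ubus call umdns browse`. Not DAWN's real data structure. */
struct dawn_peer_entry {
    const char *ipv4; /* NULL when umdns reports no address */
    int port;         /* 0 when umdns reports no port */
};

enum peer_action {
    PEER_CONNECT,    /* address and port known: a TCP connection is possible */
    PEER_INCOMPLETE, /* port but no address: the broken entry seen on AP1 */
    PEER_SKIP        /* nothing usable in the entry */
};

/* Decide what can be done with a browse entry. */
static enum peer_action classify_peer(const struct dawn_peer_entry *e)
{
    assert(e != NULL);

    bool has_ip = e->ipv4 != NULL && e->ipv4[0] != '\0';
    bool has_port = e->port > 0;

    if (has_ip && has_port)
        return PEER_CONNECT;
    if (has_port)
        return PEER_INCOMPLETE;
    return PEER_SKIP;
}
```

With the outputs above, the AP1 entry (port 1026, no address) would be classified as PEER_INCOMPLETE, while the AP2 entry (192.168.16.118, port 1026) would be PEER_CONNECT.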
Now I notice that it is related to the umdns config.
wrong config:
config umdns
    option jail '1'
    list network 'lan'
    list network 'meshx0'
good config:
config umdns
    option jail '1'
    list network 'meshx0'
I think the first config should work as well, but it does not. @PolynomialDivision
Do you have the "lan" network available? I also had issues when using a network that is not configured on the ap.
No, lan is not started on boot. Is it related to this?
Could umdns tolerate such a config?
I think it should.
Maybe @blogic can say something to that?
@PolynomialDivision Several things can trigger this issue. If I reboot AP1, the issue happens again, which it should not.
Running /etc/init.d/umdns restart on AP1 also causes it: ubus call umdns browse then returns a _dawn._tcp entry with no IP.
To reproduce: on AP1, run /etc/init.d/umdns restart, or reboot AP1.
Then on AP2, ubus call umdns browse returns:
...
    "_dawn._tcp": {
        "X-WRT": {
            "port": 1026
        }
    },
...
After running /etc/init.d/umdns reload on AP2, the entry has the IP again:
...
    "_dawn._tcp": {
        "X-WRT": {
            "ipv4": "xxxxx",
            "port": 1026
        }
    },
...
What a strange issue. @PolynomialDivision @blogic
@PolynomialDivision
For now I have hacked together a workaround for this issue: if dawn finds that an entry has no ipv4, it triggers /etc/init.d/umdns reload to fix it.
diff --git a/src/utils/ubus.c b/src/utils/ubus.c
index 995b2f8..14d52c1 100644
--- a/src/utils/ubus.c
+++ b/src/utils/ubus.c
@@ -1310,6 +1310,8 @@ int wnm_disassoc_imminent(uint32_t id, const struct dawn_mac client_addr, struct
     return 0;
 }
+static int umdns_need_reload = 0;
+
 static void ubus_umdns_cb(struct ubus_request *req, int type, struct blob_attr *msg) {
     struct blob_attr *tb[__DAWN_UMDNS_TABLE_MAX];
@@ -1341,7 +1343,10 @@ static void ubus_umdns_cb(struct ubus_request *req, int type, struct blob_attr *
             dawnlog_debug("IPV4: %s\n", blobmsg_get_string(tb_dawn[DAWN_UMDNS_IPV4]));
             dawnlog_debug("Port: %d\n", blobmsg_get_u32(tb_dawn[DAWN_UMDNS_PORT]));
         } else {
-            return; // TODO: We're in a loop. Should this be return or continue?
+            if (!tb_dawn[DAWN_UMDNS_IPV4] && tb_dawn[DAWN_UMDNS_PORT]) {
+                umdns_need_reload = 1;
+            }
+            continue;
         }
         add_tcp_connection(blobmsg_get_string(tb_dawn[DAWN_UMDNS_IPV4]), blobmsg_get_u32(tb_dawn[DAWN_UMDNS_PORT]));
     }
@@ -1366,6 +1371,11 @@ int ubus_call_umdns() {
     blob_buf_free(&b);
     dawn_unregmem(&b);
+    if (umdns_need_reload == 1) {
+        umdns_need_reload = 0;
+        system("/etc/init.d/umdns reload");
+    }
+
     return 0;
 }
on AP2:
root@X-WRT:~# lsof -ni :1026
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dawn 6858 root 8u IPv4 993817 0t0 TCP *:1026 (LISTEN)
dawn 6858 root 9u IPv4 1143895 0t0 TCP 192.168.16.1:40802->192.168.16.118:1026 (ESTABLISHED)
dawn 6858 root 10u IPv4 995271 0t0 TCP 192.168.16.1:1026->192.168.16.118:41608 (ESTABLISHED)
dawn 6858 root 11u IPv4 1078948 0t0 TCP 192.168.16.1:1026->192.168.16.118:53928 (ESTABLISHED)
dawn 6858 root 12u IPv4 1123768 0t0 TCP 192.168.16.1:1026->192.168.16.118:49606 (ESTABLISHED)
But there is a new issue: old dawn TCP connections are not closed?
On AP2, I see 3 incoming connections from AP1 but only 1 outgoing connection to AP1.
On AP1 (just rebooted) there are 2 connections in total, one in and one out, which looks good.
@PolynomialDivision
On this line, ustream_write just adds data to the write buffer.
In most cases the TCP connection buffer is large, so it takes a long time to fill, and the write never fails before it is full.
This is why a TCP connection may never be closed.
case 1:
AP1 reboots.
AP2 keeps its TCP connection to AP1 and never closes it,
and dawn just keeps pushing data into the buffer via ustream_write().
case 2:
AP1 reboots.
AP2 keeps the connection from AP1.
AP1 comes up again and opens a new connection to AP2, then AP1 reboots again.
AP2 now keeps this new connection from AP1 as well.
Reboot AP1 repeatedly, and AP2 accumulates one more stale connection for each reboot of AP1.
To fix this, I hacked in another workaround:
diff --git a/src/network/tcpsocket.c b/src/network/tcpsocket.c
index 91bc452..0abcfd7 100644
--- a/src/network/tcpsocket.c
+++ b/src/network/tcpsocket.c
@@ -364,12 +364,15 @@ void send_tcp(char *msg) {
         list_for_each_entry_safe(con, tmp, &tcp_sock_list, list)
         {
             if (con->connected) {
+                int need_close = 0;
                 int len_ustream = ustream_write(&con->stream.stream, final_str, final_len, 0);
                 dawnlog_debug("Ustream send: %d\n", len_ustream);
-                if (len_ustream <= 0) {
+                if (ustream_pending_data(&con->stream.stream, true) >= 24*1024 + final_len /* 24K buffered data is enough? */)
+                    need_close = 1;
+                if (len_ustream <= 0 || need_close) {
                     dawnlog_error("Ustream error(" STR_QUOTE(__LINE__) ")!\n");
                     //ERROR HANDLING!
-                    if (con->stream.stream.write_error) {
+                    if (con->stream.stream.write_error || need_close) {
                         ustream_free(&con->stream.stream);
                         dawn_unregmem(&con->stream.stream);
                         close(con->fd.fd);
@@ -401,12 +404,15 @@ void send_tcp(char *msg) {
         list_for_each_entry_safe(con, tmp, &tcp_sock_list, list)
         {
             if (con->connected) {
+                int need_close = 0;
                 int len_ustream = ustream_write(&con->stream.stream, final_str, final_len, 0);
                 dawnlog_debug("Ustream send: %d\n", len_ustream);
-                if (len_ustream <= 0) {
+                if (ustream_pending_data(&con->stream.stream, true) > 24*1024 + final_len /* 24K buffered data is enough? */)
+                    need_close = 1;
+                if (len_ustream <= 0 || need_close) {
                     //ERROR HANDLING!
                     dawnlog_error("Ustream error(" STR_QUOTE(__LINE__) ")!\n");
-                    if (con->stream.stream.write_error) {
+                    if (con->stream.stream.write_error || need_close) {
                         ustream_free(&con->stream.stream);
                         dawn_unregmem(&con->stream.stream);
                         close(con->fd.fd);
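The idea behind the patch above can be looked at in isolation. The following is only a rough model, not DAWN or libubox code: it assumes a peer that never drains anything, ignores the kernel socket buffer, and uses the `>=` comparison from the first hunk. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold taken from the patch: 24K of buffered data. */
#define PENDING_LIMIT ((size_t)24 * 1024)

/* Same comparison as the first hunk of the patch: close once the pending
 * write data reaches the limit plus the message that was just queued. */
static bool should_close(size_t pending_bytes, size_t msg_len)
{
    return pending_bytes >= PENDING_LIMIT + msg_len;
}

/* Model a dead peer: every send only grows the pending buffer, because
 * nothing is ever acknowledged. Returns the number of sends it takes
 * until should_close() fires. */
static unsigned sends_until_close(size_t msg_len)
{
    size_t pending = 0;
    unsigned sends = 0;

    assert(msg_len > 0);
    do {
        pending += msg_len; /* queued, never drained */
        sends++;
    } while (!should_close(pending, msg_len));

    return sends;
}
```

In this model, with 1024-byte messages a dead connection would be closed on the 25th send (25 * 1024 = 25600 = 24576 + 1024). Without such a limit, nothing in the write path notices that the peer is gone until the buffers are actually full.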
@ynezz Is there an option for ustream to handle such situations?
I have also added a patch to close timed-out connections: https://github.com/ptpt52/dawn/commit/b195cfab7d48c52d27b31fe36238ccadd824c66e
I had a similar issue. I am running two routers, a TP-Link Archer A7 and a Xiaomi WiFi Mini. They are connected via 802.11s mesh. The Xiaomi is connected to the internet via 5 GHz Wi-Fi client mode.
I had previously enabled STP on the br-lan interface, thinking it would help prevent loops from forming in the mesh, but that was the cause. After disabling STP, both routers can see the umdns-advertised dawn, smb, ssh, and other services in "ubus call umdns browse", and everything works perfectly now. One more thing: in /etc/avahi/avahi-daemon.conf I set enable-reflector=yes, which was no by default. I don't know if this helped, but it's all working now.
Hope this helps someone.
I have three APs: Xiaomi AX6S, Phicomm K2P and NewWiFi D2. All three devices run OpenWrt 22.03.0-rc1.
AX6S can find all three APs, D2 finds two APs, K2P only finds one AP.
[Screenshots for AX6S, D2 and K2P]
This is not a LuCI problem, because 'ubus call dawn get_network' returns the same result.