AllStarLink / app_rpt

Refactoring and upgrade of AllStarLink's app_rpt, etc.
9 stars 9 forks source link

DTMF-Based Linking + DNS Node Lookups Fail Due to Timer Issues #392

Open dbehnke opened 3 months ago

dbehnke commented 3 months ago

Describe the bug When I attempt to do *3 + >4 digit node number, linking does not work.. digits after 4 get cut off

To Reproduce Steps to reproduce the behavior:

  1. monitor log /var/log/asterisk/messages
  2. dial *3 ######
  3. notice in log only *3 #### get read. last ## don't get acknowledged
  4. repeat the steps with a 4 digit node number, such as 2560, link works.
  5. using a macro such as 51 and configuring macro to 803##### or similar does not work.
  6. using rpt cmd [mynode] ilink 6 ##### works.

Expected behavior I can dial node numbers > 4 digits

sample log This is me trying to connect to 594950 using dtmf, notice only 5949 gets picked up

[2024-08-23 11:42:13.480] DTMF[1252] channel.c: DTMF begin '*' received on SimpleUSB/550464
[2024-08-23 11:42:13.480] DTMF[1252] channel.c: DTMF begin passthrough '*' on SimpleUSB/550464
[2024-08-23 11:42:13.928] DTMF[1252] channel.c: DTMF end '*' received on SimpleUSB/550464, duration 447 ms
[2024-08-23 11:42:13.928] DTMF[1252] channel.c: DTMF end accepted with begin '*' on SimpleUSB/550464
[2024-08-23 11:42:13.928] DTMF[1252] channel.c: DTMF end passthrough '*' on SimpleUSB/550464
[2024-08-23 11:42:14.398] DTMF[1252] channel.c: DTMF begin '3' received on SimpleUSB/550464
[2024-08-23 11:42:14.398] DTMF[1252] channel.c: DTMF begin passthrough '3' on SimpleUSB/550464
[2024-08-23 11:42:14.760] DTMF[1252] channel.c: DTMF end '3' received on SimpleUSB/550464, duration 362 ms
[2024-08-23 11:42:14.760] DTMF[1252] channel.c: DTMF end accepted with begin '3' on SimpleUSB/550464
[2024-08-23 11:42:14.760] DTMF[1252] channel.c: DTMF end passthrough '3' on SimpleUSB/550464
[2024-08-23 11:42:15.123] DTMF[1252] channel.c: DTMF begin '5' received on SimpleUSB/550464
[2024-08-23 11:42:15.123] DTMF[1252] channel.c: DTMF begin passthrough '5' on SimpleUSB/550464
[2024-08-23 11:42:15.422] DTMF[1252] channel.c: DTMF end '5' received on SimpleUSB/550464, duration 298 ms
[2024-08-23 11:42:15.422] DTMF[1252] channel.c: DTMF end accepted with begin '5' on SimpleUSB/550464
[2024-08-23 11:42:15.422] DTMF[1252] channel.c: DTMF end passthrough '5' on SimpleUSB/550464
[2024-08-23 11:42:15.806] DTMF[1252] channel.c: DTMF begin '9' received on SimpleUSB/550464
[2024-08-23 11:42:15.806] DTMF[1252] channel.c: DTMF begin passthrough '9' on SimpleUSB/550464
[2024-08-23 11:42:16.020] DTMF[1252] channel.c: DTMF end '9' received on SimpleUSB/550464, duration 213 ms
[2024-08-23 11:42:16.020] DTMF[1252] channel.c: DTMF end accepted with begin '9' on SimpleUSB/550464
[2024-08-23 11:42:16.020] DTMF[1252] channel.c: DTMF end passthrough '9' on SimpleUSB/550464
[2024-08-23 11:42:16.445] DTMF[1252] channel.c: DTMF begin '4' received on SimpleUSB/550464
[2024-08-23 11:42:16.445] DTMF[1252] channel.c: DTMF begin passthrough '4' on SimpleUSB/550464
[2024-08-23 11:42:16.701] DTMF[1252] channel.c: DTMF end '4' received on SimpleUSB/550464, duration 255 ms
[2024-08-23 11:42:16.701] DTMF[1252] channel.c: DTMF end accepted with begin '4' on SimpleUSB/550464
[2024-08-23 11:42:16.701] DTMF[1252] channel.c: DTMF end passthrough '4' on SimpleUSB/550464
[2024-08-23 11:42:17.043] DTMF[1252] channel.c: DTMF begin '9' received on SimpleUSB/550464
[2024-08-23 11:42:17.043] DTMF[1252] channel.c: DTMF begin passthrough '9' on SimpleUSB/550464
[2024-08-23 11:42:17.299] DTMF[1252] channel.c: DTMF end '9' received on SimpleUSB/550464, duration 255 ms
[2024-08-23 11:42:17.299] DTMF[1252] channel.c: DTMF end accepted with begin '9' on SimpleUSB/550464
[2024-08-23 11:42:17.299] DTMF[1252] channel.c: DTMF end passthrough '9' on SimpleUSB/550464
Allan-N commented 3 months ago

What version of ASL3 do you have installed? If you are running a version prior to v3.0.4 then you are missing the fix for https://github.com/AllStarLink/app_rpt/pull/372.

Also, we recently had another reported issue, https://github.com/AllStarLink/app_rpt/issues/386, that turned out to be a network configuration issue.

From the node, do you have reasonable network connectivity? Can you resolve/ping hostnames (e.g. ping register.allstarlink.org)?

dbehnke commented 3 months ago

I'm checking the notes from the previous ticket there to see if this applys here.. in the meantime here's ping results..

Version is Asterisk 20.9.1+asl3-3.0.4-1.deb12

ping register.allstarlink.org
PING register.allstarlink.org (162.248.92.131) 56(84) bytes of data.
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=1 ttl=56 time=30.2 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=2 ttl=56 time=25.6 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=3 ttl=56 time=26.2 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=4 ttl=56 time=25.9 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=5 ttl=56 time=22.0 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=6 ttl=56 time=32.6 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=7 ttl=56 time=34.9 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=8 ttl=56 time=28.4 ms
64 bytes from v-162-248-92-131.unman-vds.premium-chicago.nfoservers.com (162.248.92.131): icmp_seq=9 ttl=56 time=26.7 ms
dbehnke commented 3 months ago

It have determined this issue is with Tailscale and not Allstar. Tailscale is doing odd things with its internal DNS which is supposed to passthrough to a provider of your choice.. i.e. 8.8.8.8

I still don't quite understand why that would cut the numbers off at 4 digits.. but there we are.

The solution for me was to just disable Tailscale's DNS and use 8.8.8.8 directly on the OS.

For future viewers that use tailscale...

sudo tailscale up --accept-dns=false --accept-routes

If you want you can leave the issue open for a day or two in case anyone else has comments/suggestions..

Allan-N commented 3 months ago

Great to hear that you have solved the problem.

Allan-N commented 3 months ago

Dave,

Clearly, the simple "can we resolve "a" DNS name was not a good "quick" DNS test. How did you determine that the issue was with Tailscale's DNS? Is there a good test we could add (or at least document) that would help identify this issue?

dbehnke commented 3 months ago

So for tailscale.. they have something called tailnet.. that allows you to do dns lookups to your other nodes. and when you enable tailscale with tailscale up it will change your dns server to 100.100.100.100 which is tailscale's internal DNS. I noticed this by looking at my /etc/resolv.conf that it was set to that. If I temporily changed it to 8.8.8.8 in the /etc/resolv.conf, I was then able to connect to allstar nodes. So what I think happens when you have it set to 100.100.100.100 it tries to then bounce to your internal address .. in my case 192.168.0.53 is my dns locally. and it can't do that from its point of view in the cloud. One thing I didn't try (in the screenshot below) is to flip the switch to override local DNS... i'm wondering if that would also work.

image
jxmx commented 3 months ago

This is likely related to a know DNS services defect on the infra side. Try not using the intercepted DNS.

dbehnke commented 3 months ago

Yeah I've kinda given up for now with this .. even with tailscale on and disabling intercepting DNS (using 8.8.8.8) it still has an issue. The only way i was able to work around the issue for now is to use FILE mode in rpt.conf (turning off DNS lookup) or turn off tailscale and then DNS works.

I use tailscale as a vpn between my nodes and as an exit node for when i'm on cellular to appear with the same ip. (ATT likes to change IPs unfortunately). Not sure what's changed between first release and 3.0.4 but something broke using tailscale for me.

KB4MDD commented 3 months ago

@dbehnke Would you be willing to setup your system with the failing configuration for me to get some debugging information. We need to at least understand why this causes the failure.

If you are willing have some questions and steps.

KB4MDD commented 3 months ago

Here is some debugging information from my computer. I was able to get the failure as soon as I started asterisk. I will note that I enabled res_resolver_unbounded.so and set the nameserver to 1.1.1.1.

k4mdddevelop*CLI> rpt fun 2231 *353431
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt.c:1435 collect_function_digits: digits=3  source=0
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt.c:1487 collect_function_digits: @@@@ action: ilink, param = 3
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt.c:1493 collect_function_digits: @@@@ table index i = 3
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt/rpt_functions.c:69 function_ilink: @@@@ ilink param = 3, digitbuf =
[2024-08-26 15:22:19.543] DEBUG[43943]: app_rpt.c:1505 collect_function_digits: rv=0
[2024-08-26 15:22:19.654] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-08-26 15:22:19.654] DEBUG[43943]: app_rpt.c:1435 collect_function_digits: digits=35  source=0
[2024-08-26 15:22:19.654] DEBUG[43943]: app_rpt.c:1487 collect_function_digits: @@@@ action: ilink, param = 3
[2024-08-26 15:22:19.655] DEBUG[43943]: app_rpt.c:1493 collect_function_digits: @@@@ table index i = 3
[2024-08-26 15:22:19.655] DEBUG[43943]: app_rpt/rpt_functions.c:69 function_ilink: @@@@ ilink param = 3, digitbuf = 5
[2024-08-26 15:22:19.655] DEBUG[43943]: app_rpt/rpt_config.c:312 tlb_query_node_exists: chan_tlb not loaded.
[2024-08-26 15:22:19.655] NOTICE[43943]: app_rpt/rpt_config.c:624 node_lookup: Longest node 6.
[2024-08-26 15:22:19.655] DEBUG[43943]: app_rpt.c:1505 collect_function_digits: rv=0
[2024-08-26 15:22:19.765] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-08-26 15:22:19.765] DEBUG[43943]: app_rpt.c:1435 collect_function_digits: digits=353  source=0
[2024-08-26 15:22:19.766] DEBUG[43943]: app_rpt.c:1487 collect_function_digits: @@@@ action: ilink, param = 3
[2024-08-26 15:22:19.766] DEBUG[43943]: app_rpt.c:1493 collect_function_digits: @@@@ table index i = 3
[2024-08-26 15:22:19.766] DEBUG[43943]: app_rpt/rpt_functions.c:69 function_ilink: @@@@ ilink param = 3, digitbuf = 53
[2024-08-26 15:22:19.766] DEBUG[43943]: app_rpt/rpt_config.c:312 tlb_query_node_exists: chan_tlb not loaded.
[2024-08-26 15:22:19.766] NOTICE[43943]: app_rpt/rpt_config.c:624 node_lookup: Longest node 6.
[2024-08-26 15:22:19.766] DEBUG[43943]: app_rpt.c:1505 collect_function_digits: rv=0
[2024-08-26 15:22:19.885] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-08-26 15:22:19.885] DEBUG[43943]: app_rpt.c:1435 collect_function_digits: digits=3534  source=0
[2024-08-26 15:22:19.885] DEBUG[43943]: app_rpt.c:1487 collect_function_digits: @@@@ action: ilink, param = 3
[2024-08-26 15:22:19.885] DEBUG[43943]: app_rpt.c:1493 collect_function_digits: @@@@ table index i = 3
[2024-08-26 15:22:19.885] DEBUG[43943]: app_rpt/rpt_functions.c:69 function_ilink: @@@@ ilink param = 3, digitbuf = 534
[2024-08-26 15:22:19.886] DEBUG[43943]: app_rpt/rpt_config.c:312 tlb_query_node_exists: chan_tlb not loaded.
[2024-08-26 15:22:19.886] NOTICE[43943]: app_rpt/rpt_config.c:624 node_lookup: Longest node 6.
[2024-08-26 15:22:19.886] DEBUG[43943]: app_rpt.c:1505 collect_function_digits: rv=0
[2024-08-26 15:22:19.968]   == Manager 'admin' logged on from 127.0.0.1
[2024-08-26 15:22:19.969]   == Manager 'admin' logged on from 127.0.0.1
[2024-08-26 15:22:19.995] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
[2024-08-26 15:22:19.995] DEBUG[43943]: app_rpt.c:1435 collect_function_digits: digits=35343  source=0
[2024-08-26 15:22:19.995] DEBUG[43943]: app_rpt.c:1487 collect_function_digits: @@@@ action: ilink, param = 3
[2024-08-26 15:22:19.996] DEBUG[43943]: app_rpt.c:1493 collect_function_digits: @@@@ table index i = 3
[2024-08-26 15:22:19.996] DEBUG[43943]: app_rpt/rpt_functions.c:69 function_ilink: @@@@ ilink param = 3, digitbuf = 5343
[2024-08-26 15:22:19.996] DEBUG[43943]: app_rpt/rpt_config.c:312 tlb_query_node_exists: chan_tlb not loaded.
[2024-08-26 15:22:19.996] DEBUG[43943]: app_rpt/rpt_config.c:396 node_lookup_bydns: Resolving DNS SRV records for: _iax._udp.5343.nodes.allstarlink.org
[2024-08-26 15:22:23.137] DEBUG[43943]: app_rpt/rpt_config.c:411 node_lookup_bydns: No SRV records returned for _iax._udp.5343.nodes.allstarlink.org
[2024-08-26 15:22:23.137] NOTICE[43943]: app_rpt/rpt_config.c:624 node_lookup: Longest node 6.
[2024-08-26 15:22:23.138] DEBUG[43943]: app_rpt.c:1505 collect_function_digits: rv=0
[2024-08-26 15:22:23.138] DEBUG[43943]: app_rpt/rpt_telemetry.c:381 cancel_pfxtone: cancel_pfxfone!!
KB4MDD commented 3 months ago

The main problem is the time to look up DNS records. I have confirmed it here.

app_rpt does not process more that 4 digits due to the DTMF timer expiring. The DTMF timer is hard coded for 3 seconds. If the DNS lookup does not complete in 3 seconds, the DTMF timer expires and the DTMF buffer is reset. At that point you see a cancel_pfxfone!! message and DTMF processing stops.

jxmx commented 3 months ago

So, to be clear, there's a complete time allocation of 3s to receive all DTMF tones and then execute the complete round-trip of the DNS lookup? In general, the DNS lookup de facto timers are to allocate 4 seconds for DNS lookups (2s for a first lookup, 2s to do a second resolver lookup if the first fails/timesout).

KB4MDD commented 3 months ago

I was not clear. The DTMF timer is reset with each digit.

jxmx commented 3 months ago

https://allstarlink.github.io/user-guide/known-issues-diffs/#dtmf-linking-dns-lookup-timeouts