boatbod / op25

Fork of osmocom OP25 by boatbod
311 stars 97 forks source link

Crypto code uses stale values & improperly promotes "next" ESS values #176

Closed roysjosh closed 1 year ago

roysjosh commented 1 year ago

Describe the bug While exploring P25 encryption, I noticed that the debug output had the same MI value across completely different calls. At first I wondered if it was a flaw in the system but closer investigation revealed it is the code maintaining the previous call's "next" ESS parameters.

Screenshots

FIRST CALL:

04/26/23 15:20:44.455151 [0] MAC_PTT: srcaddr=#####70, grpaddr=###11, algid=aa, keyid=XXX, mi=14 d3 9c d2 fb ea e5 a6 00, rs_errs=0
... call ... final 4V_BURST frame info below ...
04/26/23 15:20:51.335627 [0] 4V_BURST(0) ESS: algid=aa, keyid=XXX, mi=61 b6 a3 ed 99 d4 4a ca 00, rs_errs=0
... AMBE ...
... call ends ...

SECOND CALL:

04/26/23 15:20:52.209882 [0] MAC_PTT: srcaddr=#####11, grpaddr=###11, algid=aa, keyid=XXX, mi=b7 2d da 61 44 cc bb 44 00, rs_errs=0
04/26/23 15:20:52.209896 [0] p25_crypt_algs::prepare: ...
04/26/23 15:20:52.209928 [0] MAC_PTT: srcaddr=#####11, grpaddr=###11, algid=aa, keyid=XXX, mi=b7 2d da 61 44 cc bb 44 00, rs_errs=0
04/26/23 15:20:52.209933 [0] p25_crypt_algs::prepare: ...
04/26/23 15:20:52.217895 [0] mac_ptt: mi: b72dda6144ccbb4400 algid: aa keyid:3 ga: ###11 sa: #####11
04/26/23 15:20:52.217956 [0] mac_ptt: mi: b72dda6144ccbb4400 algid: aa keyid:3 ga: ###11 sa: #####11
04/26/23 15:20:52.318614 [0] 4V_BURST(0) ESS: algid=aa, keyid=XXX, mi=61 b6 a3 ed 99 d4 4a ca 00, rs_errs=0
04/26/23 15:20:52.318633 [0] p25_crypt_algs::prepare: ...
... AMBE ...
04/26/23 15:20:52.319515 [0] 4V_BURST(1) ESS: algid=aa, keyid=XXX, mi=61 b6 a3 ed 99 d4 4a ca 00, rs_errs=0
... AMBE ...
04/26/23 15:20:52.428483 [0] 4V_BURST(2) ESS: algid=aa, keyid=XXX, mi=61 b6 a3 ed 99 d4 4a ca 00, rs_errs=0
... AMBE ...
04/26/23 15:20:52.537846 [0] p25_framer::rx_sym() error check failed, frame discarded, nid=...
04/26/23 15:20:52.537894 [0] MAC_ACTIVE: mco=1/02(0x42), len=9, mco=0/01(0x01), len=7, mco=0/00(0x00), len=4, rs_errs=0
04/26/23 15:20:52.537910 [0] 4V_BURST(3) ESS: algid=aa, keyid=XXX, mi=61 b6 a3 ed 99 d4 4a ca 00, rs_errs=0
... AMBE ...
04/26/23 15:20:52.542090 set tgid=###19, srcaddr=0
04/26/23 15:20:52.542131 set tgid=###03, srcaddr=0
04/26/23 15:20:52.542144 [0] tsbk(0x02) grp_v_ch_grant_updt: ch1: FREQ1 ga1: ###19 ch2: FREQ2 ga2: ###03
04/26/23 15:20:52.542197 [0] tdma(0x01) grp_v_ch_usr: ga: ###11 sa: #####11
04/26/23 15:20:52.647182 [0] p25_framer::rx_sym() error check failed, frame discarded, nid=...
04/26/23 15:20:52.647207 [0] 2V_BURST(4) ESS: algid=aa, keyid=XXX, mi=68 13 2a 90 fc 0b 10 ba 00, rs_errs=0
... AMBE ...
04/26/23 15:20:52.647796 [0] 4V_BURST(0) ESS: algid=aa, keyid=XXX, mi=68 13 2a 90 fc 0b 10 ba 00, rs_errs=0
04/26/23 15:20:52.647806 [0] p25_crypt_algs::prepare: ...

Note the final MI value in the first call. It is never zeroed out. When the new call begins, in spite of the new MI value in the MAC_PTT PDU initially being set (see the first two "prepare" lines, one under each MAC_PTT line), the condition in p25p2_tdma::handle_packet checking burst_type and burst_id to promote the "next" ESS values triggers on the very first 4V PDU (see the 3rd "prepare") before the ESS has been fully received during the new call. This then uses the previous ESS' "next" parameters until the situation corrects itself after the 2V PDU (& therefore the entire ESS) has been received (see the 4th and final "prepare").

I've been testing with the following hacky patch in place:

diff --git a/op25/gr-op25_repeater/lib/p25p2_tdma.cc b/op25/gr-op25_repeater/lib/p25p2_tdma.cc
index 59f4e6b..67c800b 100644
--- a/op25/gr-op25_repeater/lib/p25p2_tdma.cc
+++ b/op25/gr-op25_repeater/lib/p25p2_tdma.cc
@@ -506,6 +506,8 @@ void p25p2_tdma::handle_voice_frame(const uint8_t dibits[])
                audio_valid = crypt_algs.process(p_cw);
                if (!audio_valid)
                        return;
+               fprintf(stderr, "... decrypted AMBE %02x %02x %02x %02x %02x %02x %02x\n",
+                           p_cw[0], p_cw[1], p_cw[2], p_cw[3], p_cw[4], p_cw[5], p_cw[6]);
         vf.unpack_cw(p_cw, u);  // unpack plaintext codewords
         vf.unpack_b(b, u);      // for unencrypted traffic this is done inside vf.process_vcw()
        }
@@ -608,21 +610,21 @@ int p25p2_tdma::handle_packet(uint8_t dibits[], const uint64_t fs)
        if (burst_type == 0 || burst_type == 6) {       // 4V or 2V burst
                track_vb(burst_type);
                handle_4V2V_ess(&xored_burst[84]);
-               std::string s = "{\"encrypted\": " + std::to_string((encrypted()) ? 1 : 0) + ", \"algid\": " + std::to_string(ess_algid) + ", \"keyid\": " + std::to_string(ess_keyid) + "}";
-               send_msg(s, M_P25_JSON_DATA);
-               if ((burst_type == 0) && (burst_id == 0)) {  // promote next set of encryption parameters if this is first 4V after a 2V
-                       ess_algid = next_algid;
-                       ess_keyid = next_keyid;
-                       memcpy(ess_mi, next_mi, sizeof(ess_mi));
-                   if (encrypted()) {
-                           crypt_algs.prepare(ess_algid, ess_keyid, ((burst_type == 0) ? FT_4V : FT_2V), ess_mi);
-                   }
-               }
                handle_voice_frame(&xored_burst[11]);
                handle_voice_frame(&xored_burst[48]);
                if (burst_type == 0) {
                        handle_voice_frame(&xored_burst[96]);
                        handle_voice_frame(&xored_burst[133]);
+               } else /* if (burst_type == 6) */ {
+                       // promote next set of encryption parameters AFTER we get the full ESS & process the 2V frame
+                       ess_algid = next_algid;
+                       ess_keyid = next_keyid;
+                       memcpy(ess_mi, next_mi, sizeof(ess_mi));
+                       if (encrypted()) {
+                               crypt_algs.prepare(ess_algid, ess_keyid, ((burst_type == 0) ? FT_4V : FT_2V), ess_mi);
+                       }
+                       std::string s = "{\"encrypted\": " + std::to_string((encrypted()) ? 1 : 0) + ", \"algid\": " + std::to_string(ess_algid) + ", \"keyid\": " + std::to_string(ess_keyid) + "}";
+                       send_msg(s, M_P25_JSON_DATA);
                }
                return -1;
        } else if (burst_type == 3) {                   // scrambled sacch
@@ -685,11 +687,13 @@ void p25p2_tdma::handle_4V2V_ess(const uint8_t dibits[])
                }
        }

-       if (d_debug >= 10) {
+       if (d_debug >= 10 && burst_id == 4) {
                fprintf(stderr, "ESS: algid=%x, keyid=%x, mi=%02x %02x %02x %02x %02x %02x %02x %02x %02x, rs_errs=%d\n",
                            next_algid, next_keyid,
                            next_mi[0], next_mi[1], next_mi[2], next_mi[3], next_mi[4], next_mi[5],next_mi[6], next_mi[7], next_mi[8],
                            ec);
+       } else if (d_debug >= 10) {
+               fprintf(stderr, "ESS: (partial)\n");
        }
 }

System Information (please complete the following information):

boatbod commented 1 year ago

I can confirm this is something of a known problem that affects both phase 1 and phase 2 P25 encrypted traffic. The underlying issue is that the c++ library code doesn't know anything about call state so it has no notion of when a new call is starting vs receiving more frames on an existing call. To my way of thinking, I need to add an interface so that python call control can command the disposal of old ESS data at the appropriate time.

lwvmobile commented 1 year ago

Why not just memset the stale values on an MAC_END_PTT event?

roysjosh commented 1 year ago

Why not just memset the stale values on an MAC_END_PTT event?

That wouldn't solve the root of the issue which is the promotion of ANY (correct, incorrect) value at the start of a burst (the first 4V).

roysjosh commented 1 year ago

Promoting the ESS parameters when first received in the MAC_PTT and after a full super (?) frame (4V, 4V, 4V, 4V, 2V) seems to be correct. Not MAC_PTT and the start of a super frame, when you don't have the ESS. Even with memset or call disposal, the entire first super frame won't parse if mi=0, key=0, algid=80 ("in the clear") parameters are promoted.

lwvmobile commented 1 year ago

So, only use the initial ESS from the MAC_PTT, promote the next set after a 2V then, and zero them out on a MAC_PTT_END.

boatbod commented 1 year ago

I do concur that the ESS can be zeroed upon reception of a MAC_END_PTT and similarly with a TDU15. This still leaves a corner case for what will happen if either of these messages are missed. (MAC_END_PTT usually gets sent several times, so it's going to be a rare occurrence to miss them all.) The other corner case is what happens when a call in progress is pre-empted due to TG priority... ESS would need zeroing then too, which brings us back to the need to be able to command it from python.

ESS promotion is a interesting topic and not - in my view - written very well in the TIA-102.AAAx specs; the current methodology was determined experientially from testing with encrypted traffic captured in a lab environment. There are presently several known issues:

roysjosh commented 1 year ago

Thanks for the info. FWIW, I do tend to get the MAC_PTT but I occasionally experience local interference causing 1 or more of the Voice-4 PDUs to drop. This gets the audio out of sync with the keystream and you've got the same problem until the next ESS.

Here's the next MI generator func:

#!/usr/bin/env python3

import numpy as np

# XXX truncate trailing 00 if len(lsr) > 8
lsr = bytes.fromhex('b72dda6144ccbb44')
lsr_bits = np.unpackbits(np.frombuffer(lsr, dtype="uint8"))

def lsr_shift(bits):
    newb = (bits[-15] + bits[-27] + bits[-38] + bits[-46] + bits[-62] + bits[-64]) % 2
    bits = np.roll(bits, -1)
    bits[-1] = newb
    return bits

new_lsr = lsr_bits

for _ in range(64):
    new_lsr = lsr_shift(new_lsr)

print(np.packbits(new_lsr).tobytes().hex().upper())
roysjosh commented 1 year ago

I went a step further and tied the offset into the keystream to the TDMA slot ID. The 2V PDU repeats in the same slot as the final MAC_PTT & storing that value allows propagation of the offset to the crypto code so it no longer needs to separately track (& assume) the offset. This greatly improves listening when one or more voice PDUs are dropped in the middle of a call. I'll post some patches for discussion after a bit of polish.

boatbod commented 1 year ago

After merging the pull request I've started seeing some curious log messages pertaining to "missing" voice frames. I'm not sure what to think about this since the audio does not sound broken.

ETA - this seems to be an initialization problem when first tuning the voice channel at the start of a transmission. With late entry into the channel you can't assume what you'll get for first frame received.

Original log:

3 voice frame(s) missing; expecting 4V_0 but got 4V_3
05/01/23 18:12:14.763352 [1] voice update:  tg(12421), rid(2399994), freq(771.718750), slot(0), prio(3)
05/01/23 18:12:49.403646 [1] voice update:  tg(12421), rid(2383024), freq(773.593750), slot(0), prio(3)
05/01/23 18:12:58.489494 [1] voice update:  tg(12421), rid(2399994), freq(771.718750), slot(1), prio(3)
2 voice frame(s) missing; expecting 4V_1 but got 4V_3
05/01/23 18:13:18.518839 [1] voice update:  tg(12421), rid(2383024), freq(771.718750), slot(1), prio(3)
05/01/23 18:13:30.512944 [1] voice update:  tg(12421), rid(0), freq(773.593750), slot(0), prio(3)
1 voice frame(s) missing; expecting 4V_1 but got 4V_2
05/01/23 18:13:43.488332 [1] voice update:  tg(12421), rid(2383024), freq(773.593750), slot(0), prio(3)
1 voice frame(s) missing; expecting 4V_0 but got 4V_1
05/01/23 18:14:14.232933 [1] voice update:  tg(12421), rid(2381224), freq(773.593750), slot(1), prio(3)
05/01/23 18:14:19.711933 [1] voice update:  tg(12421), rid(2399994), freq(773.593750), slot(1), prio(3)
1 voice frame(s) missing; expecting 4V_0 but got 4V_1

Level 10 debug log:

05/01/23 18:25:46.728943 [0] tsbk(0x39) sccb: rfid: 1 stid: 9 ch1: 160e(771.693750) ch2: 15ba(771.168750)
05/01/23 18:25:46.729148 [0] tsbk(0x33) iden_up_tdma: id: 4 freq: 851006250 toff: -45000000 spac: 12500 slots/carrier: 2
05/01/23 18:25:46.808339 [0] p25_framer::load_nid() error check failed, frame discarded, nid=a78e21314d7c52df
05/01/23 18:25:46.808397 [0] rx_sync::sync_reset:
05/01/23 18:25:47.088334 [0] rx_sync::sync_established: protocol P25P2
4 voice frame(s) missing; expecting 4V_0 but got 2V_4
05/01/23 18:25:47.248323 [0] 2V_BURST(4) TDMA slot ID=1 ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=-1
05/01/23 18:25:47.248362 [0] AMBE 01 3a 27 8c 6c 49 80 errs 0 err_rate 0.000000, dt 6.816619
05/01/23 18:25:47.248471 [0] AMBE 0d 0f 7e df 5f 24 80 errs 0 err_rate 0.000000, dt 0.000109
05/01/23 18:25:47.248909 [0] AMBE 65 74 8e 73 03 63 80 errs 0 err_rate 0.000000, dt 0.000438
05/01/23 18:25:47.249330 [0] AMBE 0d 07 cf 0d 55 f5 00 errs 0 err_rate 0.000000, dt 0.000421
05/01/23 18:25:47.308593 [0] 2V_BURST(4) TDMA slot ID=3 ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=-1
05/01/23 18:25:47.309046 [0] AMBE 06 31 71 02 21 50 00 errs 0 err_rate 0.000000, dt 0.059716
lwvmobile commented 1 year ago

FWIW, If I'm not mistaken, I don' think there is a way to 'know' based on the I-ISCH, DUID, or SACCH timeslots which 4V you are currently on until a 2V rolls in and you can reset the counter, so to speak. Just going by the manual on this one. I could be wrong though. 4V and 2V can land on any timeslot aside from the SACCH slots.

That being said, as mentioned above, it does look like you can infer which slot a 2V or the first 4V will occur on (and count backwards) if you know where PT.1 or PT.0 hits in the superframe (assuming you tuned fast enough to catch both MAC_PTTs)

Screenshot from 2023-05-01 18-49-59

roysjosh commented 1 year ago

After merging the pull request I've started seeing some curious log messages pertaining to "missing" voice frames. I'm not sure what to think about this since the audio does not sound broken.

ETA - this seems to be an initialization problem when first tuning the voice channel at the start of a transmission. With late entry into the channel you can't assume what you'll get for first frame received.

Original log:

3 voice frame(s) missing; expecting 4V_0 but got 4V_3
05/01/23 18:12:14.763352 [1] voice update:  tg(12421), rid(2399994), freq(771.718750), slot(0), prio(3)
05/01/23 18:12:49.403646 [1] voice update:  tg(12421), rid(2383024), freq(773.593750), slot(0), prio(3)
05/01/23 18:12:58.489494 [1] voice update:  tg(12421), rid(2399994), freq(771.718750), slot(1), prio(3)
2 voice frame(s) missing; expecting 4V_1 but got 4V_3
05/01/23 18:13:18.518839 [1] voice update:  tg(12421), rid(2383024), freq(771.718750), slot(1), prio(3)
05/01/23 18:13:30.512944 [1] voice update:  tg(12421), rid(0), freq(773.593750), slot(0), prio(3)
1 voice frame(s) missing; expecting 4V_1 but got 4V_2
05/01/23 18:13:43.488332 [1] voice update:  tg(12421), rid(2383024), freq(773.593750), slot(0), prio(3)
1 voice frame(s) missing; expecting 4V_0 but got 4V_1
05/01/23 18:14:14.232933 [1] voice update:  tg(12421), rid(2381224), freq(773.593750), slot(1), prio(3)
05/01/23 18:14:19.711933 [1] voice update:  tg(12421), rid(2399994), freq(773.593750), slot(1), prio(3)
1 voice frame(s) missing; expecting 4V_0 but got 4V_1

Level 10 debug log:

05/01/23 18:25:46.728943 [0] tsbk(0x39) sccb: rfid: 1 stid: 9 ch1: 160e(771.693750) ch2: 15ba(771.168750)
05/01/23 18:25:46.729148 [0] tsbk(0x33) iden_up_tdma: id: 4 freq: 851006250 toff: -45000000 spac: 12500 slots/carrier: 2
05/01/23 18:25:46.808339 [0] p25_framer::load_nid() error check failed, frame discarded, nid=a78e21314d7c52df
05/01/23 18:25:46.808397 [0] rx_sync::sync_reset:
05/01/23 18:25:47.088334 [0] rx_sync::sync_established: protocol P25P2
4 voice frame(s) missing; expecting 4V_0 but got 2V_4
05/01/23 18:25:47.248323 [0] 2V_BURST(4) TDMA slot ID=1 ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=-1
05/01/23 18:25:47.248362 [0] AMBE 01 3a 27 8c 6c 49 80 errs 0 err_rate 0.000000, dt 6.816619
05/01/23 18:25:47.248471 [0] AMBE 0d 0f 7e df 5f 24 80 errs 0 err_rate 0.000000, dt 0.000109
05/01/23 18:25:47.248909 [0] AMBE 65 74 8e 73 03 63 80 errs 0 err_rate 0.000000, dt 0.000438
05/01/23 18:25:47.249330 [0] AMBE 0d 07 cf 0d 55 f5 00 errs 0 err_rate 0.000000, dt 0.000421
05/01/23 18:25:47.308593 [0] 2V_BURST(4) TDMA slot ID=3 ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=-1
05/01/23 18:25:47.309046 [0] AMBE 06 31 71 02 21 50 00 errs 0 err_rate 0.000000, dt 0.059716

Hm, I should have wrapped that in an if. I'll open a PR in a bit. You can see on your debug=10 that we go from one 2V to another, thereby missing the 4x 4V PDUs. And yeah, it suffers from the same problem as the rest of the crypto code - carrying state from one call to the next.

roysjosh commented 1 year ago

FWIW, If I'm not mistaken, I don' think there is a way to 'know' based on the I-ISCH, DUID, or SACCH timeslots which 4V you are currently on until a 2V rolls in and you can reset the counter, so to speak. Just going by the manual on this one. I could be wrong though. 4V and 2V can land on any timeslot aside from the SACCH slots.

That being said, as mentioned above, it does look like you can infer which slot a 2V or the first 4V will occur on (and count backwards) if you know where PT.1 or PT.0 hits in the superframe (assuming you tuned fast enough to catch both MAC_PTTs)

Screenshot from 2023-05-01 18-49-59

I actually just found something in BBAC: Table 8-7 Offset Values & Figure 8-68 Offset Field Usage. There's an offset value in SACCH MAC PDUs that tells you where the first 4V of a voice burst is. That does the same thing as my tracking of MAC_PTT & 2V but is a part of the protocol instead.

boatbod commented 1 year ago

While you may be able to infer the 2V location by counting back from MAC_PTT, how do you know if it's PTT.1 or PTT.0 unless you capture them both?

I've backed out the new burst_id assigment code until we get this figured out. Right now the existing scheme is in place and it simply looks for the 2V and starts counting from there.

roysjosh commented 1 year ago

Check the same table. The offset field is 1 for the first & 0 for the second MAC_PTT. I've been watching my local channels with some additional debug. It seems to match up.

The idea with the current code is that you will likely either capture PTT.1 & PTT.0, just PTT.0, or no PTT at all. If you do happen to capture PTT.1 and NOT PTT.0, you will be out of sync for a voice burst. With the current code, updating the d_tdma_slot_for_2v variable twice in a row (PTT.1 & PTT.0) was something I didn't know how to avoid but in general you ended up with the PTT.0 TDMA slot saved which is identical to future 2V slots. But now that I've found the offset field, it can be handled better.

boatbod commented 1 year ago

I'll take a look.  Last night when I tried to utilize offset it wasn't working reliably for me.

On 5/3/23 07:53, Joshua Roys wrote:

Check the same table. The offset field is 1 for the first & 0 for the second MAC_PTT. I've been watching my local channels with some additional debug. It seems to match up.

The idea with the current code is that you will likely either capture PTT.1 & PTT.0, just PTT.0, or no PTT at all. If you do happen to capture PTT.1 and NOT PTT.0, you will be out of sync for a voice burst. With the current code, updating the |d_tdma_slot_for_2v| variable twice in a row (PTT.1 & PTT.0) was something I didn't know how to avoid but in general you ended up with the PTT.0 TDMA slot saved which is identical to future 2V slots. But now that I've found the offset field, it can be handled better.

— Reply to this email directly, view it on GitHub https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlQPRmtf9ULvEn-2FPlJ0wsbnfkS6Kl3JONNIYAVkDW-2FxAMeke2UHAp7F6vy-2FY5PReGFfNqiNNRuoSfNKSl38hBqY8-3D-BQT_U31jt5HgHBh7k3Grc5s4lzTjbuH3rZQavWxUFsZnPdEEDhUf5xGjPu4bV-2BfpaeYQqCkvY88665k38WH4KiAG0uY-2Fdy4nxBT7cx-2BwcfKe-2BnHjxjhSKxOVlZwDJDdWmK9zRMuc0ZDpe72y7IJ68b9Vc8ESg4BC2vzNcwnYgeQuruMtskcqS2pcjfUkyw9DUc2DNU8Q9XYsw8T6tnVzwBrLVu1ATVli1aCs-2FqWcOWWRXd5nj4TnGRtDxN724CNjRt3kRWrtGmCtfU9rEKX7x60qiw-3D-3D, or unsubscribe https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlThKVtke1jgecVWr5YV2DBEpfYGgjnAoQWi69Sc301gEgMm1Rixh8I0DqQOQlmiTET3hGvfLFkn4Z-2FHKT0U1AALN17CQMMVknYVa561oFS0lMXT-2BVKfFtjETJ94UvdOn-2BA-3D-3Dy9cj_U31jt5HgHBh7k3Grc5s4lzTjbuH3rZQavWxUFsZnPdEEDhUf5xGjPu4bV-2BfpaeYQqCkvY88665k38WH4KiAG0uY-2Fdy4nxBT7cx-2BwcfKe-2BnHjxjhSKxOVlZwDJDdWmK9zf932tUIv-2Bhi21n-2Bml5tY2vzzmxsn1u-2F3isMnPI2RxU2pAh9sYfTSifMqZ9pCtVRgVQkCdQlUF0caEY-2Bqcyhh3Jf3p8U2ZPSCJV-2B6NR89EaBqrltLLu9ZJyq5-2FY6yYTcqC2ND4iSbATa8utUhLb48eQ-3D-3D. You are receiving this because you commented.Message ID: @.***>

roysjosh commented 1 year ago

This seems off to me:

05/07/23 08:40:39.717082 [0] process_mac_pdu: opcode 1 offset 1 len 18
05/07/23 08:40:39.717107 [0] MAC_PTT: srcaddr=#######, grpaddr=#####, TDMA slot ID=3, algid=aa, keyid=####, mi=05 fd 85 60 d5 95 e6 ff 00, rs_errs=0
05/07/23 08:40:39.717224 [0] process_mac_pdu: opcode 1 offset 0 len 18
05/07/23 08:40:39.717236 [0] MAC_PTT: srcaddr=#######, grpaddr=#####, TDMA slot ID=5, algid=aa, keyid=####, mi=05 fd 85 60 d5 95 e6 ff 00, rs_errs=0
05/07/23 08:40:39.826634 [0] 4V_BURST(0) TDMA slot ID=7 ESS: (partial)
05/07/23 08:40:39.828251 [0] 4V_BURST(1) TDMA slot ID=9 ESS: (partial)
05/07/23 08:40:39.935708 [0] process_mac_pdu: opcode 4 offset 3 len 21
05/07/23 08:40:39.935966 [0] 4V_BURST(2) TDMA slot ID=1 ESS: (partial)
05/07/23 08:40:40.044947 [0] 4V_BURST(3) TDMA slot ID=3 ESS: (partial)
05/07/23 08:40:40.046964 [0] 2V_BURST(4) TDMA slot ID=5 ESS: algid=aa, keyid=####, mi=fb 9d b9 eb e7 b3 17 8b 00, rs_errs=0
Current MI_1=0x5fd8560d595e6ff MI_2=0xfb9db9ebe7b3178b (prev, next)
Computed MI_1=0xfb9db9ebe7b3178b MI_2=0xfb9db9ebe7b3178b (should match)

1 voice frame(s) missing; expecting 4V_0 but got 4V_1
05/07/23 08:40:40.263062 [0] 4V_BURST(1) TDMA slot ID=9 ESS: (partial)
05/07/23 08:40:40.372353 [0] 4V_BURST(2) TDMA slot ID=1 ESS: (partial)
05/07/23 08:40:40.481855 [0] 4V_BURST(3) TDMA slot ID=7 ESS: (partial)
05/07/23 08:40:40.590887 [0] 2V_BURST(4) TDMA slot ID=9 ESS: algid=aa, keyid=####, mi=fb 9d b9 eb e7 b3 17 8b 00, rs_errs=-1
05/07/23 08:40:40.591867 [0] process_mac_pdu: opcode 4 offset 3 len 21
2 voice frame(s) missing; expecting 4V_0 but got 4V_2

05/07/23 08:40:40.700159 [0] 4V_BURST(2) TDMA slot ID=1 ESS: (partial)
05/07/23 08:40:40.700701 [0] 4V_BURST(3) TDMA slot ID=3 ESS: (partial)
05/07/23 08:40:40.809542 [0] 2V_BURST(4) TDMA slot ID=5 ESS: algid=aa, keyid=####, mi=2b 22 b5 27 0b 25 40 6d 00, rs_errs=4
Current MI_1=0xfb9db9ebe7b3178b MI_2=0x2b22b5270b25406d (prev, next)
Computed MI_1=0x84aff4f9754f9178 MI_2=0x2b22b5270b25406d (should match)
05/07/23 08:40:40.918561 [0] 4V_BURST(0) TDMA slot ID=7 ESS: (partial)
05/07/23 08:40:40.919883 [0] 4V_BURST(1) TDMA slot ID=9 ESS: (partial)
05/07/23 08:40:40.920598 [0] process_mac_pdu: opcode 4 offset 3 len 21
05/07/23 08:40:41.027864 [0] 4V_BURST(2) TDMA slot ID=1 ESS: (partial)
05/07/23 08:40:41.136970 [0] 4V_BURST(3) TDMA slot ID=3 ESS: (partial)
05/07/23 08:40:41.138605 [0] 2V_BURST(4) TDMA slot ID=5 ESS: algid=aa, keyid=####, mi=2b 6a 83 5f f9 46 df 0c 00, rs_errs=0
Current MI_1=0x2b22b5270b25406d MI_2=0x2b6a835ff946df0c (prev, next)
Computed MI_1=0x2b6a835ff946df0c MI_2=0x2b6a835ff946df0c (should match)
05/07/23 08:40:41.246162 [0] 4V_BURST(0) TDMA slot ID=7 ESS: (partial)
05/07/23 08:40:41.247161 [0] 4V_BURST(1) TDMA slot ID=9 ESS: (partial)
05/07/23 08:40:41.355474 [0] process_mac_pdu: opcode 4 offset 3 len 21

I received both MAC_PTTs and the first superframe. Then the next superframe is received on the incorrect TDMA slots somehow? And then we are back in sync for the following superframes.

I took a look at the lower-level TDMA bits & I don't fully understand them yet. I do see the ISCH code doesn't have error correction, currently... (I googled "coset error decode" and thoroughly confused myself) and all function calls to p25p2_sync::check_confidence increment _tdma_slotid whether or not we decide that we've got a frame. I feel like this could get out of sync with reality until the next I-ISCH with the location bits... but I'm not sure how we differentiate a bad frame from noise in between the ISCH data.

We could collect an entire superframe and make best-effort guesses at TDMA slots & their data before releasing it to the higher-level code but that seems like a fairly invasive change.

vcrtech commented 1 year ago

You may have already looked, but I’m curious if SDRTrunk has anything to offer in terms of slot logic/error recovery?

roysjosh commented 1 year ago

@vcrtech I took a look at the ISCH recovery & SDRTrunk uses a closest (least bit flips) strategy to recover codewords. I haven't looked at how they handle TDMA slots, though. Good suggestion.

boatbod commented 1 year ago

p25p2_isch and p25p2_sync are nice enough to report when there is a decode failure (rc=-1) so could we make assumptions about the frame just received by extrapolating from the last known good decode?  In reality that may be what "check confidence" is trying to do by already incrementing it's counter.

On 5/7/23 09:04, Joshua Roys wrote:

coset error decode

boatbod commented 1 year ago

The new "missing frame" logic doesn't appear to be working reliably right now.  Take a look at this sequence from my library of captured ADP traces:

05/07/23 19:22:35.586591 [0] rx_sync::sync_established: protocol P25P2 4 voice frame(s) missing; expecting 4V_0 but got 2V_4. ISCH rc=48 05/07/23 19:22:35.726766 [0] 2V_BURST(4) TDMA slot ID=9 ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=-1 05/07/23 19:22:35.726836 [0] AMBE b8 33 94 81 64 59 00 errs 0 err_rate 0.000000, dt 5.329989 05/07/23 19:22:35.727617 [0] AMBE a3 4b ed f6 a9 08 80 errs 0 err_rate 0.000000, dt 0.000781 05/07/23 19:22:35.727903 [0] AMBE 03 6f ef d6 76 0c 80 errs 0 err_rate 0.000000, dt 0.000286 05/07/23 19:22:35.728246 [0] AMBE 27 80 4c e3 5f 27 80 errs 0 err_rate 0.000000, dt 0.000343 05/07/23 19:22:35.766626 [0] process_mac_pdu: opcode 4 offset 1 len 21 05/07/23 19:22:35.766708 [0] MAC_ACTIVE: mco=0/01(0x01), len=7, mco=1/02(0x42), len=9, mco=0/00(0x00), len=4, rs_errs=0

The received frame is unquestionably a 4V (burst_type == 0 with four sets of AMBE codewords, all with 0 errs in the FEC) yet the missing frame logic thinks it should be a 2V frame.  ISCH check_confidence result was non-negative, so no bit errors.

Not sure what to think, but if the burst_type conflicts with the calculated burst_id, I'm inclined to believe the burst_type since it is actually received from protocol decode.

Graham

On 5/7/23 18:53, Graham Norbury wrote:

p25p2_isch and p25p2_sync are nice enough to report when there is a decode failure (rc=-1) so could we make assumptions about the frame just received by extrapolating from the last known good decode?  In reality that may be what "check confidence" is trying to do by already incrementing it's counter.

On 5/7/23 09:04, Joshua Roys wrote:

coset error decode

roysjosh commented 1 year ago

Is that the first voice PDU after getting sync? No MAC_PTT or SACCH before the voice PDU?

Hmm... bad state. It looks like you're getting current_slot set to TDMA slot 9 >> 1 == 4. Hence the bad debug saying we're expecting 2V_4.

I agree that we should prioritize the DUID. I think the new code shouldn't take corrective action until it has received definite information (MAC_PTT, SACCH, 2V DUID).

I've made burst_type a class variable for better propagation of the current DUID. It looks like I'll need to set & reset the expected slot (d_tdma_slot_first_4v) to -1 to designate unknown & use that to skip the slot adjustment logic.

boatbod commented 1 year ago

Yes, the presence of "rx_sync::sync_established: protocol P25P2" implies the first frame after either a sync recovery or a tune. This particular raw capture must have been taken under marginal signal conditions because it's fully of bit errors, unlike some other I have that were made in a lab conditions are are perfectly clean.

Last night I added on to p25p2_sync to allow access to the "last_rc" value so you can see if the ISCH was received clean or not.  The value "-2" is reserved for S-ISCH (sync) and "-1" means there was one or more bit errors in the decode.  Positive values are clean decode.  I'd recommend we incorporate that into the predictive code so that burst_id only gets manipulated if the ISCH is clean.

On 5/7/23 21:22, Joshua Roys wrote:

Is that the first voice PDU after getting sync? No MAC_PTT or SACCH before the voice PDU?

Hmm... bad state. It looks like you're getting |current_slot| set to TDMA slot |9 >> 1 == 4|. Hence the bad debug saying we're expecting 2V_4.

I agree that we should prioritize the DUID. I think the new code shouldn't take corrective action until it has received definite information (MAC_PTT, SACCH, 2V DUID).

I've made |burst_type| a class variable for better propagation of the current DUID. It looks like I'll need to set & reset the expected slot (|d_tdma_slot_first_4v|) to -1 to designate unknown & use that to skip the slot adjustment logic.

— Reply to this email directly, view it on GitHub https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlQPRmtf9ULvEn-2FPlJ0wsbnfkS6Kl3JONNIYAVkDW-2FxAMeke2UHAp7F6vy-2FY5PReGFVLdu0DvnJ1DRXckCTcFoxU-3Dc5fV_U31jt5HgHBh7k3Grc5s4l2ONlozivuo9qYhw7Ihp1vNGQUgalnU4wEdwqbxvlGmENyKZ48RhneZmTq-2BXeWprkw65h6yYzsB0m3BA6QX0HElhwFjwKw8MzxnugX0qNuPxVgBQItyCcNfB84VM1tEH-2BT7mV9WoMCnSY1fA9qTiW7DrFAcHvza38JIFvuBQUfRIrSlIkgL1zS3sFDm0pm34C1CIKFsQRPpJb9Lqhi-2BBTH4hPbhFABvooBxjM-2B39q6DNJUEAHauxwsd-2BAWZpLsOn4w-3D-3D, or unsubscribe https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlThKVtke1jgecVWr5YV2DBEpfYGgjnAoQWi69Sc301gEgMm1Rixh8I0DqQOQlmiTEc-2FLHtEtElBQTQnYYmrhYSUO3xG1o90MwoJhHLtYGuM4U0iqKMDGBdSJPp782DQwRQ-3D-3DAxXn_U31jt5HgHBh7k3Grc5s4l2ONlozivuo9qYhw7Ihp1vNGQUgalnU4wEdwqbxvlGmENyKZ48RhneZmTq-2BXeWprkw65h6yYzsB0m3BA6QX0HElhwFjwKw8MzxnugX0qNuPxwZjcdItKfv0a-2BY5zSf1Mhz-2BuUPUge1tiFTYw0OW-2FbeBFJcFEX9cR6WcnfuVU32Pao7Pi8PxKpcaPPxVDwzDvI9aFFwGDemYIskeX4fjP6cfYQrXI-2Fb1FBiF1LkDZzuja3X2ejLotrUBYe-2BnYREc76w-3D-3D. You are receiving this because you commented.Message ID: @.***>

roysjosh commented 1 year ago

Okay, thanks. I'll rebase on your latest today & add the set/reset of d_tdma_slot_first_4v plus incorporate that return code (unless you beat me to it! thanks for cleaning up my other commits).

It looks like sdrtrunk collects a 1440-bit (4*320 + 4*40 ISCH) 1/3 of a superframe at a time, not just a single 320-bit PDU. I would have to stare at the code longer but I imagine that allows full use of all 4 ISCH (2x info, 2x sync) to help sync the bits to the slot & thanks to the I-ISCH you know the exact location of the PDUs (& voice frames) in the super-/ultra-frame.

I need to add debug to the ISCH code again but IIRC I was getting a decent amount of discards from that code. A quick look back at some logs I captured showed 893 total ISCH lookups, of which 412 were S-ISCH, and 262 (!!!) were -1 failure returns. I might have my rtl-sdr exposed to some local, bursty noise though... I've got a cable run to my attic, I just need a new rpi to run rtl_tcp on. My only spare apparently has dead USB ports 😢.

lwvmobile commented 1 year ago

It looks like sdrtrunk collects a 1440-bit (4320 + 440 ISCH) 1/3 of a superframe at a time, not just a single 320-bit PDU.

This is the same method I adopted, because its the minimum you can collect at once after S-ISCH to ensure I can look up and find the channel number/isch location number in the timeslots by I-ISCH so I can properly apply the descrambling by offsetting the start of the descrambling bitstream according to current sync location. By contrast, I used to collect and buffer an entire superframe, but had lag when doing so.

A quick look back at some logs I captured showed 893 total ISCH lookups, of which 412 were S-ISCH, and 262 (!!!) were -1 failure returns.

My observation has been on Phase 2 systems that after an S-ISCH, there is no guarantee that the rest of the timeslots in-between will be filled with any data, so again, not sure how OP25 handles it, but because I gather and buffer 4 timeslots worth on each S-ISCH, its possible (and often observed) that I can have anywhere from 1 to 3 duid and/or errors when all call activity stops due to RF channel tear down. I usually abort after two consecutive Phase 2 DUID errors and either look for renewed frame sync, or return to control channel after x seconds hangtime. Its quite possible a lot of those DUID or ISCH errors you have in your log were result of channel teardown. If not, then must be some iffy signal.

boatbod commented 1 year ago

Having read up a bunch on Coset codes I came to the conclusion that error-correcting op25's ISCH decode could most practically be achieved by simply looking for minimum hamming distance. The ISCH is only really 7 bits long (2 reserved bits are set to 0) so that's a lookup table of 127 entries plus 1 additional entry for the S-ISCH sync value. Seems to work reasonably sanely, at least on my local system. Changes have been pushed to dev.

roysjosh commented 1 year ago

@boatbod what do you think about making the new p25p2_tdma::ess_reset a more generic "reset ALL call state" function? Or would you rather keep that one separate and have an overarching function call it plus other things?

boatbod commented 1 year ago

I had been thinking about that as I was creating the interface. Renaming should be straight-forward; how about "call_end()"?

On 5/9/23 04:57, Joshua Roys wrote:

@boatbod https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlQPRmtf9ULvEn-2FPlJ0wsbncG0lc66iB6qwjWa3rVasSW-pNZ_U31jt5HgHBh7k3Grc5s4l51WdInaLbibSwGg6wsJoHoO6TYpjP78WOABxGx3vQ-2FnsPubdub4DLLDTdZanHztc2oUnZ-2B-2FVo15L4-2BGu9-2B-2F-2FB8brqgyKqcMdQG-2FZo3xKetL6f3Fsffwp3D7iiXvOm3csMpnl4F7Ka03gu0synDCCPxDteETXzem8OvGVhv7HI0mLgIVtBGqe4TxF5wtwmD3LwojRWcVvHO-2BJkGaCDX61x-2BmLF9uwHgKnLVQ5X3UCXjUNYTT04DuQekV9sxLk-2BwUUw-3D-3D what do you think about making the new |p25p2_tdma::ess_reset| a more generic "reset ALL call state" function? Or would you rather keep that one separate and have an overarching function call it plus other things?

— Reply to this email directly, view it on GitHub https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlQPRmtf9ULvEn-2FPlJ0wsbnfkS6Kl3JONNIYAVkDW-2FxAMeke2UHAp7F6vy-2FY5PReGFcbaoffolhvIXAMBS3WoybY-3DHEzf_U31jt5HgHBh7k3Grc5s4l51WdInaLbibSwGg6wsJoHoO6TYpjP78WOABxGx3vQ-2FnsPubdub4DLLDTdZanHztc2oUnZ-2B-2FVo15L4-2BGu9-2B-2F-2FB8brqgyKqcMdQG-2FZo3xKetLDSeIHQLFjoZx3iwHSzR3d9sCQUiegSnF298Ol9itr8d5ySrBjks-2F78jGGlcw2OSgawuFURMApv4FHsimpm9u5ZnZL1tGsnI2Ga6xHV4lI45WOalrKZp8nVxr9HSVpXfTNvoYpdVq8iempydIEMTp2g-3D-3D, or unsubscribe https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlThKVtke1jgecVWr5YV2DBEpfYGgjnAoQWi69Sc301gEgMm1Rixh8I0DqQOQlmiTEZ-2FsxeSg7lwBwz6MGm5dQYtG8Rty876-2FoVNEvhrFzz-2FAISSQM26Mv91TpYZORWlZYw-3D-3D966i_U31jt5HgHBh7k3Grc5s4l51WdInaLbibSwGg6wsJoHoO6TYpjP78WOABxGx3vQ-2FnsPubdub4DLLDTdZanHztc2oUnZ-2B-2FVo15L4-2BGu9-2B-2F-2FB8brqgyKqcMdQG-2FZo3xKetLCTC7H97nTt36ucopDGpCZrCkR-2BgBt-2BlHHtA8pNHRFmtQvRv86cMGHsfLw579BuISb7Yb-2BmfS5ASk5ziFK9NZvu527bL3oRwwNaY-2FJ4l7OiBCSl7awTdDrpz-2FPluIMTodHXsvr14WYCbjZFcuU7JhbA-3D-3D. You are receiving this because you were mentioned.Message ID: @.***>

boatbod commented 1 year ago

All the pull-requests to-date have been pulled into "dev" and I've made some significant changes to the api that exists between python and c++ to make extending it much easier. (It's now JSON based)

Looking at the logs from replaying my ADP traces, I think you may need to account for ISCH rc=-2. This would mean an S-ISCH (sync sequence) was detected, and thus containing no ISCH position bits to help detect missing frames.

roysjosh commented 1 year ago

Okay. That does seem the safer option. PR incoming with that and an additional change for your feedback. I've moved some of the sync code around so that an I-ISCH can correct the TDMA slot before the sync confidence value is set. In really noisy conditions, the internal TDMA counter might be better... a corrupt I-ISCH could cause the TDMA slot to jump around. But we'll sync back up faster (or stay in sync) with the code ordered the new way. What do you think?

boatbod commented 1 year ago

From a known good starting point, I would think the internal counter would provide the best stability as long as it gets incremented even in the case of really noisy received data.  The ISCH status bits, duids and other methods of figuring out a sync point are great for establishing/re-establishing/validating sync, but I'm not yet convinced they should override the counter if there is a reasonable probability that the counter is correct.

On 5/11/23 22:32, Joshua Roys wrote:

Okay. That does seem the safer option. PR incoming with that and an additional change for your feedback. I've moved some of the sync code around so that an I-ISCH can correct the TDMA slot before the sync confidence value is set. In really noisy conditions, the internal TDMA counter might be better... a corrupt I-ISCH could cause the TDMA slot to jump around. But we'll sync back up faster (or stay in sync) with the code ordered the new way. What do you think?

— Reply to this email directly, view it on GitHub https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlQPRmtf9ULvEn-2FPlJ0wsbnfkS6Kl3JONNIYAVkDW-2FxAMeke2UHAp7F6vy-2FY5PReGFZf6AM8TteTWpN6d61eOgu8-3DbSDg_U31jt5HgHBh7k3Grc5s4lwAWgotqLcajnZb8qFlZQ4kRNE6F7nhM9tmSUy-2FZ-2FBGGeILmuhzShtTtSKGnl3ZGMjf7Ewg61vcga-2BOlyR8ej-2FkpzQEicYkpAI-2FiIcj47fSW3IAYcwB9-2Bh4b3J9wh-2F0NfIbAYgrRo5soQkDMWN-2BCh9RAhpPVQSvpdV6j2IiuXt0-2BYioyzQteVtye5KmPUeLPX3OdsKdV7ZtrGjtfTPbVEQvk6yBmlvESgsGhnWdnUlwy4bLwXHn3l2CU0KAwvu-2Bk2g-3D-3D, or unsubscribe https://u27784101.ct.sendgrid.net/ls/click?upn=aRgFq9XRJ4-2F-2B-2B6dqJgaVlThKVtke1jgecVWr5YV2DBEpfYGgjnAoQWi69Sc301gEgMm1Rixh8I0DqQOQlmiTETqPUyjSZmg1yoNESHUDTfe3igxyoCSPyH6vEJomXnCU01HiQ-2B4s4uZntclJtS19cg-3D-3DZe8g_U31jt5HgHBh7k3Grc5s4lwAWgotqLcajnZb8qFlZQ4kRNE6F7nhM9tmSUy-2FZ-2FBGGeILmuhzShtTtSKGnl3ZGMjf7Ewg61vcga-2BOlyR8ej-2FkpzQEicYkpAI-2FiIcj47fSWjWOFu4YWWg-2FzLMsE-2Fl0zoUnW3nHInVyvnjvnWKhSa9nRQP-2BN6XdOpNrlXpIe-2BHbehw5n2nj-2BwjJO0xeJmlDX-2FHGl5FTd8qcEGG-2FA73sbd08ssPAZzK7RLzbniz9wV58B1ovisaHIbQgsdsTZWt1hUw-3D-3D. You are receiving this because you were mentioned.Message ID: @.***>

roysjosh commented 1 year ago

Updated the PR

boatbod commented 1 year ago

Dev has been running stable for several days so I've merged and pushed these updates to master and gr310