asterisk / asterisk

The official Asterisk Project repository.
https://www.asterisk.org
Other
1.99k stars 929 forks source link

Stir/Shaken Refactor #524

Closed gtjoseph closed 3 months ago

gtjoseph commented 4 months ago

Why do we need a refactor?

The original stir/shaken implementation was started over 3 years ago when little was understood about practical implementation. The result was an implementation that wouldn't actually interoperate with any other stir-shaken implementations.

There were also a number of stir-shaken features and RFC requirements that were never implemented such as TNAuthList certificate validation, sending Reason headers in SIP responses when verification failed but we wished to continue the call, and the ability to send Media Key(mky) grants in the Identity header when the call involved DTLS.

Finally, there were some performance concerns around outgoing calls and selection of the correct certificate and private key. The configuration was keyed by an arbitrary name which meant that for every outgoing call, we had to scan the entire list of configured TNs to find the correct cert to use. With only a few TNs configured, this wasn't an issue but if you have a thousand, it could be.

What's changed?

Resolves: #351 Resolves: #46

UserNote: Asterisk's stir-shaken feature has been refactored to correct interoperability, RFC compliance, and performance issues. See https://docs.asterisk.org/Deployment/STIR-SHAKEN for more information.

UpgradeNote: The stir-shaken refactor is a breaking change but since it's not working now we don't think it matters. The stir_shaken.conf file has changed significantly which means that existing ones WILL need to be changed. The stir_shaken.conf.sample file in configs/samples/ has quite a bit more information. This is also an ABI breaking change since some of the existing objects needed to be changed or removed, and new ones added. Additionally, if res_stir_shaken is enabled in menuselect, you'll need to either have the development package for libjwt v1.15.3 installed or use the --with-libjwt-bundled option with ./configure.

gtjoseph commented 4 months ago

cherry-pick-to: 18 cherry-pick-to: 20 cherry-pick-to: 21

gtjoseph commented 4 months ago

I'm moving this back to draft because setting any of the profile config fields will wither result in a segfault or the wrong structure member being set. I'm refactoring how the profiles work to prevent the duplication of fields and to more easily prevent disagreements.

gtjoseph commented 4 months ago

Still trying to decide what can be unit tested vs testsuite tested.

lukeescude commented 3 months ago

Super excited for this!!!!!! You guys rock!!

gtjoseph commented 3 months ago

Super excited for this!!!!!! You guys rock!!

Thanks @lukeescude! Would it be possible for you to test at all?

lukeescude commented 3 months ago

Super excited for this!!!!!! You guys rock!!

Thanks @lukeescude! Would it be possible for you to test at all?

Absolutely, do you happen to know where I can find the docs for configuring the module? (Not sure if it's still following the same syntax as the original stir_shaken.conf) - I don't mind looking at the code to extrapolate docs, just need to know where.

seanbright commented 3 months ago

https://docs.asterisk.org/Deployment/STIR-SHAKEN/

gtjoseph commented 3 months ago

Just did a one line fix in curl_utils:curler() to fix a crash when curl_easy_perform() returns something other than CURLE_OK. curl_easy_cleanup() was being called twice.

lukeescude commented 3 months ago

I can successfully build, but asterisk encounters a segfault when booting - the last few messages of the verbose boot are:

Loading plc. == plc => (PLC) Loading udptl. == udptl => (UDPTL) Loading codec_opus.so. == Updated cached format with name 'opus' Segmentation fault (core dumped)

Since Opus is the last thing to be loaded, do you think that's causing an issue?

EDIT: Recompiling with DONT_OPTIMIZE so I can grab some core dump details.

lukeescude commented 3 months ago

stir_shaken_core_dump.zip

Here is the core dump. I will disable building Opus and see if that helps.

InterLinked1 commented 3 months ago

I will disable building Opus and see if that helps.

You can also just do noload => codec_opus to see if it crashes without loading that.

lukeescude commented 3 months ago

Ah good idea!

Yeah, so both Opus and g729 are borked - I have to either disable them or delete them entirely from the modules directory to allow asterisk to load.

It's also highly probable that I am not checking out 20.6.0 and applying the PR to it properly - Working with tags and PRs is not something I'm familiar with in git.

InterLinked1 commented 3 months ago

Ah good idea!

Yeah, so both Opus and g729 are borked - I have to either disable them or delete them entirely from the modules directory to allow asterisk to load.

It's also highly probable that I am not checking out 20.6.0 and applying the PR to it properly - Working with tags and PRs is not something I'm familiar with in git.

That might be because the versions you had are not ABI compatible with your current build configuration. I have no experience with those modules specifically, but I seem to recall them being proprietary (no source released) modules that are downloaded from Digium, although I would expect the build sum to mismatch when the module tries to load, triggering a reject. Maybe somebody else knows more about that...

gtjoseph commented 3 months ago

@lukeescude oops, I missed your comments last week, sorry. As @InterLinked1 mentioned, it's a mismatch. The best way to try the PR on a branch other than master is to download it as a patchfile and apply it to the branch you want...

$ git checkout 20.6.0
$ curl -L https://github.com/asterisk/asterisk/pull/524.patch > ../stir.patch
$ patch -p1 < ../stir.patch

Then just configure, compile and install as you normally would. I can confirm that both codec_opus and codec_g729a load and work fine this way.

lukeescude commented 3 months ago

Alright, making progress! Looks like some kind of syntax error:

#15 395.9 res_stir_shaken/verification.c: In function 'check_date_header':
#15 395.9 res_stir_shaken/verification.c:673:9: error: expected ';' before 'if'
#15 395.9   673 |         if (time_diff < 0) {
#15 395.9       |         ^~
#15 395.9 res_stir_shaken/verification.c:678:11: error: 'else' without a previous 'if'
#15 395.9   678 |         } else if (time_diff > (ctx->eprofile->vcfg_common.max_date_header_age * 1000)) {
#15 395.9       |           ^~~~
#15 395.9 make[1]: *** [/tmp/asterisk-20.6.0/Makefile.rules:165: res_stir_shaken/verification.o] Error 1
#15 395.9 make: *** [Makefile:396: res] Error 2
#15 397.4    [CC] res_stir_shaken/verification.c -> res_stir_shaken/verification.o
#15 397.5 res_stir_shaken/verification.c: In function 'check_date_header':
#15 397.5 res_stir_shaken/verification.c:673:9: error: expected ';' before 'if'
#15 397.5   673 |         if (time_diff < 0) {
#15 397.5       |         ^~
#15 397.5 res_stir_shaken/verification.c:678:11: error: 'else' without a previous 'if'
#15 397.5   678 |         } else if (time_diff > (ctx->eprofile->vcfg_common.max_date_header_age * 1000)) {
#15 397.5       |           ^~~~
#15 397.6 make[1]: *** [/tmp/asterisk-20.6.0/Makefile.rules:165: res_stir_shaken/verification.o] Error 1
#15 397.6 make: *** [Makefile:396: res] Error 2
lukeescude commented 3 months ago

@gtjoseph looking at the code, I am actually not sure why I'm seeing those syntax errors - It's been a while since I've written C but your code looks fine to me. The patch successfully applied too, or at least it gave no errors.

I'll keep digging. We have an automatic build process that may be getting in the way so I'll decouple it from our internal systems and build on my machine locally step-by-step to see if I can uncover what's happening.

I did have to add --with-bundled-libjwt to configure, and additionally had to add pkg-config & libjansson-dev packages before compiling too.

InterLinked1 commented 3 months ago

@gtjoseph looking at the code, I am actually not sure why I'm seeing those syntax errors - It's been a while since I've written C but your code looks fine to me. The patch successfully applied too, or at least it gave no errors.

I'll keep digging. We have an automatic build process that may be getting in the way so I'll decouple it from our internal systems and build on my machine locally step-by-step to see if I can uncover what's happening.

I did have to add --with-bundled-libjwt to configure, and additionally had to add pkg-config & libjansson-dev packages before compiling too.

It could be that the patch applied, but in a funny way where it messed some things up. I've seen that happen a few times in the past, where I've had to go in and manually fix the code back in. Happens a lot with Asterisk actually when applying unmerged patches, particularly if they conflict with one another - sometimes it'll just refuse to do anything but other times it will get confused. You might want to see if everything looks good around line 672 in your source.

seanbright commented 3 months ago

You will either need to apply the fix that I suggest above, or compile in dev mode:

./configure --enable-dev-mode --with-libjwt-bundled 

(ast_trace(...) in dev mode does not need a trailing ;, but if it's not in dev mode it does - that should probably be fixed at some point)

gtjoseph commented 3 months ago

I did have to add --with-bundled-libjwt to configure, and additionally had to add pkg-config & libjansson-dev packages before compiling too.

I understand the need for --with-libjwt-bundled and will add that to the UpgradeNote. If you're not using --with-jansson-bundled then I can understand the need for libjansson-dev but that would have been required for an asterisk build even before stir-shaken. If you are using --with-jansson-bundled, then the libjansson-dev package should not have been needed at all. I'll have to investigate that a bit.

gtjoseph commented 3 months ago

The latest force-push contains:

lukeescude commented 3 months ago

Alright @gtjoseph and @seanbright making progress - got it compiled, but module will not load with the following error:

[Feb 16 20:02:49] DEBUG[345]: loader.c:1834 ast_load_resource: Module res_stir_shaken previously declined to load, unloading it first before loading again
[Feb 16 20:02:49] NOTICE[345]: loader.c:1241 ast_unload_resource: Unloading module 'res_stir_shaken' that previously declined to load
[Feb 16 20:02:49] ERROR[345]: loader.c:964 logged_dlclose: Failure in dlclose for module 'res_stir_shaken': /usr/lib/asterisk/modules/res_stir_shaken.so: shared object not open
[Feb 16 20:02:49] DEBUG[345]: config.c:2377 config_text_file_load: Parsing /etc/asterisk/stir_shaken.conf
[Feb 16 20:02:49] ERROR[345]: res_stir_shaken/crypto_utils.c:176 crypto_register_x509_extension: Couldn't register TNAuthList X509 extension: C00674CEBD7F0000:error:04000066:object identifier routines:OBJ_create:oid exists:../crypto/objects/obj_dat.c:719:
gtjoseph commented 3 months ago

Oh, libjwt 1.7.0 was just released last week but I haven't had a chance to test it yet.

gtjoseph commented 3 months ago

Ah crap. Frickin Alembic.

gtjoseph commented 3 months ago

Alright @gtjoseph and @seanbright making progress - got it compiled, but module will not load with the following error:

[Feb 16 20:02:49] DEBUG[345]: loader.c:1834 ast_load_resource: Module res_stir_shaken previously declined to load, unloading it first before loading again
[Feb 16 20:02:49] NOTICE[345]: loader.c:1241 ast_unload_resource: Unloading module 'res_stir_shaken' that previously declined to load
[Feb 16 20:02:49] ERROR[345]: loader.c:964 logged_dlclose: Failure in dlclose for module 'res_stir_shaken': /usr/lib/asterisk/modules/res_stir_shaken.so: shared object not open
[Feb 16 20:02:49] DEBUG[345]: config.c:2377 config_text_file_load: Parsing /etc/asterisk/stir_shaken.conf
[Feb 16 20:02:49] ERROR[345]: res_stir_shaken/crypto_utils.c:176 crypto_register_x509_extension: Couldn't register TNAuthList X509 extension: C00674CEBD7F0000:error:04000066:object identifier routines:OBJ_create:oid exists:../crypto/objects/obj_dat.c:719:

Whoa. "oid exists"??? What version of openssl are you using?

lukeescude commented 3 months ago

Looks like:

OpenSSL 3.0.11 19 Sep 2023 (Library: OpenSSL 3.0.11 19 Sep 2023)

gtjoseph commented 3 months ago

I'm at 3.1.1 and that oid (TnAuthList: "1.3.6.1.5.5.7.1.26") is not pre-built into openssl. The module isn't somehow being loaded twice is it? In modules.conf, do you have autoload set to yes? Is res_stir_shaken.so listed in the file? Anything else in the logs that may indicate it's being loaded twice?

lukeescude commented 3 months ago

res_stir_shaken.so STIR/SHAKEN Module for Asterisk 0 Not Running core

I've got autoload set to yes, and res_stir_shaken is not in the file at all, so it should be loading without issue.

module unload/load doesn't seem to make a difference.

I will can look at upgrading openssl to newest. Building on debian 12.

seanbright commented 3 months ago

I will can look at upgrading openssl to newest. Building on debian 12.

I believe that would be unnecessary. Can you upload a full log?

gtjoseph commented 3 months ago

I will can look at upgrading openssl to newest. Building on debian 12.

I believe that would be unnecessary. Can you upload a full log?

Yeah, don't upgrade. There's something else going on. What you can do is add...

noload = res_pjsip_stir_shaken.so
noload = res_stir_shaken.so

to modules.conf, restart asterisk, then from the asterisk CLI...

core set trace 5 res_stir_shaken.so
core set debug 5 res_stir_shaken.so
module load res_stir_shaken.so

and post the result.

NOTE: You can't unload res_stir_shaken and load it again. You can only reload. The reason is that once you register an OID to openssl, you can't unregister it so if you try unload the module and load it again, you'll get the exact error message you got.

InterLinked1 commented 3 months ago

NOTE: You can't unload res_stir_shaken and load it again. You can only reload. The reason is that once you register an OID to openssl, you can't unregister it so if you try unload the module and load it again, you'll get the exact error message you got.

Then shouldn't the unload handler in the module always return -1, to prevent people from doing that? (We did something similar with a few of the PJSIP modules that had similar underlying limitations).

gtjoseph commented 3 months ago

Then shouldn't the unload handler in the module always return -1, to prevent people from doing that? (We did something similar with a few of the PJSIP modules that had similar underlying limitations).

Yeah probably but I do want the module to clean up other things. There's a lot going on and being able to use the leak sanitizer has been very helpful.

Actually, the check to see if the OID is already registered is flawed. I can fix that and just not try to register it again if it already exists.

Doesn't explain why @lukeescude is getting the message though.

lukeescude commented 3 months ago

Alright here we go:

874790*CLI> core set trace 5 res_stir_shaken.so
Core trace was 0 and has been set to 5 for 'res_stir_shaken'.
874790*CLI> core set debug 5 res_stir_shaken.so
Core debug was 0 and has been set to 5 for 'res_stir_shaken'.
874790*CLI> module load res_stir_shaken.so
Unable to load module res_stir_shaken.so
Command 'module load res_stir_shaken.so' failed.
[Feb 16 21:46:11] NOTICE[5722]: res_stir_shaken/crypto_utils.c:179 crypto_register_x509_extension: Registering TNAuthList NID 1248
[Feb 16 21:46:11] DEBUG[5722]: res_stir_shaken/common_config.c:313 common_config_load:  Stir Shaken Load
[Feb 16 21:46:11] ERROR[5722]: res_stir_shaken/verification_config.c:64 vs_is_config_loaded: No 'verification' section could be found in stir/shaken configuration
[Feb 16 21:46:11] ERROR[5722]: res_stir_shaken/verification.c:1025 vs_load: stir/shaken verification service failed to load
[Feb 16 21:46:11] DEBUG[5722]: res_stir_shaken/common_config.c:322 common_config_load:  Stir Shaken VS load failed

I suppose a verification section is required then, so I will get that written even though we don't use it, and see if that helps.

lukeescude commented 3 months ago

@gtjoseph so it looks like the module required this directory to exist:

/var/lib/asterisk/keys/stir_shaken/cache

And additionally adding:

[verification] global_disable = yes

Helped too, but then it's throwing an error about there being no [attestation] object even though there is one.

gtjoseph commented 3 months ago

Helped too, but then it's throwing an error about there being no [attestation] object even though there is one.

Attach the actual log message and your stir_shaken.conf file.

gtjoseph commented 3 months ago

The latest force push removed the internal vector that kept track of registered x509 extensions and instead uses the openssl OBJ_sn2nid and OBJ_nid2sn functions to determine if the extension is already registered. You can now safely unload res_stir_shaken and load it again without issues.

lukeescude commented 3 months ago

@gtjoseph see below:

874790*CLI> core set trace 5 res_stir_shaken.so
874790*CLI> core set debug 5 res_stir_shaken.so
Core trace was 0 and has been set to 5 for 'res_stir_shaken'.
Core debug was 0 and has been set to 5 for 'res_stir_shaken'.
874790*CLI> module load res_stir_shaken.so
Unable to load module res_stir_shaken.so
Command 'module load res_stir_shaken.so' failed.
[Feb 16 23:00:18] NOTICE[10391]: res_stir_shaken/crypto_utils.c:179 crypto_register_x509_extension: Registering TNAuthList NID 1248
[Feb 16 23:00:18] DEBUG[10391]: res_stir_shaken/common_config.c:313 common_config_load:  Stir Shaken Load
[Feb 16 23:00:18] DEBUG[10391]: res_stir_shaken/verification_config.c:162 vs_check_common_config:  verification: Checking common config
[Feb 16 23:00:18] DEBUG[10391]: res_stir_shaken/verification_config.c:264 vs_check_common_config:  verification: Done
[Feb 16 23:00:18] ERROR[10391]: res_stir_shaken/attestation_config.c:52 as_is_config_loaded: No 'attestation' section could be found in stir/shaken configuration
[Feb 16 23:00:18] ERROR[10391]: res_stir_shaken/attestation.c:444 as_load: stir/shaken attestation service failed to load
[Feb 16 23:00:18] DEBUG[10391]: res_stir_shaken/common_config.c:327 common_config_load:  Stir Shaken AS load failed

Contents of stir_shaken.conf:

[attestation]
global_disable = no
private_key_path = /etc/asterisk/stir_shaken/private_key.pem
public_cert_url = https://pvx1.s3.us-east-2.amazonaws.com/stirshaken/8448cc7eb8424d6ad5d2e6d71bcf6629.cer
attest_level = A

[attest_profile]
type = profile
endpoint_behavior = attest

[verification]
global_disable = yes

[19997379664]
type = tn
attest_level = A

[14693934979]
type = tn
attest_level = A

[19996528348]
type = tn
attest_level = A

[15732069259]
type = tn
attest_level = A

[19996699142]
type = tn
attest_level = A

[15732069268]
type = tn
attest_level = A

[19993056391]
type = tn
attest_level = A

[15732069280]
type = tn
attest_level = A

I realize those 999 numbers aren't real, I'll fix that in a bit.

seanbright commented 3 months ago

Not 100% but you may be missing:

type = attestation

Edit: nope. I’m wrong.

lukeescude commented 3 months ago

Ahhh figured it out - I need to use private_key_file, not private_key_path... not sure where I got path from.

My last module error to fix is:

attest_profile: Neither this profile nor default verification options specify ca_file or ca_path

and I think it'll be fully functional.

lukeescude commented 3 months ago

@gtjoseph @seanbright BOOM! All good - calls are being attested properly.

Here's my functioning stir_shaken.conf:

[attestation]
global_disable = no
private_key_file = /etc/asterisk/stir_shaken/private_key.pem
public_cert_url = https://pvx1.s3.us-east-2.amazonaws.com/stirshaken/8448cc7eb8424d6ad5d2e6d71bcf6629.cer
attest_level = A

[attest_profile]
type = profile
endpoint_behavior = attest
ca_path = /etc/ssl/certs/

[verification]
global_disable = yes

[19997379664]
type = tn
attest_level = A

[14693934979]
type = tn
attest_level = A

I had to set ca_path = /etc/ssl/certs/.

gtjoseph commented 3 months ago

@lukeescude That's good news! I'll make the config check just disable verification if there's no [verification] section in the config file. Same with [attestation]. This will also prevent you from having to specify a ca_* option in a profile.

BTW, I attempted to download the certificate revocation list from the iconnectiv URI specified in your certificate's "X509v3 CRL Distribution Points" extension and it failed. Do you know if it's supposed to work? Do you happen to have a contact at iconnectiv that could give us more info on it?

gtjoseph commented 3 months ago

@lukeescude You don't happen to have the CA certificate or cert chain that was used to issue your certificate do you? If so, can you share it? We're sorely lacking in real-world examples to test with.

gtjoseph commented 3 months ago

Changes in latest force push...

lukeescude commented 3 months ago

Hey @gtjoseph I'm not ignoring you, we're coming up on an RFP deadline to offer phone services to an entire school district so we're all-hands-on-deck 😄

I'll dig into my certs and get you everything I can possibly get.

Off the bat, the Iconective (Neustar) CA list is available on a JSON endpoint, but it's firewall-protected, so I have to use our VPN to get it:

<snipped>

I don't know how to read that, nor does it look like a CA thing to me...

I also have the original CSR, our private key, and something called the SPC Token Output if you'd like those.

gtjoseph commented 3 months ago

Hey @lukeescude No worries!! Thanks for the info. I think we're good for now.

lukeescude commented 3 months ago

Your newest commits are working perfectly by the way.

gtjoseph commented 3 months ago

Just FYI. There are no more changes coming other than to address review comments.

github-actions[bot] commented 3 months ago

Successfully merged to branch master and cherry-picked to ["18","20","21"]