The test serves a purpose: checking that mbedtls_psa_get_random and MBEDTLS_PSA_RANDOM_STATE can be used with legacy APIs that accept f_rng, p_rng parameters. So, we may not want to just remove that test.
Indeed.
We could also just accept the testing disparity as harmless. That would mean leaving the ignore entry for test_suite_random in analyze_outcomes.py, with a comment explaining why it's harmless.
This has my preference. The cost is minimal. Unless we really expect to have no other disparities, which would save us from supporting an exception mechanism? Is it hard to support exceptions?
We could instead move the test to test_suite_ecdsa which is legitimately ignored already. This is not really satisfying, as the test logically belongs to test_suite_random.
Indeed. It's testing mbedtls_psa_get_random, and the involvement of ECDSA is incidental.
We could also use something other than ECDSA in the role of "legacy API that takes f_rng, p_rng"
That would be fine. It should be something where all-bits-zero (a plausible failure mode) causes a failure. That's the case for ECC blinding, randomized ECDSA k generation, and DHM blinding; it's also the case for RSA key generation, but that's very slow; I'm not sure about RSA blinding; and it's not the case for the RSA encryption nonce or RSA padding.
We could also just accept the testing disparity as harmless. That would mean leaving the ignore entry for test_suite_random in analyze_outcomes.py, with a comment explaining why it's harmless.
This has my preference. The cost is minimal. Unless we really expect to have no other disparities, which would save us from supporting an exception mechanism? Is it hard to support exceptions?
We already have support for test-suite-level exceptions, and we need it because we always want to ignore the test suite of the thing that's being accelerated: for example, when ECDSA is accelerated we clearly expect the tests in test_suite_ecdsa to be skipped. (Support for exceptions is also super useful in the development process, because we can start by adding the tests we want, with more exceptions than warranted, then gradually work towards removing them, see #6855 and its follow-ups - you could call it test-driven task breakdown.)
What we don't have is test-case-level exceptions, where we could say it's OK for this case in particular to be skipped. But here I think it's OK to have an exception for the full test_suite_random anyway.
That would be fine. It should be something where all-bits-zero (a plausible failure mode) causes a failure. That's the case for ECC blinding, randomized ECDSA k generation, and DHM blinding; it's also the case for RSA key generation, but that's very slow; I'm not sure about RSA blinding; and it's not the case for the RSA encryption nonce or RSA padding.
Good point about RSA being slow (as well as DHM parameter generation).
But here I think it's OK to have an exception for the full test_suite_random anyway.
I disagree. That would be problematic when we start working on PSA random driver support.
I should have made a complete sentence: I think it's OK to have an exception for the full test_suite_random in driver-only ECC builds anyway. Exceptions are per pair of all.sh components to compare. Obviously, in builds focused on PSA random drivers that wouldn't be OK.
Anyway, I've been thinking and at some point we'll probably want to expand exceptions support to target specific test cases anyway, and I don't think it would be that hard. (For example at some point in PK parse we'll want to specifically ignore test cases that rely on compressed points, but definitely not ignore the full suite.) So, we might as well do that extension now and ignore only that specific test case here too.
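For concreteness, here is a minimal sketch of what per-pair exceptions could look like, covering both the existing suite-level entries and the proposed case-level ones. All names and the layout below are illustrative assumptions, not the actual contents of analyze_outcomes.py:

```python
# Purely illustrative: each reference/driver pair of all.sh components
# carries its own exceptions, at suite level today and, per the proposal
# above, at test-case level as well.
TASKS = {
    'analyze_driver_vs_reference_ecdsa': {
        'component_ref': 'reference_ecdsa_component',       # hypothetical name
        'component_driver': 'accelerated_ecdsa_component',  # hypothetical name
        # Whole suites whose disparities are expected for this pair.
        'ignored_suites': ['test_suite_ecdsa', 'test_suite_pk',
                           'test_suite_x509parse'],
        # Proposed extension: individual test cases, keyed by suite and
        # identified by their description string.
        'ignored_test_cases': {
            'test_suite_random': [
                'PSA classic wrapper: ECDSA signature (SECP256R1)',
            ],
        },
    },
}
```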
I'm not an expert at all, but since I'm currently trying to address this issue I'll just give my two cents.
At first I was more OK with this solution:
We could also use something other than ECDSA in the role of "legacy API that takes f_rng, p_rng". [...] But we could use RSA or DHM instead.
However I see that there are concerns about it as pointed out here.
As a consequence my question goes to this sentence:
I've been thinking and at some point we'll probably want to expand exceptions support to target specific test cases anyway, and I don't think it would be that hard
So, if I understood correctly, this means that I should extend analyze_outcomes.py in order to allow exceptions for specific tests inside a certain suite. Am I right?
If this is the case, then: in the csv file, the only option I see for adding an additional "filter" is to consider the string printed for the test, such as PSA classic wrapper: ECDSA signature (SECP256R1), correct?
A tentative change to analyze_outcomes.py was proposed in the connected PR at this commit.
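For illustration, here is a minimal sketch of that kind of description-based filter. The outcome-file layout assumed below (semicolon-separated fields including the component, the suite name and the test description) and all helper names are assumptions for the sake of the example, not the real format:

```python
import csv

# Hypothetical ignore list keyed by (suite, test case description).
IGNORED_CASES = {
    ('test_suite_random', 'PSA classic wrapper: ECDSA signature (SECP256R1)'),
}

def rows_to_check(outcome_file, component):
    """Yield (suite, case, result) for one component, minus ignored cases."""
    with open(outcome_file, newline='') as f:
        for row in csv.reader(f, delimiter=';'):
            if len(row) < 5:
                continue  # malformed or unexpected line
            _platform, config, suite, case, result = row[:5]
            if config != component or (suite, case) in IGNORED_CASES:
                continue
            yield suite, case, result
```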
So, if I understood correctly, this means that I should extend analyze_outcomes.py in order to allow exceptions for specific tests inside a certain suite. Am I right?
Yes.
can I do this directly in this PR to make things faster?
It'll only make things faster if everyone is happy with the design.
I don't think including test case descriptions in the script will work well. Someone might tweak the description (e.g. fix a typo, or make it more distinctive when adding a similar test case), and there's basically no way to know the test case is referenced elsewhere. I'm not even overjoyed at hard-coding test suite names.
I'd prefer a mechanism where the test suite/case itself has an annotation that says THIS_TEST_IS_EXCLUDED_FROM_DRIVER_TEST_COVERAGE_PARITY. How about adding this as a "fake" dependency to depends_on declarations? This requires no change to the test framework and has a familiar interface. The missing piece is for the Python code to understand test suite dependencies and test case dependencies, which can be done by leveraging some of the parsing that's already implemented in generate_test_cases.py.
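As a rough illustration of that missing piece, here is a sketch under the simplifying assumption that a .data test case is a block of lines whose first non-comment line is its description, optionally followed by a depends_on: line. The helper name and the exact handling are hypothetical, not the framework's actual parser:

```python
# Marker symbol taken from the proposal above; a real implementation would
# reuse the .data parsing already available to the test-generation scripts.
MARKER = 'THIS_TEST_IS_EXCLUDED_FROM_DRIVER_TEST_COVERAGE_PARITY'

def marked_cases(data_file):
    """Return the descriptions of test cases annotated with MARKER."""
    marked = []
    description = None
    with open(data_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                description = None  # blank line: end of the current test case
            elif line.startswith('#'):
                continue            # comment line
            elif description is None:
                description = line  # first line of a block is the description
            elif line.startswith('depends_on:'):
                deps = line[len('depends_on:'):].split(':')
                if MARKER in deps:
                    marked.append(description)
    return marked
```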
Note: to smooth development, we can have the simple mechanism with a hard-coded name in this PR, and then make a follow-up with a cleaner mechanism.
It'll only make things faster if everyone is happy with the design.
Yes, of course ;). I didn't want to sound disrespectful here; I just wanted to say that I can take care of this directly in this PR without opening another one just to fix this part.
Someone might tweak the description (e.g. fix a typo, or make it more distinctive when adding a similar test case), and there's basically no way to know the test case is referenced elsewhere
You're right, the proposed solution is not super safe
How about adding this as a “fake” dependency to depends_on declarations?
I agree, this looks like a better approach. I'll give it a try!
@gilles-peskine-arm
I don't think including test case descriptions in the script will work well. Someone might tweak the description (e.g. fix a typo, or make it more distinctive when adding a similar test case), and there's basically no way to know the test case is referenced elsewhere. I'm not even overjoyed at hard-coding test suite names.
As a matter of principle I'm not a huge fan of referring to test cases by their description, as you say someone might tweak it, but then what would happen? analyze_outcomes.py will not see the new description in its ignore list, so it will not ignore it, will notice a disparity in coverage, exit non-zero, and the CI will fail. So we're bound to notice and fix it.
So, while referring to things by name is not great in general, I think the special case of an ignore list in a script run in the CI is actually quite OK. The worst that can happen is we waste a push-wait-for-CI-rework cycle on this. I don't think we tune descriptions often enough that the amount of cycles wasted would be unacceptable.
I'd prefer a mechanism where the test suite/case itself has an annotation that says THIS_TEST_IS_EXCLUDED_FROM_DRIVER_TEST_COVERAGE_PARITY.
I see two problems with this. First, the annotation needs to be more precise: we don't want to exclude this test from parity checking when testing accelerated hashes, for example. This is of course trivial to fix by appending _FOR_ECDSA (or similar) to the name used.
The second problem is related, and involves a plan I had in my head but hadn't written down so far: in the long run I think we can save CI resources by having a single "reference" component that would be common to all "accelerated" components. Currently this is not possible because the reference component for accelerated hashes disables the entropy module and uses an external RNG, while the ECDSA reference component disables ECDSA-based key exchanges. But once hashes are improved and ECDSA has progressed enough, both should be able to use the same reference component, which would be just the full config + drivers enabled. But this doesn't work as well if we skip test cases in the reference component. Though that's of course not a problem now, so we could use a simpler approach now and worry about that later.
@valeriosetti quoting from https://github.com/Mbed-TLS/mbedtls/pull/6946#issuecomment-1399996999
instead of adding a new possible option for depends_on in the .data file, which didn't look very good to me, I preferred to add a new "line" to the .data file called test_options
I'm not sure why re-using the existing depends_on line doesn't look good, can you expand on that? Your solution adds a bit of complexity, and it's not immediately clear to me what justifies this added complexity.
I'm not sure why re-using the existing depends_on line doesn't look good, can you expand on that?
In my opinion a "dependency" is something that is needed for building or running the library/test code. In this case we would like to skip a test because it can be built/tested in one build but not in the other, so the cross-check does not work properly; however each single build, considered independently, is OK. Besides, this new symbol is not something defined in some header file, but something that makes sense only for testing purposes.
However this is just my interpretation, of course, so I'm open to any different approach ;)
First, the annotation needs to be more precise: we don't want to exclude this test from parity checking when testing accelerated hashes, for example.
Does that really matter? The test case will have a more specific dependency on some non-PSA symbol. The extra symbol to exclude from parity checking is just there to say “it's not a mistake to have a non-PSA dependency”.
we can save CI resources by having a single "reference" component that would be common to all "accelerated" components. (…) But this doesn't work as well if we skip test cases in the reference component.
I don't understand this part. What wouldn't work with an everything-through-driver component?
First, the annotation needs to be more precise: we don't want to exclude this test from parity checking when testing accelerated hashes, for example.
Does that really matter? The test case will have a more specific dependency on some non-PSA symbol. The extra symbol to exclude from parity checking is just there to say “it's not a mistake to have a non-PSA dependency”.
I don't think the extra symbol is here just for that. Indeed we already have a non-PSA dependency, and indeed it's not a mistake. But what analyze_outcomes.py is doing is comparing the sets of tests skipped in a build with ECDSA_C vs in a build without ECDSA_C (but with ECDSA provided by a driver). Currently, it sees a test skipped in the second build but not the first, so it complains; that's its job. We want to tell the script "when comparing those two builds, don't worry about this case", because we've manually checked that this disparity is not an indication of anything wrong in ECDSA driver support.
But when the script is comparing the "hashes in software" vs "accelerated hashes" builds, if the test mbedtls_psa_get_random_ecdsa_sign is skipped in the accelerated build and not the software build, I want the script to let me know, because that's definitely not expected, so likely an indication of a problem somewhere.
Basically the set of expected & acceptable disparities depends on the pair of builds being compared.
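To make that concrete, here is a minimal sketch of the per-pair comparison being described; the function and argument names are hypothetical, not taken from the actual script:

```python
# For one pair of builds, flag every test case that ran in the reference
# component but was skipped in the driver component, unless an exception
# covers it for this specific pair.
def check_parity(ref_results, driver_results, ignored_suites, ignored_cases):
    """Both results arguments map (suite, case) -> 'PASS'/'SKIP'/'FAIL'."""
    problems = []
    for (suite, case), result in ref_results.items():
        if result != 'PASS':
            continue
        if suite in ignored_suites or (suite, case) in ignored_cases:
            continue  # manually reviewed, expected disparity for this pair
        if driver_results.get((suite, case)) != 'PASS':
            problems.append((suite, case))
    return problems  # non-empty means the CI job should fail
```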
On another front, what do you think about my point that referring to a case by its description would be acceptable in the specific case of an ignore list, since any change in the description would be noticed as a result of the script failing?
The implementation of driver parity testing feels more complex than I'd expect. In particular, I wonder why “no non-PSA dependencies in the PSA domain (except explicit exceptions)” is not a good enough approximation. Is there a design analysis document that lists the requirements and derives the architecture?
I don't think the extra symbol is here just for that. Indeed we already have a non-PSA dependency, and indeed it's not a mistake.
Hmmm. A test is skipped only due to a dependency or to a runtime decision to skip. For a test to not run with drivers, one of the following has to happen:
I thought the goal of the test parity check was primarily (2), since (1) could be enforced through simple dependency checks and (3) is rare enough to not really matter. So for (2), is it really important which driver-based build gets the test to run? Especially when we can do all-driver builds (as opposed to separating algorithms due to incomplete support): wouldn't this reduce (2) to an analysis of test case dependencies?
Basically the set of expected & acceptable disparities depends on the pair of builds being compared.
Well, ok, that's how it's currently implemented. But why do we need this level of precision?
what do you think about my point that referring to a case by its description would be acceptable in the specific case of an ignore list, since any change in the description would be noticed as a result of the script failing?
That's not great, but it's acceptable.
Context: https://github.com/Mbed-TLS/mbedtls/issues/6839.
This task is to finish off TL-ecdsa.PSA by achieving testing parity in "basic" test suites (that is, everything except PK, X.509, TLS). Ideally, once this is done, in analyze_outcomes.py, the task analyze_driver_vs_reference_ecdsa will no longer ignore test suites other than PK, X.509, TLS.

- In test_suite_ecp, the function mbedtls_ecp_group_metadata depends on ECDSA_C (and ECDH_C). This is because it tests mbedtls_ecdsa_can_do() (and mbedtls_ecdh_can_do()). But it also tests lots of other things. Instead of having the entire test function depend on ECDSA (and ECDH), only the specific line that depends on it should be guarded by an #if defined.
- In test_suite_psa_crypto_se_driver_hal, several test cases using the function sign_verify declare a dependency on MBEDTLS_ECDSA_C:MBEDTLS_ECP_C:PSA_WANT_ECC_SECP_R1_256, while they actually only use psa_ APIs. The dependency declaration should probably be PSA_WANT_ALG_ECDSA:PSA_WANT_ECC_SECP_R1_256.
- In test_suite_random, the function mbedtls_psa_get_random_ecdsa_sign depends on ECDSA_C. This seems correct, but is annoying because it doesn't play nice with our test disparity analysis. Find what to do.

Random ideas about mbedtls_psa_get_random_ecdsa_sign:

- The test serves a purpose: checking that mbedtls_psa_get_random and MBEDTLS_PSA_RANDOM_STATE can be used with legacy APIs that accept f_rng, p_rng parameters. So, we may not want to just remove that test. (Historical note: the test was added by #4110.)
- We could also just accept the testing disparity as harmless. That would mean leaving the ignore entry for test_suite_random in analyze_outcomes.py, with a comment explaining why it's harmless.
- We could instead move the test to test_suite_ecdsa, which is legitimately ignored already. This is not really satisfying, as the test logically belongs to test_suite_random.
- We could also use something other than ECDSA in the role of "legacy API that takes f_rng, p_rng". Changing to ECDH, ECJPAKE or ECP or BIGNUM would only make us run into the same problem in future steps. But we could use RSA or DHM instead. In principle, that's also only moving the problem, but it's unclear at this point if we're going to do driver-only RSA/DHM the way we're currently doing driver-only ECC. Or if it happens, it might be only after 4.0, where we'll remove the f_rng, p_rng params from public APIs (always use the PSA RNG), so mbedtls_psa_get_random will be gone and this test with it.

Depends on: https://github.com/Mbed-TLS/mbedtls/pull/6855