Number of Subjects/Users for PAI

woodbe commented 5 years ago

For consistency of results, I would expect a common number of subjects to be used to create the PAIs for each test. Some tests list only one; most list 5. For the ones I put together, I think I took what had already been listed and didn't edit the individual files, but given that we are establishing a standard of 10 attempts with a PAI (unless there are failures), it seems we should also have one common number of subjects.

Do we really need 5 here, or is, say, 3 subjects with 10 tests each sufficient? I'm thinking about the overall time/cost, since some of these have a lot of tests (I worry about the number we will have when we finally get the fingerprint stuff).
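As a rough illustration of the cost question, the number of attack presentations grows linearly with the number of subjects. The sketch below assumes one PAI per subject per test and 10 attempts per PAI; the function name and the figure of 4 tests are hypothetical and not taken from any toolbox.

```python
def total_presentations(tests: int, subjects: int, attempts_per_pai: int = 10) -> int:
    """Attack presentations needed, assuming one PAI per subject per test."""
    if min(tests, subjects, attempts_per_pai) < 1:
        raise ValueError("all counts must be at least 1")
    return tests * subjects * attempts_per_pai


# Hypothetical toolbox with 4 tests, compared across candidate subject counts.
for subjects in (1, 3, 5):
    print(subjects, "subject(s):", total_presentations(tests=4, subjects=subjects))
```

Retests triggered by failures would add to these totals, so they are a lower bound on the effort.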

nils-tekampe commented 5 years ago

That is a very complex question. Some thoughts:

- If we are talking about one specific PAI, three subjects are totally fine for me.
- However, our toolboxes contain descriptions that cover more classes of PAI. For those classes, I would expect the document to specify how many PAIs per class shall be built (and then we again have three subjects per PAI).
- For each PAI that has been built for a specific subject, I would suggest the following numbers for tests:

- The PAI is applied 10 times.
- If one or fewer errors occur, the test passes.
- If three or more errors occur, the test fails.
- If two errors occur, the PAI is applied another 10 times.
- If it ends up with 2 errors in 20, the test passes. If more errors occur, the test fails.
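The pass/fail rule suggested here can be sketched as a small function. This is only an illustration of the proposal as worded in this comment: the function name and the "retest" return value are hypothetical, and what counts as an "error" is left to the toolbox.

```python
from typing import Optional


def pad_test_verdict(first_errors: int, second_errors: Optional[int] = None) -> str:
    """Apply the proposed rule to error counts from rounds of 10 presentations.

    Returns "pass", "fail", or "retest" (a second round of 10 is still needed).
    """
    if not 0 <= first_errors <= 10:
        raise ValueError("first_errors must be between 0 and 10")
    if first_errors <= 1:
        return "pass"
    if first_errors >= 3:
        return "fail"
    # Exactly two errors in the first round: a second round of 10 decides.
    if second_errors is None:
        return "retest"
    if not 0 <= second_errors <= 10:
        raise ValueError("second_errors must be between 0 and 10")
    # Pass only if the total stays at 2 errors in 20 presentations.
    return "pass" if first_errors + second_errors <= 2 else "fail"
```

For example, `pad_test_verdict(2)` asks for a second round, and `pad_test_verdict(2, 1)` fails because the total would be 3 errors in 20.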

I would also suggest making this identical for all toolboxes.

Last but not least: this is the perspective of functional testing. Vulnerability assessment would work completely differently.

woodbe commented 5 years ago

OK, so then what I think we need to do is add something in each of the toolboxes that lists the number of expected PAI types (i.e. 1, 2, 3, etc) and then specify 3 subjects per PAI. So in the general document, we specify the three subjects per PAI.

If 3 are fine for you, are you OK with 2 (or even 1)? I just want to understand what you think the lower-bound would be for the number of different subjects for any single PAI type.

I'm definitely not talking about vulnerability (lalala, I can't hear you) here, just the PAD toolboxes.

n-kai commented 5 years ago

I am OK with 1 subject per PAI, with each PAI applied 10 times, but the evaluator shall check the quality of the PAI carefully (e.g. the evaluator has to check that the printed vein pattern (i.e. the PAI) shows a visible vein pattern). If you create lots of PAIs using 3 subjects and none of them has a visible vein pattern, it is a waste of time to test with those PAIs.

I think that, for our cPP-Module, the purpose of the PAD toolbox testing (i.e. ATE_IND.1 testing) is preliminary testing ahead of vulnerability testing. If the evaluator can create a PAI that easily bypasses the biometric verification, the TOE shall fail the evaluation without going on to AVA_VAN.1. However, if the evaluator finds a "gray zone" PAI (i.e. one whose attack succeeds at low probability), the evaluator should move on to AVA_VAN.1 with those PAIs.

nils-tekampe commented 5 years ago

Sorry, but we are again mixing up functional testing and vulnerability assessment here. I just wanted to make clear that my comment was only meant for the area of functional testing. I think that any penetration testing in the area of vulnerability assessment will need significantly higher numbers than what we have discussed so far.

n-kai commented 5 years ago

6.3.1.4 in the SD proposes that "However, the evaluator shall not spend more than one week for independent and penetration testing, considering the assurance level claimed by [BIOcPP Module]" and doesn't introduce a minimum number of attempts for the penetration testing, because we don't have any theoretical background for such minimum numbers. ISO/IEC 30107-3 (Biometric presentation attack detection - Part 3: Testing and reporting) also doesn't define a minimum number of attempts or tests, but requires the tester to report the number of attempts instead (see the text quoted below).

So my proposal is that the toolbox should define:

- the "threshold used for APCER" for the independent or functional testing and for the penetration testing separately (e.g. if the PAI succeeds once in 10 attempts during the independent or functional testing, the PAI should be tested again during the penetration testing, and the TOE fails the test if the PAI meets the criterion in 9.3)
- the number of attempts for independent or functional testing (e.g. 10 attempts)
- the minimum time period for the penetration testing (e.g. one week), without introducing a minimum number of attempts or tests, but requiring the evaluator to report the number of tests in the ETR instead.

9.4 Iterative testing to identify effective artefacts

Based on the creation, preparation, and usage considerations above, an evaluator could evaluate presentation attack instruments with a special effort on those found to be initially effective. The analysis could take place in two phases. After a first phase of tests (Note from @n-kai : this first phase corresponds to the independent or functional testing), the evaluator could test extensively every PAI misclassified as bona fide in a second phase of tests (Note from @n-kai : this second phase corresponds to the penetration testing). APCER (proportion of attack presentations using the same PAI species incorrectly classified as bona fide presentations at the PAD subsystem in a specific scenario) could then be measured for each selected PAI. If APCER exceeds a fixed threshold for one PAI species, the PAI would be deemed successful. The evaluator could report the number of tests done in the second phase, and the threshold used for APCER. A very stringent methodology would use a 0 % threshold for APCER, meaning every presentation attack which demonstrates capability to be misclassified at least two times is deemed successful, as the PAI already succeeded at least once in the first phase.
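To make the APCER criterion in the quoted text concrete, here is a minimal sketch of the second-phase check for a single PAI species. The function names are hypothetical, and the counts are assumed to come from second-phase (penetration) testing only.

```python
def apcer(accepted_as_bona_fide: int, attack_presentations: int) -> float:
    """Proportion of attack presentations incorrectly classified as bona fide."""
    if attack_presentations <= 0:
        raise ValueError("attack_presentations must be positive")
    if not 0 <= accepted_as_bona_fide <= attack_presentations:
        raise ValueError("accepted_as_bona_fide must be between 0 and attack_presentations")
    return accepted_as_bona_fide / attack_presentations


def pai_deemed_successful(accepted_as_bona_fide: int,
                          attack_presentations: int,
                          threshold: float = 0.0) -> bool:
    """A PAI species is deemed successful if its APCER exceeds the threshold.

    The default of 0.0 corresponds to the "very stringent" 0 % threshold:
    a single misclassification in the second phase is enough.
    """
    return apcer(accepted_as_bona_fide, attack_presentations) > threshold
```

With the 0 % threshold, `pai_deemed_successful(1, 50)` is true, while `pai_deemed_successful(0, 50)` is false. The evaluator would still report both the number of second-phase tests and the threshold used.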

woodbe commented 5 years ago

@nils-tekampe I just want to confirm when you say penetration testing will need higher numbers, what needs to be higher? The number of subjects, the number of species created, the number of times a species is tested? It just isn't quite clear to me what would need to be higher (setting expectations for the penetration tests that way is probably useful, even if it is just guidelines for expectations as opposed to actual test plans like the toolbox).

woodbe commented 5 years ago

@n-kai I completely agree with the requirement that the PAI be "good" and that is probably something that could be added to the tests (such as the example you state, that the vein output needs to show veins or it can't be considered good), though it may be difficult to know what is good without a lot of experience in some (many) cases.

I think that limiting the number of subjects that are needed for the PAD, not to mention the sheer number of separate PAIs from those subjects, is good. If we can agree to 1, I'm really happy, but if 2 is better, I'm still good (even 3 is OK, but I think that unless there is a very good reason to do so, we should not go over that number in general; if there is a specific need to do so, then we make an exception for that test).

n-kai commented 5 years ago

@woodbe

"it may be difficult to know what is good without a lot of experience in some (many) cases"

I agree with you. I believe that most evaluators don't have such experience, and nobody knows which number of subjects is best or sufficient. In this case, we should set the lowest number (i.e. 1 subject) but give evaluators the freedom to do more (i.e. 2 or 3 subjects) within one week if they think it's necessary.

For example, in the case of vein recognition, the evaluator may learn how the TOE checks liveness from, for example, patent information (e.g. the TOE checks expansion and contraction of the vein). In this case, one evaluator might focus on one PAI that has a clear vein image and try various presentation methods that introduce vibration to mimic such expansion and contraction. Other evaluators might create two or three PAIs with different levels of vein image clarity but reduce the number of presentation methods for each PAI. But nobody knows which test method is better, because we don't know exactly how the TOE checks liveness (and even if evaluators had such information, it would still be difficult or impossible to determine which is better).

I know that this introduces subjectivity into testing, but the SD can't determine all evaluation activities precisely. If the SD doesn't describe clear evaluation activities, it is the certification body that is responsible for making the final decision on whether the evaluation conforms to CC/CEM with the cPP/SD.

woodbe commented 5 years ago

Based on the 5/30 call, we will add a specific question to the public review invitation about the number of subjects that should be used for PAD. We will propose one, and see what people say.

Brian will update the toolbox.adoc file to add number of subjects and then add this to the list of Public Review documents.

woodbe commented 5 years ago

This was updated to 3 subjects based on discussion at the Singapore CCUF meetings. This was covered in #21.

Closing the issue.

biometricITC / cPP-toolboxes

Number of Subjects/Users for PAI #4