data61 / anonlink-entity-service

Privacy Preserving Record Linkage Service
Apache License 2.0

ValueError when running test with permutation and encrypted mask #95

Closed: unzvfu closed this issue 6 years ago

unzvfu commented 6 years ago

Do a normal build, then start the services with

docker-compose -p n1es -f tools/docker-compose.yml up

Then run

docker run -it -e ENTITY_SERVICE_URL=http://localhost:32768/api/v1 -e ENTITY_SERVICE_TEST_SIZE=100 -e ENTITY_SERVICE_PERMUTATION=1 --net=host quay.io/n1analytics/entity-app python test_service.py

gives

[...]
00:52:14.588   n1.permutationtest       INFO - 81 -> 0 (87, 'Shirl Rumford', '1954/01/24', 'F')
00:52:14.588   n1.permutationtest       INFO - 96 -> 3 (88, 'Maria Montealegre', '1933/01/16', 'M')
00:52:14.588   n1.permutationtest       INFO - Decrypting mask used for first 10 entities...
Traceback (most recent call last):
  File "test_service.py", line 823, in <module>
    permutation_test(party1_filters[:size], party2_filters[:size], s1[:size], s2[:size])
  File "test_service.py", line 535, in permutation_test
    encrypted_mask = [paillier.EncryptedNumber(pub, phe.util.base64_to_int(m)) for m in mapping_result_b['mask'][:10]]
  File "test_service.py", line 535, in <listcomp>
    encrypted_mask = [paillier.EncryptedNumber(pub, phe.util.base64_to_int(m)) for m in mapping_result_b['mask'][:10]]
  File "/usr/local/lib/python3.6/site-packages/phe/util.py", line 146, in base64_to_int
    return int(hexlify(base64url_decode(source)), 16)
  File "/usr/local/lib/python3.6/site-packages/phe/util.py", line 141, in base64url_decode
    raise ValueError('Invalid base64 string')
ValueError: Invalid base64 string

unzvfu commented 6 years ago

The problem goes away after increasing ENTITY_SERVICE_TEST_SIZE from 100 to 1000 (note that the Jenkins run, which passes, uses 5000).

Brian says: "Having a quick look through there are a few more if branches based on mapping size than I remember. size < config.GREEDY_SIZE, and expected_size < config.ENTITY_CACHE_THRESHOLD."

unzvfu commented 6 years ago

The error appears only when ENTITY_SERVICE_TEST_SIZE <= 100; as far as I can tell, everything works for ENTITY_SERVICE_TEST_SIZE >= 101.

unzvfu commented 6 years ago

This error is caused by the call to celery.chord() in async_worker.py:paillier_encrypt_mask().

Specifically: normally, the results of the header (the tasks produced by the generator passed as the first argument to celery.chord()) are handed to the callback (the second argument) as a list. This is indeed what we observe when the header generates at least two tasks. However, for reasons yet to be determined, when the header generates only a single task, its result alone is passed to the callback. This is a type error of sorts: the callback expects a list of results, but in this case it receives the single result itself rather than a one-element list containing it.
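The mismatch can be reproduced without Celery at all. The following is a minimal pure-Python simulation under stated assumptions: `simulated_chord` only mimics the unwrapping behaviour described above (it is not Celery's API), and `flatten_callback` is a hypothetical callback that concatenates per-chunk results; the real callback in paillier_encrypt_mask() may do something different. It shows how a bare chunk of strings, iterated as if it were a list of chunks, yields single characters, which is one plausible route to the `Invalid base64 string` error in the traceback.

```python
# Hypothetical placeholder for config.ENCRYPTION_CHUNK_SIZE (100 in this issue).
ENCRYPTION_CHUNK_SIZE = 100


def make_chunks(values, size=ENCRYPTION_CHUNK_SIZE):
    """Split values into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def simulated_chord(header_results):
    """Mimic the reported behaviour: a lone header result is handed to the
    callback unwrapped instead of as a one-element list."""
    if len(header_results) == 1:
        return header_results[0]
    return header_results


def flatten_callback(chunk_results):
    """Hypothetical callback: expects a list of chunks and concatenates them."""
    return [item for chunk in chunk_results for item in chunk]


def normalise(chunk_results):
    """Hypothetical defensive guard: if handed a bare chunk (its items are not
    lists), wrap it so the callback always sees a list of chunks."""
    if chunk_results and not isinstance(chunk_results[0], list):
        return [chunk_results]
    return chunk_results


def run(n, defensive=False):
    # Placeholder "ciphertexts"; the real mask holds base64-encoded strings.
    mask = [f"ct{i}" for i in range(n)]
    delivered = simulated_chord(make_chunks(mask))
    if defensive:
        delivered = normalise(delivered)
    return mask, flatten_callback(delivered)


for n in (100, 101):
    mask, out = run(n)
    print(n, out == mask, out[:3])
```

With 100 entries the callback ends up iterating over the characters of each string, so the first outputs are 'c', 't', '0' rather than whole ciphertexts; with 101 entries the output matches the mask. The `normalise` guard works whether or not the Celery fix is present, but only under the assumption that each chunk result is a list whose items are not themselves lists.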

The error is triggered because config.ENCRYPTION_CHUNK_SIZE is 100: when the dataset has 100 or fewer entries, the header generates only one chunk, so the type error described above occurs. With 101 or more entries there are at least two chunks, which is why the error appears only for TEST_SIZE <= 100.
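The size threshold follows from ceiling division: with a chunk size of 100, the header produces ceil(n / 100) tasks, and only n <= 100 gives exactly one. A short sketch, assuming the mask is split into consecutive slices of at most ENCRYPTION_CHUNK_SIZE entries; `num_chunks` is a hypothetical helper, not a function from the service:

```python
import math

# Placeholder for config.ENCRYPTION_CHUNK_SIZE; this issue reports it is 100.
ENCRYPTION_CHUNK_SIZE = 100


def num_chunks(n_entities, chunk_size=ENCRYPTION_CHUNK_SIZE):
    """Number of header tasks when n_entities are split into chunks of chunk_size."""
    return math.ceil(n_entities / chunk_size)


# Sizes mentioned in this thread: the failing 100, the boundary 101,
# the manual retry at 1000, and the Jenkins run at 5000.
for size in (100, 101, 1000, 5000):
    count = num_chunks(size)
    status = "single chunk: affected" if count == 1 else "multiple chunks: unaffected"
    print(f"TEST_SIZE={size}: {count} chunk(s), {status}")
```

This matches the observations earlier in the thread: only the 100-entry run yields a single chunk, while 101, 1000 and 5000 entries all yield two or more.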

unzvfu commented 6 years ago

It turns out (thanks @hardbyte!) that this is (was) a known issue in Celery; the fix is due to appear in Celery 4.2.