eu-federation-gateway-service / efgs-federation-gateway

The goal of this project is to develop the official European solution for the interoperability between national backend servers of decentralised contact tracing applications to combat COVID-19.
Apache License 2.0
59 stars 25 forks source link

Duplicate/existing key in upload causes database rollback of entire batch (but new keys erroneously reported as inserted "201") #269

Closed UBaggeler closed 3 years ago

UBaggeler commented 3 years ago

Describe the bug

When a national health backend uploads a certain key for the second time (in a new batch together with other new keys), EFGS throws/returns an HTTP 207 with payload:

{"409":[0], "201":[1,2,...], 500:[]}

This tells the national health backend that the first key was a duplicate (not inserted) but all the other keys were successfully accepted/inserted ("201"). But in reality, the whole MySQL transaction was rolled back and no keys were inserted at all. This might cause that keys are getting lost.

Expected behaviour

If we consider the output as expected (report of how many inserted, how many duplicates and how many failed), the database transaction should not be rolled back.

Steps to reproduce the issue

  1. National health backend uploads key K0,K1,K2 with BatchTag X -> EFGS successfully inserted K0,K1,K2
  2. National health backend uploads key K0,K3,K4,... with BatchTag Y
  3. Observe above error ({"409":[0], "201":[1,2,...], 500:[]}) while K3 and K4 were not inserted into DB

Technical details

https://github.com/eu-federation-gateway-service/efgs-federation-gateway/blob/master/src/main/java/eu/interop/federationgateway/service/DiagnosisKeyEntityService.java#L71-L103

@Transactional(rollbackOn = DiagnosisKeyInsertException.class)
  public void saveDiagnosisKeyEntities(
    List<DiagnosisKeyEntity> diagnosisKeyEntities
  ) throws DiagnosisKeyInsertException {

    [...]

    if (!resultMap.get(409).isEmpty() || !resultMap.get(500).isEmpty()) {

      EfgsMdc.put("insertedKeyCount", resultMap.get(201).size());
      EfgsMdc.put("conflictKeysCount", resultMap.get(409).size());
      EfgsMdc.put("failedKeysCount", resultMap.get(500).size());

      log.error("error inserting keys");
      throw new DiagnosisKeyInsertException("Error during insertion of diagnosis keys!", resultMap);
    }

Possible Fix

Do not roll back transaction or change output (if all or nothing is intended)

Additional context

timestamp="2021-03-25T13:28:49.985Z", level="INFO", hostname="xyz", pid="44827", thread="main", class="e.i.f.f.CertificateAuthentificationFilter", message="Successful Authentication", trace="", span="", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe" 
timestamp="2021-03-25T13:28:50.102Z", level="INFO", hostname="xyz", pid="44827", thread="main", class="e.i.f.v.DiagnosisKeyBatchValidator", message="Successful validation of diagnosis keys", trace="", span="", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", dnString="C=DE", requestedMediaType="application/protobuf;version=1.0" 
timestamp="2021-03-25T13:28:50.174Z", level="INFO", hostname="xyz", pid="44827", thread="main", class="e.i.f.controller.UploadController", message="verified batch signature", trace="", span="", certNotBefore="2021-03-11T13:28:49Z", requestedMediaType="application/protobuf;version=1.0", numKeys="3", maxKeys="5000", certNotAfter="2022-03-25T13:28:49Z", batchTag="firstBatchTag", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", certThumbprint="1fefe4bfab5527d898741c8a2d6e24d60fac9b5ed0548a97c27b75b198d560ae" 
timestamp="2021-03-25T13:28:50.204Z", level="WARN", hostname="xyz", pid="44827", thread="main", class="o.h.engine.jdbc.spi.SqlExceptionHelper", message="SQL Error: 1062, SQLState: 23000", trace="", span="", certNotBefore="2021-03-11T13:28:49Z", requestedMediaType="application/protobuf;version=1.0", numKeys="3", maxKeys="5000", certNotAfter="2022-03-25T13:28:49Z", batchTag="firstBatchTag", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", certThumbprint="1fefe4bfab5527d898741c8a2d6e24d60fac9b5ed0548a97c27b75b198d560ae" 
timestamp="2021-03-25T13:28:50.205Z", level="ERROR", hostname="xyz", pid="44827", thread="main", class="o.h.engine.jdbc.spi.SqlExceptionHelper", message="Duplicate entry 'd76bc9dd685c8a53ef3dff9369627c59e783e0ab790db0005573eb45e288f61b' for key 'payload_hash'", trace="", span="", certNotBefore="2021-03-11T13:28:49Z", requestedMediaType="application/protobuf;version=1.0", numKeys="3", maxKeys="5000", certNotAfter="2022-03-25T13:28:49Z", batchTag="firstBatchTag", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", certThumbprint="1fefe4bfab5527d898741c8a2d6e24d60fac9b5ed0548a97c27b75b198d560ae" 
timestamp="2021-03-25T13:28:50.213Z", level="ERROR", hostname="xyz", pid="44827", thread="main", class="e.i.f.service.DiagnosisKeyEntityService", message="1: could not execute statement; SQL [n/a]; constraint [payload_hash]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute statement", trace="", span="", certNotBefore="2021-03-11T13:28:49Z", requestedMediaType="application/protobuf;version=1.0", numKeys="3", maxKeys="5000", certNotAfter="2022-03-25T13:28:49Z", batchTag="firstBatchTag", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", certThumbprint="1fefe4bfab5527d898741c8a2d6e24d60fac9b5ed0548a97c27b75b198d560ae" 
timestamp="2021-03-25T13:28:50.218Z", level="ERROR", hostname="xyz", pid="44827", thread="main", class="e.i.f.service.DiagnosisKeyEntityService", message="error inserting keys", trace="", span="", certNotBefore="2021-03-11T13:28:49Z", conflictKeysCount="1", requestedMediaType="application/protobuf;version=1.0", numKeys="3", maxKeys="5000", certNotAfter="2022-03-25T13:28:49Z", insertedKeyCount="2", batchTag="firstBatchTag", dnString="C=DE", thumbprint="595133633e4e2292ccc5ba1fc35f0c0fb3b27f0e1f031b7ce533d92e00392dfe", failedKeysCount="0", certThumbprint="1fefe4bfab5527d898741c8a2d6e24d60fac9b5ed0548a97c27b75b198d560ae" 

PosgreSQL

When connecting an EFGS instance to a PostgreSQL DB, the behaviour is slightly different:

The transaction is also rolled back (no new keys inserted) but reported as:

{"409":[0], "201":[], 500:[1,2,...]}
UBaggeler commented 3 years ago

It looks like this regression was introduced with PR https://github.com/eu-federation-gateway-service/efgs-federation-gateway/pull/145 while trying to fix issue #135. The unit test testRequestUploadKeysExistingHashInProtobufFormat in UploadControllerTest.java#L236 was not corrected to test the new behaviour and therefore did not catch the problem: After the second upload, the backend reports one 409 (conflict of key1 which already exists in the DB) and one 201 (successful insert of key2) but the expected number of keys in the database is asserted to 1:

Assert.assertEquals(1, diagnosisKeyEntityRepository.count());

This should in fact be 2, according to the response(s).

The previous/initial behaviour (empty 201 list in response) was correct in the way that this is what happens because of the transaction rollback. But understandably a rollback of the whole batch is not the desired behaviour.

f11h commented 3 years ago

Hi @UBaggeler ,

the rollback of the whole batch is required to not break integrity. The signature calculated for a batch requires all keys to be present.

That a 201 (in this case) does not really mean created is defined in the documentation for EFGS. Thatswhy i will close this issue.

ubamrein commented 3 years ago

@f11h Could you please provide a link to the mentioned documentation? The OpenAPI definition in the documentation folder specifies that a 201 status code means Database Entries created.. This directly contradicts your statement that 201 "[...] does not really mean created".

Further, looking at the implementation of the cwa-server, we can see here that only keys which had a status code of 500 are added to the retry list and here that only said keys are removed from the originalPayload. Furthermore the documentation of the BatchUploadResponse states:

/**
 * Multi-Status response from Upload. The payload returns three properties (201, 409 and 500) each property contains
 * a list of indexes. The index refers to the key position on the ordered Diagnosis Keys from UploadPayload.
 *  201 - Successfully added               DO NOTHING
 *  409 - Conflict: Key was already added  DO NOTHING
 *  500 - Server Error: Key not processed  RETRY
 */

Also the implementation of the data service here and the documentation of said service here suggests that the response status 201 means "successfully inserted".

Summarized, as far as I understand the implementations (on the client as on the server side), both expect that a status code 201 means that keys were inserted (at least partially on a 207 http status code), which further means that any constraint violation, as e.g. a key which is uploaded twice, leads to the loss of all keys in that batch.

f11h commented 3 years ago

We must differ from Response Status Code and the used status code in the response map.

Response Map Statuses: 201 -> The following keys could be inserted. 409 -> The following keys already exist 500 -> The following keys cannot be inserted because of unknown error.

You can only upload a batch as a whole. Saving just a few keys of a batch makes no sense because the signature would be invalid. (Signature is calculated over the whole batch with all keys).

Yes you are right. The documentation says something different but this was the initial state before several changes which were agreed with the member states. Regarding the implementation in CWA - this is also wrong. We need to tell it the colleagues from cwa-server project.