EC-SEAL / interface-specs

Open API specifications

Session dataStore management API #3

Open faragom opened 4 years ago

faragom commented 4 years ago

This is an issue for the Session Manager, but there is no repository on GitHub for it yet, so I am opening it here @endimion @ross-little

Issue:

The current design of the data store in session has some weak spots:

Note that we are talking about the structure to be kept in session. It is completely independent of the structure that will be used to persist the data.

Proposal:

  1. Use a dictionary: The dataStore session variable will be a dictionary, where:

    • the key is the unique ID of the dataSet
    • the value is the object to store
  2. Use a wrapper object around the dataSet object: To handle the multiple kinds of objects that can be stored in the dataStore (linkRequest, dataSet), create the storeEntry object. It homogenises the storage, search and classification of the stored objects.

    • dataStore will keep storeEntry objects
    • A storeEntry object will keep a single dataSet or linkRequest object.

    So, instead of having this in dataStore:

    {
      "019293f04893b0298392a0484" : {"this is": "a dataSet object"},
      "019293m04893c0298392d0494" : {"this is": "a linkRequest object"}
    }

    We have this:

    {
      "019293f04893b0298392a0484" : {
        "id": "019293f04893b0298392a0484",
        "type": "dataSet",
        "data": {"this is": "a dataSet object"}
      },
      "019293m04893c0298392d0494" : {
        "id": "019293m04893c0298392d0494",
        "type": "linkRequest",
        "data": {"this is": "a linkRequest object"}
      }
    }

    Where:

    • id: a unique and persistent identifier for the object, for a given source, issuer and subject. Each source sets this field on write as it sees fit (for example, a hash of the name of the data source, the URL of the issuer and the subject identifier). It is used to tell updates from additions.
    • type: specifies the class name of the stored object. For the moment, dataSet or linkRequest.
    • data: The object to store. This could be any object, but we will store only dataSet or linkRequest objects.
  3. Implement a dataStore management API on the Session Manager: it is the most adequate and obvious location for an interface intended to manipulate a special object in session. It works the same way the SM now writes and gets a variable, but manages a single special session variable.

    Proposed API operations:

    • start(sessionId). Will create an empty dataStore in session, overwriting any pre-existing one.

      Set the dataStore session variable to an empty dict. If there was anything there, overwrite.

    • add(sessionId, id, type, object). If the id already exists, the entry is updated.

      • Read the dataStore variable. If it is not a dict, return an error.
      • You get an object (a JSON), a type (dataSet, linkRequest, etc.) and an id (any unique string the writer decides).
      • Use the id as the dict key and check whether that key is already set.
      • If it exists, overwrite its contents; otherwise add a new entry. In both cases the stored JSON looks like this:
        {
            "id": "the ID",
            "type": "the type",
            "data": {"the": "object"}
        }
      • Write the dataStore JSON back to the dataStore session variable.
    • delete(sessionId, id). Supports only single deletions by id.

      • Read the dataStore variable. If it is not a dict, return an error.
      • You get an id.
      • Look up the id in the dict and delete/unset it.
      • Write the dataStore JSON back to the dataStore session variable.
    • get(sessionId, id). Returns a single object (the storeEntry), if it exists.

      • Read the dataStore variable. If it is not a dict, return an error.
      • You get an id.
      • Look up the id in the dict and get the JSON stored there.
      • Return the JSON.
    • search(sessionId, type). Returns all objects of a type (or all objects, if no type is specified).

      • Read the dataStore variable. If it is not a dict, return an error.
      • You get a type.
      • Go through all the dict values and collect ALL the JSON objects with a matching type.
      • Return an array of the retrieved JSON objects.
  4. Concurrency control:

    • We can write a timestamp (in milliseconds, not Unix seconds) on the dataStore structure.
    • You read the dataStore, do the operation and, before writing it, read it again and compare the timestamps.
    • If they match, no concurrent write happened and it is safe to write (updating the timestamp, of course).
    • If the timestamp has changed, wait a random time.
    • Then read the dataStore again and repeat the operation over the newly read dataStore (and try again to write).
    • To prevent starvation of the process, we limit the number of retries and return a write error once the limit is reached (and let the caller ms retry at will).

    As said above, the concurrency problem is confined to a single user session, and it is very unlikely that the user will perform multiple simultaneous writes on the dataStore. So this solution should be enough, and it requires little extra code (without substantially touching the existing code).
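
For illustration, the five proposed operations could be sketched in Python as below. This is only a minimal sketch under assumptions, not the Session Manager's actual code: `SESSIONS` is a hypothetical in-memory stand-in for the session variables, `DataStoreError` is an invented error type, and the choice to treat deletion of an unknown id as a no-op is not specified in the proposal. Concurrency control is left out here.

```python
import copy

# Hypothetical in-memory stand-in for the Session Manager's session variables:
# session_id -> {variable name -> value}. Not part of the proposal.
SESSIONS = {}


class DataStoreError(Exception):
    """Raised when the dataStore session variable is missing or not a dict."""


def _read(session_id):
    store = SESSIONS.get(session_id, {}).get("dataStore")
    if not isinstance(store, dict):
        raise DataStoreError("dataStore is not a dict")
    return store


def start(session_id):
    # Create an empty dataStore, overwriting any pre-existing one.
    SESSIONS.setdefault(session_id, {})["dataStore"] = {}


def add(session_id, entry_id, entry_type, obj):
    store = _read(session_id)
    # Using the id as the dict key makes "add" and "update" the same write.
    store[entry_id] = {"id": entry_id, "type": entry_type, "data": copy.deepcopy(obj)}


def delete(session_id, entry_id):
    # Assumption: deleting an id that is not present is silently ignored.
    _read(session_id).pop(entry_id, None)


def get(session_id, entry_id):
    # Returns the storeEntry, or None if the id is not present.
    return _read(session_id).get(entry_id)


def search(session_id, entry_type=None):
    # All storeEntry objects of the given type, or all of them if no type is given.
    return [entry for entry in _read(session_id).values()
            if entry_type is None or entry["type"] == entry_type]
```

With this sketch, calling `add` twice with the same id leaves a single storeEntry holding the latest object, which is the update-versus-addition behaviour the id field is meant to provide.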

ross-little commented 4 years ago

Looks good! It simplifies updating particular dataSet objects.

Some questions:

  1. So every ms will need to update for the new operations and implement the concurrency control check. That said, the client should also be designed to prevent operations in parallel, right? Or is this scenario possible with the linking reconciliation module?

  2. I understand that two clients could not have access to the same session for the same user. However, what if a user opened up a second client and started making changes to the dataStore, after making changes to the first opened session? Should there be some way to prevent the client opening a dataStore in a new session while it is concurrently opened by another client? Or do we just let the client do these silly things if he/she wants?

faragom commented 4 years ago

> So every ms will need to update for the new operations and implement the concurrency control check.

Every ms should already have a retry if a back-channel call returns an error, but yes.

> That said, the client should also be designed to prevent operations in parallel, right? Or is this scenario possible with the linking reconciliation module?

Operations in parallel should be allowed (your example is the main reason, yes). But, as said above, concurrent writes won't be a very frequent case.

> I understand that two clients could not have access to the same session for the same user.

No. Each client creates a new session on start.

> However, what if a user opened up a second client and started making changes to the dataStore,

The dataStore in session would be a different one and, at the moment of writing to the persistent storage, the last write overwrites. It's way out of our scope to control this kind of concurrency.

> after making changes to the first opened session. Should there be some way to prevent the client opening a dataStore in a new session while it is concurrently opened by another client? Or we just let the client do these silly things if he/she wants?

We should let the user be responsible for this. Also, if we implement some kind of lock and the user accesses it from two clients, there is a good chance of leaving it in an inconsistent state and thus locking it permanently (requiring some hack to remove the lock). So, not worth the effort.
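
For reference, the timestamp comparison with a bounded number of retries discussed in this thread could look roughly like the Python sketch below. Several things are assumptions made only for the example: the `SESSION` stand-in, the `timestamp` and `entries` field names, the `WriteError` type, the retry limit of 5, the wait range, and the injectable `read` parameter. Note also that the second read and the write are not atomic in this sketch, so it narrows the window for a lost update rather than closing it.

```python
import copy
import random
import time

MAX_RETRIES = 5  # assumed limit; the thread does not fix a number

# Hypothetical stand-in for the dataStore session variable. The "timestamp"
# and "entries" field names are assumptions made for this sketch.
SESSION = {"dataStore": {"timestamp": 0, "entries": {}}}


class WriteError(Exception):
    """Raised once the retries are exhausted; the caller may retry at will."""


def now_ms():
    # Millisecond timestamp (not Unix seconds).
    return time.time_ns() // 1_000_000


def write_with_check(mutate, read=None, max_retries=MAX_RETRIES):
    """Apply mutate(entries) to a copy of the store and write it back only if
    the stored timestamp did not change in the meantime."""
    read = read or (lambda: copy.deepcopy(SESSION["dataStore"]))
    for _ in range(max_retries):
        store = read()                     # read the dataStore
        mutate(store["entries"])           # do the operation on the copy
        if read()["timestamp"] == store["timestamp"]:  # read again and compare
            # Bump the timestamp so that a later writer always sees a change.
            store["timestamp"] = max(now_ms(), store["timestamp"] + 1)
            SESSION["dataStore"] = store
            return store
        time.sleep(random.uniform(0.01, 0.05))  # wait a random time, then retry
    raise WriteError("dataStore changed concurrently; retries exhausted")
```

A caller would pass the operation as a function, for example `write_with_check(lambda entries: entries.pop("some-id", None))`, and handle `WriteError` with its own retry policy.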

miryamvillegas commented 4 years ago

Hi @faragom ,

I've been testing the eidas-idp and the edugain-idp, and I get two slightly different datastores:

[{"id":"eIDASeidas.gr/gr/ermis-11076669","type":"dataSet", "data":"{\"id\":\"743cedd7-8e52-4354-a262-8ed9cf586edd\", \"encryptedData\":null,\"signature\":null,\"signatureAlgorithm\":null,\"encryptionAlgorithm\":null, \"clearData\": [{\"id\":\"514e358c-551c-41eb-a31c-2ddfd7a6583\" ,\"type\":\"eIDAS\", \"categories\":null,\"issuerId\":\"eIDAS\",\"subjectId\":null,\"loa\":null,\"issued\":\"Mon, 31 Aug 2020 08:27:37 GMT\",\"expiration\":null, \"attributes\": [{\"name\":\"http://eidas.europa.eu/attributes/naturalperson/CurrentFamilyName\",\"friendlyName\":\"FamilyName\",\"encoding\":\"UTF-8\",\"language\":\"N/A\",\"values\":[\"ΠΕΤΡΟΥ, PETROU\"]},{\"name\":\"http://eidas.europa.eu/attributes/naturalperson/CurrentGivenName\",\"friendlyName\":\"GivenName\",\"encoding\":\"UTF-8\",\"language\":\"N/A\",\"values\":[\"ΑΝΔΡΕΑΣ, ANDREAS ΠΕΤΡΟΥ, PETROU\"]},{\"name\":\"http://eidas.europa.eu/attributes/naturalperson/DateOfBirth\",\"friendlyName\":\"DateOfBirth\",\"encoding\":\"UTF-8\",\"language\":\"N/A\",\"values\":[\"1980-01-01\"]},{\"name\":\"http://eidas.europa.eu/attributes/naturalperson/PersonIdentifier\",\"friendlyName\":\"PersonIdentifier\",\"encoding\":\"UTF-8\",\"language\":\"N/A\",\"values\":[\"eidas.gr/gr/ermis-11076669\"]},{\"name\":\"http://eidas.europa.eu/LoA\",\"friendlyName\":\"LevelOfAssurance\",\"encoding\":\"UTF-8\",\"language\":\"N/A\",\"values\":[null]}],\"properties\":null}]}"}]

[{"id":"bcdda9d1-1d85-4991-b132-c8efb99cb8d3","type":"dataSet",
"data":"{\"id\":\"DATASET1f30cec5-2217-481e-9bf9-37632641b84d\",\"type\":\"eduGAIN\",
\"categories\":null,\"issuerId\":\"This is the user ID.\",\"subjectId\":null,\"loa\":null,\"issued\":\"Wed, 2 Sep 2020 11:29:33 GMT\",
\"expiration\":null,\"attributes\":
[{\"name\":\"urn:oid:1.3.6.1.4.1.5923.1.1.1.10\",\"friendlyName\":\"eduPersonTargetedID\",\"encoding\":null,\"language\":null,\"values\":[null]},
{\"name\":\"urn:oid:2.5.4.42\",\"friendlyName\":\"givenName\",\"encoding\":null,\"language\":null,\"values\":[\"SEAL\"]},
{\"name\":\"urn:oid:0.9.2342.19200300.100.1.3\",\"friendlyName\":\"mail\",\"encoding\":null,\"language\":null,\"values\":[\"seal-test0@example.com\"]},
{\"name\":\"urn:oid:2.5.4.3\",\"friendlyName\":\"cn\",\"encoding\":null,\"language\":null,\"values\":[\"Tester0 SEAL\"]},
{\"name\":\"urn:oid:2.5.4.4\",\"friendlyName\":\"sn\",\"encoding\":null,\"language\":null,\"values\":[\"Tester0\"]},
{\"name\":\"urn:oid:2.16.840.1.113730.3.1.241\",\"friendlyName\":\"displayName\",\"encoding\":null,\"language\":null,\"values\":[\"SEAL Tester0\"]},
{\"name\":\"urn:oid:1.3.6.1.4.1.5923.1.1.1.6\",\"friendlyName\":\"eduPersonPrincipalName\",\"encoding\":null,\"language\":null,\"values\":[\"128052@gn-vho.grnet.gr\"]},
{\"name\":\"urn:oid:1.3.6.1.4.1.5923.1.1.1.7\",\"friendlyName\":\"eduPersonEntitlement\",\"encoding\":null,\"language\":null,\"values\":[\"urn:mace:grnet.gr:seal:test\"]}],\"properties\":null}"}]

Could you please confirm whether the data field should contain a dataStore type or a dataSet type?

Thanks!

FYI: @endimion , @BPereira99 , @rcc-atos

miryamvillegas commented 4 years ago

Another question is about the id value for identifying each entry in the datastore:

Are we going to standardise them in some way? Note that the eidas entity's id is eIDASeidas.gr/gr/ermis-11076669 and the edugain entity's id is a UUID...

faragom commented 4 years ago

Hi,

The data field should contain an object of the class defined in the type field. In our case they will be dataSet or linkRequest objects, but never dataStore.

Regarding the id, there is no need to standardise its format, but what we need to standardise is its scope: The id MUST be unique for each identity module + identity provider + subject, but it also MUST be persistent over time for a given identity module + identity provider + subject. This means that:

So, the above would suffice for eIDAS. For other sources, like eduGAIN, the structure should be different. Maybe include the eduPersonTargetedID, which is unique for a subject + idp, or the schacHomeOrganization and schacPersonalUniqueCode. It is a decision of the developer of the module.

To limit the charset to be expected, control the length of the id and veil any personal info, maybe we should calculate this ID as the SHA-1 of the above-mentioned strings.
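
As a rough illustration of that idea, a module could derive the id as below. This is a sketch only: the function name `entry_id`, the unit-separator character used to join the parts, and the example issuer URL are assumptions, and each module would choose which strings actually identify its source, issuer and subject.

```python
import hashlib


def entry_id(source, issuer, subject):
    """Derive a fixed-length, persistent id from source + issuer + subject."""
    # Assumption: join with a separator that cannot appear in the parts, so
    # that ("ab", "c") and ("a", "bc") do not produce the same input string.
    raw = "\x1f".join([source, issuer, subject])
    # SHA-1 hex digest: always 40 characters from the set 0-9a-f.
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


# The same inputs always give the same id, so a later write is seen as an
# update. The issuer URL here is a made-up placeholder.
example = entry_id("eIDAS", "https://issuer.example", "eidas.gr/gr/ermis-11076669")
```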

Regards.

-- Francisco José Aragó Monzonís mitsurugisan@gmail.com