Kinto / kinto

A generic JSON document store with sharing and synchronisation capabilities.
http://docs.kinto-storage.org/
Other
4.33k stars 421 forks source link

TOCTOU issues at resource level #1407

Open glasserc opened 6 years ago

glasserc commented 6 years ago

Per https://github.com/Kinto/kinto/pull/1406#issuecomment-347924141, there are TOCTOU issues in the resource level methods that wrap the storage backend. Most of these methods have code that looks something like:

    def delete(self):
        # ... 
        record = self._get_record_or_404(self.record_id)
        self._raise_412_if_modified(record)
        # ...

        deleted = self.model.delete_record(record, last_modified=last_modified)

The problem here is that another transaction could, in the time between the get and the delete, update the record in ways that would cause a race condition. It could update the last_modified of the time, thereby causing the 412 check to be invalidated; it could delete the record, causing the _or_404 check to be invalidated.

The affected methods are collection_post, put, delete, and patch.

I don't see any obvious solution to this problem, but I see a bunch of relatively complicated ones:

Natim commented 6 years ago

It's not an issue to have backend breaking changes, we already had a number of them. Also we already have features that needs transactions and work in a degraded mode for memory and redis backends (that's why we moved redis out of the scope of kinto because we didn't had time to add transaction support for kinto)

@ronhanson is having a usecase that might need this exact feature. findAndUpdate, maybe he can have insight about which solution he would take.

Natim commented 6 years ago

pushing 412s requires some work and some fancy SQL footwork.

Can we add a WHERE last_modified = 'previous_last_modified' clause for the update or delete part and raise a 412 if no records where updated/deleted?

glasserc commented 6 years ago

A PUT can be a create or an update. There's no such thing as INSERT ... WHERE last_modified = 'previous_last_modified'.

Natim commented 6 years ago

I thought we were trying to do the insert and in case of conflict handling the update?

glasserc commented 6 years ago

I think that might be the right approach. Be sure to handle the case of If-Match, which has to fail if there is no record with which to conflict.

peterbe commented 6 years ago

Pardon my kinto-juniorness but...

Feels like this issue is crossing paths with https://github.com/Kinto/kinto/pull/1598 which is about doing inserts.

It feels like the bestest way would be to use row-level locking. And for inserts, rely on idx_records_parent_id_collection_id_last_modified.

The TOCTOU problem happens because we "disconnect" between the client and the database whilst we figure out whether to proceed with the delete/update or whether to raise an error which results in a 419(?) HTTP error.

By doing something like:

SELECT last_modified FROM records
WHERE id=:object_id and parent_id=:parent_id and collection_id=:collection_id
FOR UPDATE NOWAIT

...first, we can be totally confident how to proceed. If the record existed the row-level locking will protect us from a single process getting access to the UPDATE/DELETE to come.

If it didn't exist and it's an insert/upsert, you can't lock the whole table or anything like that but you are also blocked from doing two inserts with the same id|parent_id|collection_id combo. E.g.

INSERT INTO records (id, parent_id, collection_id, data, last_modified) VALUES
(:object_id, :parent_id, :collection_id, (:data)::JSONB, :last_modified)

and, on the Python level you watch out for an IntegrityError. E.g.

try:
    cursor.execute('INSERT INTO ...')
except IntegrityError as exception:
    if 'records_pkey' in exception.args[0]:
        raise TooLateToInsertError('419 for you!')
    raise

Perhaps adding proper Postgresql locking is already something we're planning or perhaps there are circumstances I don't (yet) really understand.

On the Storage interface, it appears to be create, update and delete only. Right? So we can write one implementation of the upsert-with-locking for each. I'd be happy to volunteer where I can.

peterbe commented 6 years ago

By the way, the delete case is solvable without a lock using @Natim 's idea of either using WHERE last_modified = 'previous_last_modified' or something like this:

cur.execute("""
    DELETE FROM records WHERE
    id=:object_id and parent_id=:object_id and  collection_id=:collection_id
    and last_modified=from_epoch(:last_modified)
    RETURNING id
""", {...})
fetched = cur.fetchone()
if not fetched:
    raise TooLateToDeleteError('419 for you!')
peterbe commented 6 years ago

I have prototypes for insert, upsert, update and delete all as python scripts that are pure psycopg2 based. I use a threadpool and some time.sleep() to trigger the race conditions and in all cases I can "prove" the prototypes. Is this interesting?

In all my prototypes I have not yet worried myself with the complexity of tombstones but I'm sure we can factor it in.

The only thing that is a bit stinky is that the interface between Python and Postgres is that you have to rely on the driver to wrap Postgres errors into exception which you then capture and analyze. E.g. this.

glasserc commented 6 years ago

I would be curious to see your scripts.

peterbe commented 6 years ago

I'm trying to collect my thoughts. https://github.com/Kinto/kinto/issues/1525 is all about the create() method. This issue has escalated into a discussion about leveraging proper row-level locks.

Hopefully we can break this up. This issue can be about using row-level locks to do the logical delete. And that other issue can be about using Integrity checks instead of the overly complicated upsert.

So, my prototype for row-level locking is really trivial in comparison to "real kinto" in that, in kinto, we use SQLAlchemy instead of handcrafted SQL strings. For example, the prototype assumes that you always include, as parameters, the last_modified. Perhaps it needs to be smarter and break that up for with- and without a last_modified.

If you think the row-level prototype is sane in theory, I can start attempting a patch.