Tempest coerces the type of records when querying a GSI

szabado-faire commented 5 months ago

Tempest assumes the type of records when querying a GSI, and coerces all records to be that type without checking the entity sort key prefix.

Example

Let's say we have a library hold system that tracks holds on Books and Movies. We might have:

Book Hold
- Hold Token (PK): String
- Book Token: String
- Hold placed at: Instant
- Title, author, other metadata

Movie Hold
- Hold Token (PK): String
- Movie Token: String
- Hold placed at: Instant
- Director, actors, other metadata

When a loaned book/movie gets returned, the library would need to give it to the next person in line. To facilitate that, we'd need a GSI. We could build one for books and one for movies, but the idiomatic DynamoDB approach would be to share a GSI, and have the following schema:

Book/movie token (PK)
Hold placed at (SK)

That way the system can easily allocate the pending holds by looking up the oldest hold for a given book or movie.
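To make the shared index concrete, here is a minimal in-memory sketch of the lookup. It does not use Tempest or DynamoDB; the attribute names (`item_token`, `sort_key`, `hold_placed_at`), the `BOOK_HOLD_` / `MOVIE_HOLD_` prefixes, and the sample values are all assumptions for illustration.

```python
from datetime import datetime, timezone

# Hypothetical base-table items. Both entity types write their book/movie
# token into the same "item_token" attribute so they land in one shared GSI.
ITEMS = [
    {"hold_token": "hold_1", "sort_key": "BOOK_HOLD_", "item_token": "book_token_123",
     "hold_placed_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"hold_token": "hold_2", "sort_key": "BOOK_HOLD_", "item_token": "book_token_123",
     "hold_placed_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"hold_token": "hold_3", "sort_key": "MOVIE_HOLD_", "item_token": "movie_token_456",
     "hold_placed_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]


def query_shared_gsi(items, item_token):
    """Mimic a GSI query: match the index partition key, order by the index sort key."""
    matches = [item for item in items if item["item_token"] == item_token]
    return sorted(matches, key=lambda item: item["hold_placed_at"])


def next_hold(items, item_token):
    """Return the oldest pending hold for a book or movie, or None if there is none."""
    results = query_shared_gsi(items, item_token)
    return results[0] if results else None
```

Note that nothing in the index key itself says whether a result is a book hold or a movie hold; only the base-table sort key prefix carries that information.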

The Problem

Tempest has no type safety here. If you look up book_token_123 in the Tempest movie GSI, it'll try to return the book hold as a movie hold (ignoring the sort key prefix), and can very well succeed if you have enough nullable fields.
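A rough illustration of the failure mode, as a standalone Python model rather than Tempest's actual codec: the `MovieHold` class, the attribute names, and the `BOOK_HOLD_` prefix are hypothetical. The point is only that a decoder which never looks at the sort key prefix will accept any item whose required attributes happen to be present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MovieHold:
    hold_token: str
    movie_token: str
    director: Optional[str] = None  # nullable, so a missing value is not an error


def naive_decode_movie_hold(item: dict) -> MovieHold:
    # item["sort_key"] is never inspected, so book items decode as well.
    return MovieHold(
        hold_token=item["hold_token"],
        movie_token=item["item_token"],
        director=item.get("director"),
    )


# A book hold, as it might come back from the shared GSI.
book_item = {
    "hold_token": "hold_1",
    "sort_key": "BOOK_HOLD_",
    "item_token": "book_token_123",
    "author": "Example Author",
}

# Decoding "succeeds" and yields a MovieHold that is really a book hold.
coerced = naive_decode_movie_hold(book_item)
```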

This surfaced for me as a bug - I bumped the schema version on my table (including giving it a new sort key prefix), and then my code proceeded to pull the old records out of the table, coerce them into new records, and cause lots of mayhem.

Solutions

This feels like it's safely in bug territory, but I wanted to consult with you folks on solutions before putting up a fix. In my mind Tempest just needs to check that the correct prefix exists before using the Codec to parse the record.
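The proposed check could look roughly like the following. This is a sketch of the idea only, not Tempest code: `decode_if_prefix_matches`, the `sort_key` attribute name, and the prefixes are hypothetical, and `decode` stands in for whatever the Codec does.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def decode_if_prefix_matches(
    item: dict,
    expected_prefix: str,
    decode: Callable[[dict], T],
) -> Optional[T]:
    """Run the decoder only when the item's sort key carries the expected entity prefix."""
    sort_key = item.get("sort_key", "")
    if not sort_key.startswith(expected_prefix):
        # Wrong entity type (or an old schema version): refuse to coerce it.
        return None
    return decode(item)
```

Whether a mismatch should be skipped silently, as here, or surfaced as an error is a separate design decision.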

kyeotic commented 5 months ago

Yes, this is a known issue with GSIs. Using a separate GSI is a workaround, though one that is usually applicable. Generally, given the 50 GSI limit, a separate GSI will be better given that each one will be sparse.

I definitely consider this a bug though.

szabado-faire commented 5 months ago

There are two main ways of resolving this in my mind:

- Filter out invalid records. Could conceivably break existing users, but it might also make them more correct.
- Add a new version of Page that supports multiple types - effectively an ItemSet that has an offset. This would probably involve deprecating the existing Page implementations, but would let Tempest solve the same issue when it comes to scan operations as well.
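For comparison, the two options can be sketched side by side. This is a simplified Python model, not a proposal for the real API: `RawPage`, `TypedPage`, `filter_page`, `mixed_page`, and the prefix-to-decoder mapping are hypothetical names, and `offset` stands in for the pagination cursor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class RawPage:
    items: List[dict]
    offset: Optional[dict]  # cursor for the next page, None on the last page


@dataclass
class TypedPage:
    contents: List[Any]
    offset: Optional[dict]


def filter_page(page: RawPage, prefix: str, decode: Callable[[dict], Any]) -> TypedPage:
    """Option 1: decode only items with the expected prefix and drop the rest."""
    contents = [decode(item) for item in page.items if item["sort_key"].startswith(prefix)]
    return TypedPage(contents, page.offset)


def mixed_page(page: RawPage, decoders: Dict[str, Callable[[dict], Any]]) -> TypedPage:
    """Option 2: decode each item with the decoder registered for its prefix."""
    contents = []
    for item in page.items:
        for prefix, decode in decoders.items():
            if item["sort_key"].startswith(prefix):
                contents.append(decode(item))
                break  # first matching prefix wins; unknown prefixes are skipped
    return TypedPage(contents, page.offset)
```

One consequence of the first option in this model: a filtered page can contain fewer items than were read, or none at all, while still carrying a non-null offset, so callers would need to keep paging until the offset is exhausted.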

Thoughts on either approach?

cashapp / tempest