colopl / laravel-spanner

Laravel database driver for Google Cloud Spanner
Apache License 2.0
96 stars 16 forks source link

Add snapshot methods to query builder/model #231

Open matthewjumpsoffbuildings opened 2 weeks ago

matthewjumpsoffbuildings commented 2 weeks ago

Would be great if the snapshot method could be chained directly on query builder or model method calls. I tried to suss it myself to make a PR but I am not sure the best approach.

Ideally it would be great if you could do something like

// snapshot from a model
User::snapshot()->all();

// snapshot from a builder
$queryBuilder
    ->where('foo', 'bar')
    ->snapshot()
    ->get();

Ideally you should be able to insert the snapshot() method anywhere in the chain, but since snapshots use a closure to execute under the hood Im not sure how it could be made chainable

taka-oyama commented 2 weeks ago

A normal SELECT statement outside of a transaction is already a snapshot read only transaction.

matthewjumpsoffbuildings commented 2 weeks ago

So running a SELECT sets currentSnapshot somewhere on the Connection and makes it execute the query in a snapshot? I am struggling to see where this happens in the code?

I remember there was some confusion between snapshots and timestamp bounds/single use transactions. Timestamp bounds and single use transactions are not the same as snapshots, as discussed here:

matthewjumpsoffbuildings commented 2 weeks ago

Also if it is the case that normal SELECT statements do somehow set currentSnapshot, how do you specify the timestamp bounds for the snapshot? Because the only place I see a snapshot being created is in the ManagesSnapshots trait, and the only time that happens is when Connection::snapshot($timestampBound) is called directly, which is surely not called when you just run a normal SELECT?

Eg this line here https://github.com/colopl/laravel-spanner/blob/afedc2d4bd1dc8885d2faebe2b6d57b50564ff78/src/Concerns/ManagesSnapshots.php#L46

Is the only way to create a snapshot and pass the timestamp bound to it, and that doesnt appear to be called when running a normal SELECT?

matthewjumpsoffbuildings commented 2 weeks ago

I just tested this, if I run something like

DB::connection()->table('totals')->get();

At no point does Connection::executeSnapshotQuery get called, nor does Connection::snapshot. Since Connection::snapshot seems to be the only place that actually runs this->getSpannerDatabase()->snapshot($options);, it is not the case that normal SELECT statements use snapshots. Rather, as before, they appear to be using single use transactions (optionally with timestamp bounds), which as discussed previously, are not the same as snapshots, and do not guarantee lock free reads

matthewjumpsoffbuildings commented 2 weeks ago

Finally, it should actually not be the case that regular SELECT statements use snapshots. Snapshot use should require explicit specification.

This code should not run in a snapshot:

DB::connection()->table('totals')->get();

However something like this should be possible

DB::connection()->table('totals')->snapshot(new ExactStaleness(20))->get();
taka-oyama commented 2 weeks ago

From what I can read from here, all non read write transactions or DMLs are snapshot read-only transactions.

So running this is automatically running the statement as singleUse snapshot read-only transaction. The only reason I added Connection::snapshot(...) is to make it possible to set the staleness for all queries inside the closure for convenience.

matthewjumpsoffbuildings commented 2 weeks ago

I am confused by this. It appears perhaps the documentation is contradicting itself when it says this

Although reads using these timestamp bound modes are not part of a read-write transaction, they can block waiting for concurrent read-write transactions to commit. Bounded staleness reads attempt to pick a timestamp to avoid blocking, but may still block.

So its saying you can have a timestamp bound read that is not part of a read-write transaction (which according to the documentation you linked saying theres only 3 transaction modes, read-write, snapshot, partitioned, must mean that its in a snapshot right, since it cant be partitioned), and yet this read that is in a snapshot can still lock?

When the other documentation says snapshot never lock, as stated here

Snapshot transactions do not take locks. Instead, they work by choosing a Cloud Spanner timestamp, then executing all reads at that timestamp. Since they do not acquire locks, they do not block concurrent read-write transactions.

From my understanding, unless you explicitly call Database::snapshot() to create a Snapshot object, a read you run could still potentially lock, even with timestamp bounds

Perhaps this is a case of the documentation being unclear

taka-oyama commented 2 weeks ago

Bounded staleness reads attempt to pick a timestamp to avoid blocking, but may still block.

The doc really should go more into detail about how it can still block.

From what I have tested in production, switching from strong reads to stale reads resulted in better performance and less blocking overall.

matthewjumpsoffbuildings commented 2 weeks ago

The doc really should go more into detail about how it can still block.

Agreed, it seems confusing and like a direct contradiction currently, but it really does seem to me that the best practice if you want snapshot reads with a complete guarantee of no locking, that you should call Database::snapshot() and run execute() on the returned Snapshot object

From what I have tested in production, switching from strong reads to stale reads resulted in better performance and less blocking overall.

I am sure it does, but since the docs do seem to indicate that stale reads alone is not enough to guarantee no locks, it would make sense to add explicit snapshot usage options

Your closure snapshot() method on Connection is enough for me to have peace of mind, but adding the ability to trigger explicit snapshots on Builder and Model would be a nice to have