Open greg-rychlewski opened 2 years ago
Yeah, I think we need to wait for more general feedback before we can merge it, for sure!
I looked into this more and did a bunch of testing and feel like I know what's going on. It looks like the MyXQL cache and the Ecto cache are becoming out of sync. I'll give you my spiel in 2 parts:
Based on what I'm going to say below, I believe this PR solves the issue. But of course you guys should do whatever you feel is best.
The main culprits are:
ref
for cache lookups in the execute
callback.execute
callback Here is an example sequence that might happen when the same query is called concurrently before it has been cached in Ecto:
MyXQL.prepare_execute
and sets the MyXQL cache with refAMyXQL.prepare_execute
and sets the MyXQL cache with refBMyXQL.prepare_execute
and sets the Ecto cache with refAMyXQL.execute
with refA (picked up from the Ecto cache)MyXQL.prepare_execute
and sets the Ecto cache with refBMyXQL.execute
and doesn't set the Ecto cache with the new reference.I ran the test below on 0.6.0 and printed out the Ecto/MyXQL refs when execute
is ran. You can see the behaviour I described above in the results.
test "why" do
one = TestRepo.insert!(%Post{title: "one"})
two = TestRepo.insert!(%Post{title: "two"})
IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")
t = Task.async_stream(1..20, fn _ -> TestRepo.all(Post) end)
Enum.each(t ,fn _ -> nil end)
IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")
end
Beginning prepared statement count: 4
"--------------------------"
Execute 1:
"ecto: #Reference<0.4038475985.365953025.207354>"
"myxql: #Reference<0.4038475985.365953030.204170>"
"--------------------------"
Execute 2:
"ecto: #Reference<0.4038475985.365953029.204483>"
"myxql: #Reference<0.4038475985.365953030.204171>"
"--------------------------"
Execute 3:
"ecto: #Reference<0.4038475985.365953030.204156>"
"myxql: #Reference<0.4038475985.365953030.204181>"
"--------------------------"
Execute 4:
"ecto: #Reference<0.4038475985.365953031.207247>"
"myxql: #Reference<0.4038475985.365953030.204192>"
"--------------------------"
Execute 5:
"ecto: #Reference<0.4038475985.365953030.204159>"
"myxql: #Reference<0.4038475985.365953030.204198>"
"--------------------------"
Execute 6:
"ecto: #Reference<0.4038475985.365953030.204163>"
"myxql: #Reference<0.4038475985.365953030.204204>"
"--------------------------"
Execute 7:
"ecto: #Reference<0.4038475985.365953030.204168>"
"myxql: #Reference<0.4038475985.365953029.204520>"
"--------------------------"
Execute 8:
"ecto: #Reference<0.4038475985.365953030.204170>" <---- all prepare executes done, Echo cache stabilizes
"myxql: #Reference<0.4038475985.365953030.204217>"
"--------------------------"
Execute 9:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204224>"
"--------------------------"
Execute 10:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204523>"
"--------------------------"
Execute 11:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204526>"
"--------------------------"
Execute 12:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204238>"
"--------------------------"
Ending prepared statement count: 17
If I run the same test on the changes in this PR, the execute
cache always hits because it's comparing by name and only 1 prepared statement is added.
Postgrex compares by ref and gets around it by updating the Ecto cache when the ref changes (https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L101). But when I made MyXQL update the Ecto cache and ran the test, it seems like if the requests come too fast without a lull you can be perpetually out of sync. This can happen if one of the processes doesn't get to update the MyXQL cache and the Ecto cache before another execution begins:
"ecto: #Reference<0.2753011.4138467329.182193>"
"myxql: #Reference<0.2753011.4138467336.176849>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176838>"
"myxql: #Reference<0.2753011.4138467329.182223>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467335.176871>"
"myxql: #Reference<0.2753011.4138467329.182233>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176842>"
"myxql: #Reference<0.2753011.4138467329.182243>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182210>"
"myxql: #Reference<0.2753011.4138467329.182249>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176845>"
"myxql: #Reference<0.2753011.4138467333.179038>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467333.179028>"
"myxql: #Reference<0.2753011.4138467329.182260>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176849>"
"myxql: #Reference<0.2753011.4138467329.182262>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182223>"
"myxql: #Reference<0.2753011.4138467333.179044>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182233>"
"myxql: #Reference<0.2753011.4138467329.182269>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182243>"
"myxql: #Reference<0.2753011.4138467329.182275>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182249>"
"myxql: #Reference<0.2753011.4138467329.182281>"
If I run this modified test where there is a lull before the last query I get a cache hit:
test "why" do
one = TestRepo.insert!(%Post{title: "one"})
two = TestRepo.insert!(%Post{title: "two"})
t = Task.async_stream(1..20, fn x -> TestRepo.all(Post) end)
Enum.each(t ,fn _ -> nil end)
# Put this Repo.one query so that the Repo.all query does not hit the "last query" cache
TestRepo.one(from p in Post, limit: 1)
TestRepo.all(Post)
end
"ecto: #Reference<0.3998318333.654311427.208239>"
"myxql: #Reference<0.3998318333.654311430.207635>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207599>"
"myxql: #Reference<0.3998318333.654311427.208259>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207607>"
"myxql: #Reference<0.3998318333.654311427.208265>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311428.207665>"
"myxql: #Reference<0.3998318333.654311427.208273>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207613>"
"myxql: #Reference<0.3998318333.654311427.208283>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207622>"
"myxql: #Reference<0.3998318333.654311427.208288>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208249>"
"myxql: #Reference<0.3998318333.654311427.208294>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207635>"
"myxql: #Reference<0.3998318333.654311430.207650>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208259>"
"myxql: #Reference<0.3998318333.654311427.208301>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208265>"
"myxql: #Reference<0.3998318333.654311427.208307>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208273>"
"myxql: #Reference<0.3998318333.654311427.208313>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208283>"
"myxql: #Reference<0.3998318333.654311428.207682>"
"--------------------------"
"cache hit"
If you made it through all this, thank you for listening :)
Thank you @greg-rychlewski, i think the analysis is on point. I am not sure what is the solution though. We can probably keep one of the two commits, but I will ship 0.6.1 first without them to have a more stable base.
We had a problem "ER_MAX_PREPARED_STMT_COUNT_REACHED" with code like this:
query = MyXQL.prepare!(conn, "", ..)
result = MyXQL.execute(conn, query ..)
but when I changed the code to:
result = MyQXL.prepare_execute(conn, "", ...)
The problem went away..
This was on MySQL 5.6.24-72.2-log
If you guys still want to keep the cache changes from the 2 reverted commits, this branch could be tested by the affected people. Though if you'd rather just get rid of them I completely understand. The first 2 commits here are the revert of the reverts and the last is the fix.
Since the test suite passes and I can't replicate the issues locally, I really feel like it's a multi-node setup issue. To summarize from https://github.com/elixir-ecto/myxql/issues/147:In multi-node setups prepared statements with the same name can be saved with different refs. If it was prepared on one node then executed on another it will be re-prepared and the next time it's executed on the original node, the refs won't match and it will be re-prepared again.Before the cache change commit, executions would check for the name and not the ref. So each node would only be able to re-prepare an existing statement once.Though there are still some complications with multi-node set-ups for re-preparing statements with the same name but a different query. But that happens with or without the reverted commits.Edit: actually thinking more about the multi-node setup. I think it's important for cache names too be able to be re-prepared. Say you have 2 nodes both with the same prepared query cached on them. Then this sequence happens:~~1. Close the prepared statement on node 1
So always closing the old statements on re-prepare circumvents that issue.^ don't believe this is correct anymore. i gave a detailed comment below of what i believe the issue is, with tests