Identical prepared statements per connection

byronpc commented 4 years ago

Ok. I wrote an issue last week with regards to identical prepared statements keep growing when you send an erroneous statement via query.

I'm now replicating an issue wherein non erroneous queries can also cause multiple identical prepared statements in the db.

In order to replicate the issue. I start up the service with just a pool of 2 connections.

In order to confirm this

mysql> show processlist;
+-----+-------+-----------------+----------+---------+------+----------+------------------+
| Id  | User  | Host            | db       | Command | Time | State    | Info             |
+-----+-------+-----------------+----------+---------+------+----------+------------------+
| 617 | root  | localhost:65502 | xxxxxxxx | Query   |    0 | starting | show processlist |
| 647 | byron | localhost:50815 | xxxxxxxx | Sleep   |    0 |          | NULL             |
| 648 | byron | localhost:50816 | xxxxxxxx | Sleep   |    1 |          | NULL             |
+-----+-------+-----------------+----------+---------+------+----------+------------------+

Now I ran 1000 parallel commands to Repo by

1..1000 |> Enum.each(&Task.async(fn -> Repo.get_by(Education, [id: &1]) end))

I confirmed from the processlist that there are still 2 connections to the pool (to make sure there is no misconfiguration) after the queries have been sent

But when I checked the prepared_statement_instances, 1000 instances has been created

mysql> select SQL_TEXT, COUNT(1) from performance_schema.prepared_statements_instances GROUP BY SQL_TEXT\G
*************************** 1. row ***************************
SQL_TEXT: SELECT CAST(s0.`version` AS unsigned) FROM `schema_migrations` AS s0 FOR UPDATE
COUNT(1): 1
*************************** 2. row ***************************
SQL_TEXT: SELECT e0.`id`, e0.`start_date`, e0.`end_date`, e0.`degree_level`, e0.`degree_name`, e0.`education_field`, e0.
`average_grade`, e0.`description`, e0.`gpa`, e0.`skills`, e0.`person_id`, e0.`organization_id`, e0.`requested_organizati
on_id` FROM `education` AS e0 WHERE (e0.`id` = ?)
COUNT(1): 1000
2 rows in set (0.01 sec)

To me, there seems to be a race condition somewhere in the driver which causes it to prepare the statement again even if it has already been prepared prior which is weird because on poolboy, we checkout the connection and checkin it afterwards. So pretty sure if we have already prepared it before on one connection, it shouldn't need to be reprepared on the succeeding checkouts.

This bug doesn't happen when you put an interval in between the queries which is understandable.

But on a live setup wherein it's very possible to have multiple requests running at the same time which can cause this type of leakage

josevalim commented 4 years ago

We should have one query per connection, so 1000 is way above our expectations. The cache is also per connection, so concurrency within Elixir should not be an issue, unless the issue comes from how we concurrently prepare the queries against the database.

wojtekmach commented 4 years ago

Just double-checking, are you running on latest version, myxql v0.3.4?

byronpc commented 4 years ago

~~even more latest than 0.3.4. since you pushed a hot fix last week with regards to the leakage~~

byronpc commented 4 years ago

oops sorry. rather i'm using a custom commit you did last week. so it's essentially the same

josevalim commented 4 years ago

Oh, it is an Ecto query cache issue. We are racing Ecto's cache.

byronpc commented 4 years ago

This is probably an edge case since it requires that the statement to be sent is not prepared in the first place and have multiple processes execute the same query to the process. If the statement has already been prepared prior then it's good.

elixir-ecto / myxql

Identical prepared statements per connection #107