elixir-sqlite / exqlite

An SQLite3 driver for Elixir
https://hexdocs.pm/exqlite
MIT License

Performance issues / possible memory leak #153

Closed: rupurt closed this issue 2 years ago

rupurt commented 3 years ago

Howdy :wave:

I'm in the process of converting a market-making trading system (which creates many orders) from ETS to Ecto. I plan on supporting any database that Ecto currently has support for, but I've started with SQLite because I want to be able to distribute it without any external dependencies.

SQLite is probably not the best choice for this kind of system due to the high number of writes. But I've pushed forward with the philosophy that if I can make it reasonably performant on SQLite, it should be great with other DBs.

Everything has gone pretty well so far; however, now that I'm running it in production for long periods of time, I'm noticing severe performance degradation over time that ends in a total lock of the SQLite DB. I've attached a graph of my telemetry output below:

[graph: tai-vm-sqlite-locked]

I'm also attaching a graph of query times in the hope that it's helpful.

[graph: tai-sqlite-orders-query-times-locked]

The following SQL statements are executed regularly on the hot path:

06:12:03.214 [debug] QUERY OK source="orders" db=1.1ms idle=181.7ms
SELECT o0.client_id, o0.close, o0.credential, o0.cumulative_qty, o0.last_received_at, o0.last_venue_timestamp, o0.leaves_qty, o0.post_only, o0.price, o0.product_symbol, o0.product_type, o0.qty, o0.side, o0.status, o0.time_in_force, o0.type, o0.venue, o0.venue_order_id, o0.venue_product_symbol, o0.inserted_at, o0.updated_at FROM orders AS o0 WHERE (o0.client_id = ?) [<<14, 229, 20, 50, 21, 17, 75, 71, 159, 117, 143, 85, 153, 162, 63, 28>>]

06:12:03.217 [debug] QUERY OK db=2.2ms queue=0.1ms idle=111.7ms
begin []

06:12:03.219 [debug] QUERY OK source="orders" db=1.9ms
UPDATE orders AS o0 SET status = ?, leaves_qty = ?, last_received_at = ?, last_venue_timestamp = ? WHERE ((o0.client_id = ?) AND o0.status IN (?,?,?,?)) RETURNING client_id, close, credential, cumulative_qty, last_received_at, last_venue_timestamp, leaves_qty, post_only, price, product_symbol, product_type, qty, side, status, time_in_force, type, venue, venue_order_id, venue_product_symbol ["canceled", "0", ~U[2021-06-01 06:12:03Z], ~U[2021-06-01 06:12:02Z], <<14, 229, 20, 50, 21, 17, 75, 71, 159, 117, 143, 85, 153, 162, 63, 28>>, "create_accepted", "open", "pending_cancel", "cancel_accepted"]

06:12:03.222 [debug] QUERY OK db=1.7ms
INSERT INTO order_transitions (order_client_id,transition,id,inserted_at,updated_at) VALUES (?,?,?,?,?) [<<14, 229, 20, 50, 21, 17, 75, 71, 159, 117, 143, 85, 153, 162, 63, 28>>, "\"{\\\"last_received_at\\\":\\\"2021-06-01T06:12:03.212374Z\\\",\\\"last_venue_timestamp\\\":\\\"2021-06-01T06:12:02.986417Z\\\",\\\"__type__\\\":\\\"cancel\\\"}\"", <<29, 70, 168, 213, 105, 47, 77, 115, 169, 224, 255, 68, 17, 101, 121, 245>>, ~U[2021-06-01 06:12:03.219952Z], ~U[2021-06-01 06:12:03.219952Z]]

06:12:03.222 [debug] QUERY OK db=0.7ms
commit []

They're issued from this Elixir module https://github.com/fremantle-industries/tai/blob/orders-ecto-repo/apps/tai/lib/tai/new_orders/services/apply_order_transition.ex

  defp update_order_and_save_transition(client_id, %transition_mod{} = transition, order_transition_changeset) do
    from_status = transition_mod.from()
    attrs = transition_mod.attrs(transition)
    update_order_query = build_update_order_query(client_id, from_status, attrs)

    # The previous order needs to be selected outside of the transaction to
    # prevent a possible deadlock.
    case OrderRepo.get(Order, client_id) do
      %Order{} = previous_order_before_update ->
        # Check if the existing order has a status that supports this
        # transition in memory and only rely on the transaction rollback
        # as a fallback. There is a performance penalty to rolling back
        # a transaction.
        if Enum.member?(from_status, previous_order_before_update.status) do
          fn ->
            case OrderRepo.update_all(update_order_query, []) do
              {0, []} ->
                status_was = previous_order_before_update.status
                reason = {:invalid_status, status_was}
                OrderRepo.rollback(reason)

              {1, [current_order]} ->
                case OrderRepo.insert(order_transition_changeset) do
                  {:ok, _} -> {previous_order_before_update, current_order}
                  {:error, reason} -> OrderRepo.rollback(reason)
                end

              {:error, reason} ->
                OrderRepo.rollback(reason)
            end
          end
          |> OrderRepo.transaction()
        else
          status_was = previous_order_before_update.status
          reason = {:invalid_status, status_was}
          {:error, reason}
        end

      nil ->
        {:error, :order_not_found}
    end
  end
kevinlang commented 3 years ago

> SQLite is probably not the best choice for this kind of system due to the high number of writes.

In my experience and benchmarks, if you don't have a need for multiple nodes or long-running (write) transactions, SQLite3 will generally be faster than Postgres and the like, due to the lack of client/server overhead. Most of its bad reputation comes from SQLite3 not having WAL enabled by default, for backward-compatibility reasons.
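
For anyone following along, enabling WAL is a one-liner. A minimal sketch, assuming the journal_mode option exposed by ecto_sqlite3/exqlite (the pragma itself works on any connection, and "orders.db" is just a placeholder):

  # In the repo config (config/config.exs), assuming the :journal_mode option
  # is available in your ecto_sqlite3 version.
  config :my_app, MyApp.Repo,
    database: "orders.db",
    journal_mode: :wal

  # Or issue the pragma directly through the low-level API on an open connection.
  {:ok, conn} = Exqlite.Sqlite3.open("orders.db")
  :ok = Exqlite.Sqlite3.execute(conn, "PRAGMA journal_mode=WAL;")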

Thanks for this data. I have a feeling we are not cleaning up statement objects correctly in the exqlite driver, as I can't think of anything that could be causing a memory leak on the ecto adapter side.

warmwaffles commented 3 years ago

It could be the prepared statements not having their destructors called when the reference count goes to zero. Very odd. I'll take a look later in the day but that's what I have a feeling is happening.

@kevinlang it could also be that we aren't freeing some binary data for SQL statements / results. I haven't noticed any crazy amounts of memory climb in my apps, but if @rupurt is issuing a ton of updates and selects, that could be the culprit.

EDIT:

Specifically, the make_cell code where we take the results from SQLite and turn them into terms that Elixir/Erlang can understand.

kevinlang commented 3 years ago

That's a good point. Most of the anecdotes I read about GC issues on the BEAM come down to large binaries. If I have time I'll try to look into that.

@rupurt can you comment more on the "lock up" issues? I think that may be a separate issue. Is the VM itself locking up (due to memory issues) or just the database? Typically for the latter you should get an error message - what sort are you getting?

The most common cause of a lock-up is upgrading a READ transaction to a WRITE transaction: https://sqlite.org/isolation.html
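
A rough sketch of that upgrade and the usual workaround (illustrative only: "orders.db" and the statements are made up, and Ecto normally issues BEGIN for you):

  # Two connections, both in deferred transactions:
  #
  #   A: BEGIN            -- takes no lock yet
  #   B: BEGIN
  #   A: SELECT ...       -- A now holds a read lock
  #   B: SELECT ...       -- B now holds a read lock
  #   A: UPDATE ...       -- A wants the write lock and must wait for B
  #   B: UPDATE ...       -- B wants it too -> SQLITE_BUSY, neither can proceed
  #
  # Starting the transaction with BEGIN IMMEDIATE takes the write lock up front,
  # so a transaction that intends to write never has to upgrade mid-flight.
  {:ok, conn} = Exqlite.Sqlite3.open("orders.db")
  :ok = Exqlite.Sqlite3.execute(conn, "BEGIN IMMEDIATE")
  # ... reads and writes ...
  :ok = Exqlite.Sqlite3.execute(conn, "COMMIT")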

rupurt commented 3 years ago

> In my experience and benchmarks, if you don't have a need for multiple nodes or long-running (write) transactions, SQLite3 will generally be faster than Postgres and the like, due to the lack of client/server overhead. Most of its bad reputation comes from SQLite3 not having WAL enabled by default, for backward-compatibility reasons.

Cool. Thanks gents for looking into it so quickly. Interesting to know about the performance. I'm only basing that on what I hear from other folks, so that encourages me to keep pushing forward with the current strategy :)

> @rupurt can you comment more on the "lock up" issues? I think that may be a separate issue. Is the VM itself locking up (due to memory issues) or just the database? Typically for the latter you should get an error message - what sort are you getting?

The "lock up" problem isn't a deadlock. I've run into deadlocks in the past but they were problems with my code.

The "lock up" behavior I'm seeing now is not a total VM lock. It's a connection lock. I can remote attach to the instance and issue commands in IEx, but I can't query anything from the SQLite DB. Everything is working fine for ~30 mins or so, memory climbs linearly and then eventually the connection just locks up.

I don't have any logs handy with the error message. But I think it eventually timed out after 15 seconds or so.

FWIW I'm running this on a GCP n1-standard-1 instance

warmwaffles commented 3 years ago

@rupurt do you have a chart of the number of queries executed along with the memory usage? I'm just wanting to get a sense of the volume.

rupurt commented 3 years ago

@warmwaffles I don't. But that does sound handy so let me figure out if I can add one. I'll also post the logs for the timeout error on this next run.

rupurt commented 3 years ago

This is the error I get

22:39:22.294 [error] Exqlite.Connection (#PID<0.2507.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.2515.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.2515.0> was at location:

    (stdlib 3.14) gen.erl:208: :gen.do_call/4
    (stdlib 3.14) gen_event.erl:282: :gen_event.rpc/2
    (logger 1.11.3) lib/logger/handler.ex:105: Logger.Handler.log/2
    (kernel 7.2.1) logger_backend.erl:51: :logger_backend.call_handlers/3
    (ecto_sql 3.6.1) lib/ecto/adapters/sql.ex:926: Ecto.Adapters.SQL.log/4
    (db_connection 2.4.0) lib/db_connection.ex:1460: DBConnection.log/5
    (ecto_sqlite3 0.5.5) lib/ecto/adapters/sqlite3/connection.ex:91: Ecto.Adapters.SQLite3.Connection.query/4
    (ecto_sql 3.6.1) lib/ecto/adapters/sql.ex:786: Ecto.Adapters.SQL.struct/10
warmwaffles commented 3 years ago

I've been walking through the NIF code and can't spot any egregious leaks.

I wonder if the statements are never being freed because something is still holding a reference to them.

warmwaffles commented 3 years ago

@rupurt would you mind checking out https://github.com/elixir-sqlite/exqlite/pull/155 and running that with your stack to see if the issue is still present?

I don't have a reliable way to reproduce the issue.

rupurt commented 3 years ago

Shoot, sorry @warmwaffles. I ended up resolving this on my side. It was a bug in my code... :/

warmwaffles commented 3 years ago

Haha, well @rupurt, it actually sent me down a rabbit hole and I think this is more friendly now. Give the latest version a shot.

lauragrechenko commented 2 years ago

@warmwaffles @kevinlang Hi, I'm a little bit out of ideas; maybe you could help. In our project we have a process which reads 1000 entries from the SQLite DB, validates them, and then reads the next 1000. The DB can be really big, so the process can run like this for a long period of time.

Now K8s shows that our pod uses all of its memory, 5 GB (screenshot attached, pink line), but the Erlang node (recon_alloc) says it has allocated only 1.2 GB (usage 0.8).

I'm wondering if it's possible that we have a memory leak in SQLite?

We’re using:


{:ecto_sqlite3, "~> 0.7.2"},
{:exqlite, "~> 0.9.3",
 override: true,
 system_env: [{"CFLAGS", "-c -O2 -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=104857600"}]},

Repo Config:

  database: "audit.db",
  busy_timeout: 30_000,
  timeout: 60_000,
  after_connect_timeout: 60_000,
  cache_size: -2000,
  queue_target: 60_000,
  queue_interval: 500,
  log: false

When the pod runs out of memory (the limit is 5 GB), I see errors like this in the log:

"metadata": "module=DBConnection.Connection function=disconnect/2 line=148 pid=<0.3754.0> ", "message": "Exqlite.Connection (#PID<0.3754.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.8217.170> timed out because it queued and checked out the connection for longer than 60000ms
#PID<0.8217.170> was at location:

    (exqlite 0.9.3) lib/exqlite/sqlite3.ex:89: Exqlite.Sqlite3.multi_step/3
    (exqlite 0.9.3) lib/exqlite/sqlite3.ex:136: Exqlite.Sqlite3.fetch_all/4
    (exqlite 0.9.3) lib/exqlite/connection.ex:566: Exqlite.Connection.get_rows/2
    (exqlite 0.9.3) lib/exqlite/connection.ex:512: Exqlite.Connection.execute/4
    (db_connection 2.4.1) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.4.1) lib/db_connection.ex:1333: DBConnection.run_execute/5
    (db_connection 2.4.1) lib/db_connection.ex:1428: DBConnection.run/6

Thank you for your time.

[screenshot: pod memory usage, 2022-02-18 09:44]

The code that does the reading:

  def select_page(query, page, limit) do
    page =
      query
      |> paginate(page, limit)
      |> all()

    {:ok, page}
  end

  defp paginate(query, page, limit) do
    Ecto.Query.from(query,
      limit: ^limit,
      offset: ^((page - 1) * limit)
    )
  end
warmwaffles commented 2 years ago

I don't know if Kevin is going to be helping anymore; he's been pretty silent for the last few months. I am, however, going to take a look at this again.

Side note, this is news to me.

system_env: [{"CFLAGS", "-c -O2 -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=104857600"}]

Does that actually set the environment values during compilation?

lauragrechenko commented 2 years ago

Thanks a lot. Could I help somehow to debug it?

About -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT: it seems so. We set it in mix.exs and in the Dockerfile like this: ENV CFLAGS="-c -O2 -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=104857600", and the limit is 100 MB.

warmwaffles commented 2 years ago

@rupurt do you recall how you resolved the memory leak issue on your side of things?

@lauragrechenko Are you doing any other large-ish queries to sqlite3? I wonder if this is tied to the timeout issues that @LostKobrakai was experiencing a few weeks back.

warmwaffles commented 2 years ago

Also, would you be able to capture what is happening at this point?

[screenshot: memory graph, 2022-02-18 09:20]

lauragrechenko commented 2 years ago

Each node has an "audit.db". Writes happen on all nodes: ~100 entries every 2 seconds. On the node you pointed to (in the picture above), I started a process which reads and validates 1000 entries at a time. Memory started growing immediately. So on the "pink" node, writes happen every 2 seconds and reads of 1000 entries happen every few seconds.

LostKobrakai commented 2 years ago

Just to clear up some potential confusion here: NIF memory usage is not tracked by the Erlang VM unless the NIF specifically hooks into it, so it's expected that Erlang doesn't report SQLite's memory usage. That does not, however, mean there's no memory leak.
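
A quick way to see that gap is to compare what the VM reports with the OS view of the whole process. A rough sketch (the Linux-only ps call is just for illustration):

  # Memory the BEAM itself is tracking (processes, binaries, ETS, ...).
  beam_kb = div(:erlang.memory(:total), 1024)

  # Resident set size of the whole OS process, which also includes NIF-side
  # allocations such as SQLite's malloc'd pages (Linux-specific ps flags).
  {rss, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
  os_kb = rss |> String.trim() |> String.to_integer()

  IO.puts("BEAM-tracked: #{beam_kb} kB, OS RSS: #{os_kb} kB")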

I've seen some strange behaviour this week around a large delete query causing reboots in our Nerves system though, and I'm not yet sure about the root cause.

warmwaffles commented 2 years ago

Are those 100 entries being written in batches or sequentially? If they are happening sequentially, it may not be fulfilling all of the writes in time and slowly backing up.

Do you know how big the database is on each node?

lauragrechenko commented 2 years ago

For writing we use Ecto.Repo.insert_all/3. The limit for one batch is 1000 entries, but in our normal case we only write ~100 entries per batch.

But even if I stop all writing so that there is only reading by this one process, it's the same: memory grows. I turned off the data validation and just read 1000 entries, sleep for 1 second, and read the next part.

Now the DB is ~5.5 GB

warmwaffles commented 2 years ago

I need to look into having SQLite use a custom memory allocator that goes through Erlang's machinery so we can get better telemetry on it, and add telemetry to exqlite in general.

lauragrechenko commented 2 years ago

It reads 1,000,000 entries, 1000 at a time, with a 1-second pause between pages:

Enum.each(1..1000, fn page -> Process.sleep(1000);  Repo.select_page(Audit.Schema, page, 1000) end)

The Erlang node again shows "allocated memory" of ~525 MB, while K8s shows 1100 MB, and it doesn't go down.

[screenshot: memory usage, 2022-02-18 17:31]
rupurt commented 2 years ago

@warmwaffles the memory leak from my initial post was 110% my own crappy application code :)

I was cancelling and creating many orders. The orders that were cancelled were in their final resting state. So I just cleared out the callbacks for those orders as they should never get executed.

laszlohegedus commented 2 years ago

> I don't know if Kevin is going to be helping anymore; he's been pretty silent for the last few months. I am, however, going to take a look at this again.
>
> Side note, this is news to me.
>
> system_env: [{"CFLAGS", "-c -O2 -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=104857600"}]
>
> Does that actually set the environment values during compilation?

Yes, it does. :) It's kind of hack-ish, but I couldn't do it any other way.

warmwaffles commented 2 years ago

> Yes, it does. :) It's kind of hack-ish, but I couldn't do it any other way.

This is a decent option that I'll need to add to the documentation so others can use it if they want to enable/disable features when compiling SQLite.

warmwaffles commented 2 years ago

@lauragrechenko I haven't forgotten about this, I just did not have time this weekend to dig into it more. I'm going to build a benchmarking/load-testing suite soon to try and pinpoint the issue.
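
In the meantime, something as blunt as a schemaless write/read loop is usually enough to surface growth when memory is watched from outside the VM. A throwaway sketch (table, field, and repo names are made up):

  import Ecto.Query

  # Hypothetical load loop: insert a small batch, read a page back, repeat,
  # and print the BEAM-side memory every 100 iterations so it can be compared
  # against the OS-level numbers (RSS / cgroup usage).
  Enum.each(1..10_000, fn i ->
    rows = for n <- 1..100, do: %{payload: "row #{i}-#{n}"}
    MyApp.Repo.insert_all("audit_entries", rows)

    MyApp.Repo.all(from(e in "audit_entries", select: e.payload, limit: 1000))

    if rem(i, 100) == 0, do: IO.inspect(:erlang.memory(:total), label: "beam bytes")
  end)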

lauragrechenko commented 2 years ago

@warmwaffles I was about to write you. Sorry for wasting your time. I think we can close the issue. I'm still testing it, but I think we found an issue in how we use SQLite with LIMIT/OFFSET. The issue is that "by doing a query with an offset of 95000, all previous 95000 records are processed".

Maybe it'll help someone: https://stackoverflow.com/questions/12266025/sqlite-query-optimization-using-limit-and-offset

So now instead of Ecto.Query.from(query, limit: ^limit, offset: ^((page - 1) * limit)) I tried

Ecto.Query.from(q in query, where: q.id >= ^from and q.id < ^till)

and it seems to work just fine :)
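
For reference, a minimal keyset-pagination sketch along those lines (module and field names are illustrative; it assumes a monotonically increasing integer id and the Repo / Audit.Schema modules from the earlier snippets):

  defmodule Audit.Pager do
    import Ecto.Query

    # Seek past the last seen id instead of using OFFSET, so SQLite can jump
    # straight to the right rows via the primary key rather than scanning and
    # discarding all of the skipped ones.
    def select_page_after(last_id, page_size) do
      Audit.Schema
      |> where([e], e.id > ^last_id)
      |> order_by([e], asc: e.id)
      |> limit(^page_size)
      |> Repo.all()
    end

    # Walk the whole table one page at a time, carrying the last id forward.
    def each_page(fun, page_size \\ 1000) do
      Stream.unfold(0, fn last_id ->
        case select_page_after(last_id, page_size) do
          [] -> nil
          rows -> {rows, List.last(rows).id}
        end
      end)
      |> Enum.each(fun)
    end
  end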

warmwaffles commented 2 years ago

Heh, I still need to build a good benchmark and test suite along with adding better telemetry for memory usage and what not.

warmwaffles commented 2 years ago

@lauragrechenko I've opened a PR here to utilize a custom memory allocator for sqlite. If you could give it a run in your environment for tests or something, feedback would be extremely welcome.

lauragrechenko commented 2 years ago

@warmwaffles Thanks. I'll try it today. We can still see memory growing, not as significantly as in the screenshot above, but still growing on all nodes. And the Erlang node still says it isn't allocating that much memory. But we have other libs with NIFs, so I'm not sure yet what's going on.

[screenshot: memory usage, 2022-02-22 11:40]
warmwaffles commented 2 years ago

@lauragrechenko I released v0.10.0, which has the custom allocator in place; Erlang VM memory usage will now include SQLite usage.

lauragrechenko commented 2 years ago

@warmwaffles Thanks a lot. Yeah, we couldn't start it before, but today I took your PR and a fix from @laszlohegedus. I'm already running tests; it's only been 2 hours, so in a few more hours I can say something about the current "memory usage".

lauragrechenko commented 2 years ago

@warmwaffles Hi, the memory looks just fine now. Thanks a lot