When connection times out a SIGSEGV is encountered

warmwaffles commented 3 years ago

For a large table of 66,000 records

Company
|> select([c], c)
|> Repo.all()

Encounter

14:25:34.215 [error] Exqlite.Connection (#PID<0.446.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.536.0> timed out because it queued and checked out the connection for longer than 15000ms

#PID<0.536.0> was at location:

    (exqlite 0.5.1) lib/exqlite/sqlite3.ex:107: Exqlite.Sqlite3.fetch_all/3
    (exqlite 0.5.1) lib/exqlite/connection.ex:486: Exqlite.Connection.get_rows/2
    (exqlite 0.5.1) lib/exqlite/connection.ex:439: Exqlite.Connection.execute/4
    (db_connection 2.3.1) lib/db_connection/holder.ex:316: DBConnection.Holder.holder_apply/4
    (db_connection 2.3.1) lib/db_connection.ex:1272: DBConnection.run_execute/5
    (db_connection 2.3.1) lib/db_connection.ex:1359: DBConnection.run/6
    (db_connection 2.3.1) lib/db_connection.ex:557: DBConnection.parsed_prepare_execute/5
    (db_connection 2.3.1) lib/db_connection.ex:550: DBConnection.prepare_execute/4
    (ecto_sqlite3 0.5.3) lib/ecto/adapters/sqlite3/connection.ex:62: Ecto.Adapters.SQLite3.Connection.prepare_execute/5
    (ecto_sql 3.5.4) lib/ecto/adapters/sql.ex:692: Ecto.Adapters.SQL.execute!/4
    (ecto_sql 3.5.4) lib/ecto/adapters/sql.ex:684: Ecto.Adapters.SQL.execute/5
    (ecto 3.5.8) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
    (ecto 3.5.8) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (stdlib 3.14) erl_eval.erl:680: :erl_eval.do_apply/6
    (elixir 1.11.3) src/elixir.erl:280: :elixir.recur_eval/3
    (elixir 1.11.3) src/elixir.erl:265: :elixir.eval_forms/3
    (iex 1.11.3) lib/iex/evaluator.ex:261: IEx.Evaluator.handle_eval/5
    (iex 1.11.3) lib/iex/evaluator.ex:242: IEx.Evaluator.do_eval/3
    (iex 1.11.3) lib/iex/evaluator.ex:220: IEx.Evaluator.eval/3

fish: Job 1, 'iex -S mix' terminated by signal SIGSEGV (Address boundary error)

This may be a reason why esqlite added bulk fetching rows instead of simply stepping. Although bulk fetching would be nice, the issue arises that an error happens midway through the step, and how to communicate that error back is a bit tricky.

warmwaffles commented 2 years ago

We do multi-stepping when fetching. It was simply too slow to fetch individual rows and return back to vm.

warmwaffles commented 2 years ago

https://github.com/elixir-sqlite/exqlite/blob/0008856078f48fc9a56e8a693bb5805fb8e9fa7f/lib/exqlite/sqlite3.ex#L119-L149

which is called from

https://github.com/elixir-sqlite/exqlite/blob/0008856078f48fc9a56e8a693bb5805fb8e9fa7f/lib/exqlite/connection.ex#L565-L573

and consumed by

https://github.com/elixir-sqlite/exqlite/blob/0008856078f48fc9a56e8a693bb5805fb8e9fa7f/lib/exqlite/connection.ex#L509-L514

LostKobrakai commented 2 years ago

@warmwaffles I'm seeing a segfault while the NIF is running multi_step (https://github.com/elixir-sqlite/exqlite/pull/191) when the outer db connection timeouts. So something seems to still be off, even with returning things to elixir in small chunks.

warmwaffles commented 2 years ago

@LostKobrakai were you able to change the chunk size and still get the same error? Just trying to collect as much info as I can.

warmwaffles commented 2 years ago

Also, how big do you think on average each one of those 66,000 records are?

LostKobrakai commented 2 years ago

I tried running the test in the PR I linked with chunk_size 1, 10, 1000, 10000 and it succeeded for the latter two. But increasing the number of records from 10k to 50k made it segfault again for all chunk sizes. It might just get fast enough with a larger chunk size, but still fail when hitting the timeout.

warmwaffles commented 2 years ago

Okay, when I get some time today I'll dig into this more.

LostKobrakai commented 2 years ago

My PR should replicate the issue on the latest version. But yes I was confused why you reacted to the old thread as well :D

warmwaffles commented 2 years ago

Hah, yea completely forgot I made this ticket to begin with. I had a case where I had a ton of time series data stored and retrieving it was difficult.

elixir-sqlite / exqlite

When connection times out a SIGSEGV is encountered #127