edgedb / edgedb-cli

The EdgeDB CLI
https://www.edgedb.com/docs/cli/index
Apache License 2.0

Allow to dump data only #1111

Open extsoft opened 1 year ago

extsoft commented 1 year ago

There is an edgedb dump command that does a full dump of the database(s). The edgedb restore command can take the generated dump file(s) and apply them to an empty database. However, there is no way to dump (or restore) data only.

What did I try?

  1. A *.dump file provides the EdgeQL code for the schema but keeps the data in a binary format, so there is no way to apply that data on its own.
  2. https://www.edgedb.com/docs/changelog/3_x#sql-support describes SQL support. However, using pg_dump gives a relational representation of the data that is (let's say) impossible to apply to an EdgeDB instance.

As a solution, I propose adding a --data-only option to the edgedb dump command that outputs the data as EdgeQL insert queries.

There is also a related question: why does edgedb dump keep data in binary form? As far as I can see, a --format binary|edgeql option would make sense, providing consistent output regardless of what is being dumped: everything, data only, etc.

KaelWD commented 10 months ago

I've tried generating dumps like this myself but EdgeDB doesn't seem to be able to handle .edgeql files with thousands of inserts. Separate queries like

insert SomeObject {
  title := '...',
  body := '...',
};
insert SomeObject {
  title := '...',
  body := '...',
};

take about a second each, which is way too slow, and a couple hundred MB of JSON in

with objects := to_json("[...]"),
for object in json_array_unpack(objects) union (
  insert SomeObject {
    title := <str>object['title'],
    body := <str>object['body'],
  }
)

makes the Python process sit at like 10% CPU for a few minutes until the server runs out of memory and crashes.
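
One possible middle ground between the two approaches above is to send the JSON in smaller batches, passed as a query parameter rather than embedded as a literal in the query text. The following is only a sketch and has not been tested against the workload described here, so it may or may not avoid the slowdown or the out-of-memory crash. The names chunked, insert_in_batches, and INSERT_QUERY are hypothetical, the batch size of 1000 is an arbitrary guess, and client is assumed to be any object with a query(query, **params) method, such as the one returned by edgedb.create_client() in the Python client.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Same shape as the for/insert query above, but the JSON array arrives as the
# $data query parameter instead of a literal embedded in the query text.
INSERT_QUERY = """
with objects := <json>$data
for object in json_array_unpack(objects) union (
  insert SomeObject {
    title := <str>object['title'],
    body := <str>object['body'],
  }
)
"""


def chunked(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def insert_in_batches(client: Any, records: Iterable[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Run INSERT_QUERY once per batch and return the number of records sent.

    `client` is assumed to expose query(query, **params); batch_size is a
    hypothetical default that would need tuning.
    """
    total = 0
    for batch in chunked(records, batch_size):
        # The JSON parameter is passed as a string containing a JSON array.
        client.query(INSERT_QUERY, data=json.dumps(batch))
        total += len(batch)
    return total
```

With the Python client this would be called as insert_in_batches(edgedb.create_client(), records). Each batch is a separate query, so a failure partway through leaves earlier batches inserted unless the calls are wrapped in a transaction.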

tomnz commented 7 months ago

I know this hasn't seen any movement, but putting in a +1 for --format=edgeql!