Open JAForbes opened 11 months ago
The settings will be use-case dependent and they were set based on common hurdles many of our users were hitting when importing from services like Heroku. We are probably at a point where we can re-evaluate the defaults, but there's no shortage of trade-offs to consider.
In general, we want to have sane defaults that accommodate the majority of use-cases, while allowing users to tune things as needed when their use-case doesn't fit into the mold.
If you want hack on this
The pg import feature takes an image reference as an argument,, so you're welcome to customize this process however you see fit.
Imports database from a specified Postgres URI
Usage:
flyctl postgres import <source-uri> [flags]
Flags:
-a, --app string Application name
--clean Drop database objects prior to creating them.
-c, --config string Path to application configuration file
--create Begin by creating the database itself and reconnecting to it. If --clean is also
specified, the script drops and recreates the target database before reconnecting to it.
(default true)
--data-only Dump only the data, not the schema (data definitions).
-h, --help help for import
--image string Path to public image containing custom migration process
--no-owner Do not set ownership of objects to match the original database. Makes dump restorable by
any user. (default true)
--region string Region to provision migration machine
--vm-size string the size of the VM
Global Flags:
-t, --access-token string Fly API Access Token
--debug Print additional logs and traces
--verbose Verbose output```
common hurdles many of our users were hitting when importing from services like Heroku
Heroku didn't allow users to have superuser access to the db, so these users will likely not have roles. And consequently they would not want to retain ownership. But, that's a Heroku specific thing. Clearing ownership doesn't set those users up for success when they adopt the full postgres feature set that is availble to them on fly.
In general, we want to have sane defaults that accommodate the majority of use-cases, while allowing users to tune things as needed when their use-case doesn't fit into the mold.
An admirable approach, and normally I would agree. I like fly's CLI design in flyctl for this exact reason. But databases are different. If you want to retain the existing defaults, it would probably be a good idea to have some confirmation steps, wiping a db is a big deal. Also importing a large db with a lot of objects with the wrong ownership can be a pain to repair.
Also re: sane-defaults, the double negative of --no-owner=false
is a little confusing right? What about:
--roles Copy all roles, passwords and ownership from source database.
If left unset, object ownership will default to the operator user. (default false)
The pg import feature takes an image reference as an argument,, so you're welcome to customize this process however you see fit.
Now that I've figured out how the tool works, this specific problem won't impact me personally, but I just wanted to offer some feedback that this could surprise others and potentially lead to problems.
Heroku didn't allow users to have superuser access to the db, so these users will likely not have roles. And consequently they would not want to retain ownership. But, that's a Heroku specific thing. Clearing ownership doesn't set those users up for success when they adopt the full postgres feature set that is availble to them on fly.
Ahh, so the vast majority of users who use Postgres don't actually know Postgres. They really just need a database that hooks up to their App and the rest is handled by their framework of choice. I'm not saying your wrong, fwiw, but this was written in a day as a prototype to streamline the import process for folks coming from Heroku-like services.
If you want to retain the existing defaults, it would probably be a good idea to have some confirmation steps, wiping a db is a big deal. Also importing a large db with a lot of objects with the wrong ownership can be a pain to repair.
As always, users should 100% test the import process against a staging/test environment before trying to import anything into their production database. This should help mitigate any dangers that could potentially come along with importing data.
The fly pg create <app-name> --fork-from <existing-db>
should make testing this kind of thing pretty easy.
I do understand that it can be a pain to troubleshoot import issues, especially when you're working against a large dataset. If you have ideas on how to improve this process, a PR would be more than welcome. I'm not actively working on databases, so it's hard for me to prioritize anything on this front atm.
Also re: sane-defaults, the double negative of --no-owner=false is a little confusing right? What about: --roles Copy all roles, passwords and ownership from source database. If left unset, object ownership will default to the operator user. (default false)
The downside to this is that it forces us to maintain our own sets of documentation, which may or may change across major versions. It might be better to just push users to the pg_dump
/pg_restore
tools for guidance, which is where these options come from.
I would likely have a different viewpoint on this if Fly was a DBaaS company though, but this is not the case.
https://www.postgresql.org/docs/current/app-pgdump.html https://www.postgresql.org/docs/current/app-pgrestore.html
This is an open-source project though, so we will happily take contributions if folks are willing to maintain it.
Hey, thanks for the great project, just wanted to offer some feedback on the CLI defaults.
The default behaviour for pg_dump is to include ownership, so defaulting to
--no-owner=true
is a little surprising and can break RLS policies/security. Also supporting--no-owner
but not--owner
(for the reverse behavior also is a little confusing. It took me a while to realise that I could pass=value
to override the default. As all CLI's have their own conventions, any surprises that can be avoided if possible as trial and error isn't great when doing prod migrations.clean
being defaulted to true is also a little dangerous. For example. what happens if a dev writes to the wrong database by mistake. Perhaps they reversed the target/source db uri accidentally (hey it happens!). In that case they are now potentially wiping production db objects despite not explicitly telling the tool to do so (because clean defaults to true).I'd argue a tool like this should have no defaults. This is exactly the time when clarity should be prioritized over concision or convenience.
Final point, importing just a db is fine, as long as you also bring along db roles. To not bring along db roles and only database objects will only work for projects that use postgres as a simple store. It's a good practice for each service to have its own db role scoped to its level of access. It's also super simple to extract roles via pg_dumpall, so I'd recommend including roles as an option as
--no-owner=false
won't work if the owners don't exist.