fly-apps / postgres-importer

9 stars 3 forks source link

Defaults are maybe a little confusing/dangerous? #5

Open JAForbes opened 11 months ago

JAForbes commented 11 months ago

Hey, thanks for the great project, just wanted to offer some feedback on the CLI defaults.

    noOwner := flag.Bool("no-owner", true, "")
    clean := flag.Bool("clean", true, "")
    create := flag.Bool("create", true, "")
    dataOnly := flag.Bool("data-only", false, "")

The default behaviour for pg_dump is to include ownership, so defaulting to --no-owner=true is a little surprising and can break RLS policies/security. Also supporting --no-owner but not --owner (for the reverse behavior also is a little confusing. It took me a while to realise that I could pass =value to override the default. As all CLI's have their own conventions, any surprises that can be avoided if possible as trial and error isn't great when doing prod migrations.

clean being defaulted to true is also a little dangerous. For example. what happens if a dev writes to the wrong database by mistake. Perhaps they reversed the target/source db uri accidentally (hey it happens!). In that case they are now potentially wiping production db objects despite not explicitly telling the tool to do so (because clean defaults to true).

I'd argue a tool like this should have no defaults. This is exactly the time when clarity should be prioritized over concision or convenience.

Final point, importing just a db is fine, as long as you also bring along db roles. To not bring along db roles and only database objects will only work for projects that use postgres as a simple store. It's a good practice for each service to have its own db role scoped to its level of access. It's also super simple to extract roles via pg_dumpall, so I'd recommend including roles as an option as --no-owner=false won't work if the owners don't exist.

davissp14 commented 11 months ago

The settings will be use-case dependent and they were set based on common hurdles many of our users were hitting when importing from services like Heroku. We are probably at a point where we can re-evaluate the defaults, but there's no shortage of trade-offs to consider.

In general, we want to have sane defaults that accommodate the majority of use-cases, while allowing users to tune things as needed when their use-case doesn't fit into the mold.

If you want hack on this

The pg import feature takes an image reference as an argument,, so you're welcome to customize this process however you see fit.


Imports database from a specified Postgres URI

Usage:
  flyctl postgres import <source-uri> [flags]

Flags:
  -a, --app string       Application name
      --clean            Drop database objects prior to creating them.
  -c, --config string    Path to application configuration file
      --create           Begin by creating the database itself and reconnecting to it. If --clean is also
                         specified, the script drops and recreates the target database before reconnecting to it.
                         (default true)
      --data-only        Dump only the data, not the schema (data definitions).
  -h, --help             help for import
      --image string     Path to public image containing custom migration process
      --no-owner         Do not set ownership of objects to match the original database. Makes dump restorable by
                         any user. (default true)
      --region string    Region to provision migration machine
      --vm-size string   the size of the VM

Global Flags:
  -t, --access-token string   Fly API Access Token
      --debug                 Print additional logs and traces
      --verbose               Verbose output```
JAForbes commented 11 months ago

common hurdles many of our users were hitting when importing from services like Heroku

Heroku didn't allow users to have superuser access to the db, so these users will likely not have roles. And consequently they would not want to retain ownership. But, that's a Heroku specific thing. Clearing ownership doesn't set those users up for success when they adopt the full postgres feature set that is availble to them on fly.

In general, we want to have sane defaults that accommodate the majority of use-cases, while allowing users to tune things as needed when their use-case doesn't fit into the mold.

An admirable approach, and normally I would agree. I like fly's CLI design in flyctl for this exact reason. But databases are different. If you want to retain the existing defaults, it would probably be a good idea to have some confirmation steps, wiping a db is a big deal. Also importing a large db with a lot of objects with the wrong ownership can be a pain to repair.

Also re: sane-defaults, the double negative of --no-owner=false is a little confusing right? What about:

--roles              Copy all roles, passwords and ownership from source database. 
                     If left unset, object ownership will default to the operator user. (default false)

The pg import feature takes an image reference as an argument,, so you're welcome to customize this process however you see fit.

Now that I've figured out how the tool works, this specific problem won't impact me personally, but I just wanted to offer some feedback that this could surprise others and potentially lead to problems.

davissp14 commented 11 months ago

Heroku didn't allow users to have superuser access to the db, so these users will likely not have roles. And consequently they would not want to retain ownership. But, that's a Heroku specific thing. Clearing ownership doesn't set those users up for success when they adopt the full postgres feature set that is availble to them on fly.

Ahh, so the vast majority of users who use Postgres don't actually know Postgres. They really just need a database that hooks up to their App and the rest is handled by their framework of choice. I'm not saying your wrong, fwiw, but this was written in a day as a prototype to streamline the import process for folks coming from Heroku-like services.

If you want to retain the existing defaults, it would probably be a good idea to have some confirmation steps, wiping a db is a big deal. Also importing a large db with a lot of objects with the wrong ownership can be a pain to repair.

As always, users should 100% test the import process against a staging/test environment before trying to import anything into their production database. This should help mitigate any dangers that could potentially come along with importing data.

The fly pg create <app-name> --fork-from <existing-db> should make testing this kind of thing pretty easy.

I do understand that it can be a pain to troubleshoot import issues, especially when you're working against a large dataset. If you have ideas on how to improve this process, a PR would be more than welcome. I'm not actively working on databases, so it's hard for me to prioritize anything on this front atm.

Also re: sane-defaults, the double negative of --no-owner=false is a little confusing right? What about: --roles Copy all roles, passwords and ownership from source database. If left unset, object ownership will default to the operator user. (default false)

The downside to this is that it forces us to maintain our own sets of documentation, which may or may change across major versions. It might be better to just push users to the pg_dump/pg_restore tools for guidance, which is where these options come from.

I would likely have a different viewpoint on this if Fly was a DBaaS company though, but this is not the case.

https://www.postgresql.org/docs/current/app-pgdump.html https://www.postgresql.org/docs/current/app-pgrestore.html

This is an open-source project though, so we will happily take contributions if folks are willing to maintain it.