IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

Missing `schema/` dir on the static binaries #1742

Open Cmdv opened 1 week ago

Cmdv commented 1 week ago

It was reported that:

I have not run a db-sync instance for a number of months (lack of disk space). I am trying to set one up now. I pulled the static binaries from https://github.com/IntersectMBO/cardano-db-sync/releases/tag/13.2.0.2 and am trying to get it running. It's currently complaining about lack of schema files. The schema files are not bundled with the static binaries. Do I really have to get them from the git repo at the right tag? (That actually seemed to work, but the general public should not be expected to do this.)

A solution to this could be to simply add `data-dir: schema` to the cardano-db-sync cabal file.

This would need testing to see if it actually works.

sgillespie commented 1 week ago

A solution to this could be to simply add `data-dir: schema` to the cardano-db-sync cabal file.

This would only help for cabal install.

I think that, combined with bundling the schema files in the static binary tarballs, would be a better overall solution.

sgillespie commented 1 week ago

Another option: https://hackage.haskell.org/package/file-embed
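For context, file-embed provides Template Haskell splices that bake file contents into the binary at compile time. A minimal sketch of embedding the whole `schema/` directory (the module name and its placement are hypothetical, not part of the current codebase):

```haskell
{-# LANGUAGE TemplateHaskell #-}

module Cardano.Db.EmbeddedSchema
  ( embeddedSchema
  ) where

import Data.ByteString (ByteString)
import Data.FileEmbed (embedDir)

-- All files under schema/ as (relative path, file contents) pairs,
-- captured at compile time, so the binary no longer needs the
-- directory to exist on disk at run time.
embeddedSchema :: [(FilePath, ByteString)]
embeddedSchema = $(embedDir "schema")
```

Note that the directory is read when the module is compiled, so adding or editing a schema file requires a rebuild of this module.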

Cmdv commented 1 week ago

@sgillespie I looked at that package. I was hoping for something really simple, but you're right, file-embed might actually do the trick.

Cmdv commented 6 days ago

Just been taking a look at embedding files into the binary… it’s not easy :sob:

All the conversion of the files happens in `cardano-db/`, so we’d have to do the embedding there. Also, the schema folder location is currently resolved at run time, not compile time, so the command-line option would have to be removed and the location of `schema/` would always need to be the same.

What are the thoughts on that?

erikd commented 6 days ago

How is the tarball for the static binaries created? Is it not just a script?

Cmdv commented 5 days ago

We had a discussion about how to implement this:

Setup.hs

cardano-db-sync/Setup.hs deals with converting `schema/*` into `[KnownMigration]` in the binary at build time. The idea would be to add an extra field to the type:

data KnownMigration = KnownMigration
  { hash :: !Text
  , filepath :: !Text
  , rawQuery :: !Text -- <-- here
  } deriving (Eq, Show)

This would allow us to have the SQL query of each file available at build time, and thus it would be part of the binary.

Migration.hs

cardano-db/src/Cardano/Db/Migration.hs is where the migrations are run at startup. Currently we use the `--schema-dir` flag to determine where the schema folder is located; it seems we check that the hashes we have in `KnownMigration` match the files in `schema-dir`.

What we could do now is skip this check by default, as `KnownMigration` would be correct given the entries are added at compile time. The flag would instead be used to add custom migration files, in case users want to add new migrations or override the ones that ship as part of the binary. This might require some extra configuration: Artur mentioned that they remove all Stage 4 migrations when testing. How much flexibility we want there is still to be determined.

In this file we'd keep `--schema-dir` as a flag and change the code here:

-- when we have custom files
, "--file='" ++ (location </> script) ++ "'"
-- when using `KnownMigration`
, "--command='" ++ rawQuery ++ "'"
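That branch could live in a single pure helper. A sketch under assumed field and flag names (this is not the actual Migration.hs API):

```haskell
module Cardano.Db.MigrationArg where

import System.FilePath ((</>))

-- Assumed shape of a known migration, including the proposed
-- rawQuery field holding the SQL text embedded at build time.
data KnownMigration = KnownMigration
  { hash :: String
  , filepath :: String
  , rawQuery :: String
  } deriving (Eq, Show)

-- Pick the psql argument for one migration: read a custom file from
-- the directory when --schema-dir was given, otherwise run the SQL
-- embedded in the binary via --command.
psqlArg :: Maybe FilePath -> KnownMigration -> String
psqlArg (Just schemaDir) km = "--file=" ++ (schemaDir </> filepath km)
psqlArg Nothing          km = "--command=" ++ rawQuery km
```

Keeping the selection pure like this would also make the embedded-vs-custom behaviour easy to unit test.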

It might turn out to be slightly more complicated than that if we decide to combine both custom migration files and the ones inside `KnownMigration`.
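One simple way to combine the two sources would be a left-biased union keyed by file name; a sketch with an assumed representation (nothing here matches the real types):

```haskell
module Cardano.Db.MergeMigrations where

import qualified Data.Map.Strict as Map

-- Migrations keyed by schema file name, with the SQL text as the
-- payload (assumed representation for this sketch).
type Migrations = Map.Map FilePath String

-- Combine the migrations embedded in the binary with custom files
-- found in --schema-dir. Map.union is left-biased, so a custom file
-- with the same name overrides the embedded copy, while embedded
-- migrations with no custom counterpart are kept as-is.
mergeMigrations :: Migrations -> Migrations -> Migrations
mergeMigrations embedded custom = Map.union custom embedded
```

This would give the override semantics described above without any special-casing in the migration runner.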

erikd commented 4 days ago

I ask again. How is the static binary tarball generated?

Is it not easier to modify the tarball generation rather than hack the code of the executable?

Cmdv commented 2 days ago

I ask again. How is the static binary tarball generated?

with nix I assume 🤷

sgillespie commented 1 day ago

I ask again. How is the static binary tarball generated?

Is it not easier to modify the tarball generation rather than hack the code of the executable?

It's all here: https://github.com/IntersectMBO/cardano-db-sync/blob/master/flake.nix#L304

Yes, we could add the schema files to the release tarball, and that would probably be easier. Adding them to the binary would probably be a little more foolproof, because it would solve a lot of other use cases.