NOAA-GSL / VxIngest

Create a reproducible database schema #263

Open ian-noaa opened 11 months ago

ian-noaa commented 11 months ago

Problem

We need to be able to "cold start" the Couchbase database for new installations where we can't just replicate our production database, such as new cloud environments, CI, and local developer testing.

Solution

We should have a set of Python scripts that create our required schema in Couchbase. Additionally, it would be nice to have a minimal amount of data (tens of MB) that we could use for local development, testing MATS, testing our ingest, etc.

We may also need to include indices, etc.

No Go's

Describe any features or behaviors that have been considered and rejected as out of scope for this project.

Tasks

Describe discrete tasks that need to be done to complete this project

ian-noaa commented 11 months ago

@randytpierce had some questions about using Python for this. We should explore if we can do this with plain N1QL queries.

I think that having a thin Python wrapper around the N1QL query will make this easier to do in the long run. Ideally, developers can create a config.yaml to point to the Couchbase they want to set up and then run something like poetry run setupDB to configure the database with the required bare-bones schema.
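A minimal sketch of what such a thin wrapper could look like, assuming the schema lives in a plain N1QL script. The names `split_statements`, `setup_db`, and `run_query`, and the bucket name `example_bucket`, are hypothetical and not part of VxIngest. The executor is passed in so the sketch stays independent of any particular Couchbase SDK version; in practice it would be a small function that calls the SDK's `Cluster.query`.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a N1QL script into individual statements.

    Naive on purpose: splits on ';' and drops empty fragments, so it
    would break on a semicolon inside a string literal.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def setup_db(script: str, run_query: Callable[[str], object]) -> int:
    """Run every statement in `script` through `run_query`.

    `run_query` is a hypothetical hook: any callable that takes one
    N1QL statement and executes it against the target cluster.
    Returns the number of statements executed.
    """
    statements = split_statements(script)
    for statement in statements:
        run_query(statement)
    return len(statements)


if __name__ == "__main__":
    # Hypothetical schema script; the bucket and index names are placeholders.
    example_script = """
    CREATE PRIMARY INDEX ON `example_bucket`;
    CREATE INDEX idx_type ON `example_bucket`(type);
    """
    # Print instead of executing, so the sketch runs without a database.
    setup_db(example_script, print)
```

Keeping the statements in a plain N1QL file would also address the question of whether plain queries are enough: the same file could be run by hand or through the wrapper.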

Since the DB contents are really just JSON files, a database backup might also be sufficient. If we were to pursue that approach, though, I wouldn't want to add more than tens of MB of JSON files to the repo.

gopa-noaa commented 11 months ago

I will research whether a baseline DB dump with minimal data would include index information. If so, this is probably all we would need for a cold start.

randytpierce commented 11 months ago

I just happened to check. This is the link: https://docs.couchbase.com/server/current/backup-restore/cbbackupmgr-restore.html

It clearly does restore the indexes by default: "By default all data, index definitions, view definitions and full-text index definitions are restored to the cluster unless specified otherwise in the repos backup config or through command line parameters when running the restore command." I actually knew the right place to look since I was reading it earlier.
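As a rough sketch of how a cold-start script might drive that restore, the helper below only assembles the `cbbackupmgr restore` command line. The function name `build_restore_command` and all argument values (archive path, repo name, cluster address, credentials) are placeholders, not values from this project.

```python
import shlex
from typing import List


def build_restore_command(
    archive: str, repo: str, cluster: str, username: str, password: str
) -> List[str]:
    """Assemble a `cbbackupmgr restore` invocation as an argument list.

    No filtering options are passed, so the documented default applies:
    data and index definitions are restored together.
    """
    return [
        "cbbackupmgr", "restore",
        "--archive", archive,
        "--repo", repo,
        "--cluster", cluster,
        "--username", username,
        "--password", password,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    command = build_restore_command(
        archive="/tmp/example_archive",
        repo="example_repo",
        cluster="couchbase://localhost",
        username="example_user",
        password="example_password",
    )
    # Print the command rather than running it; pass the list to
    # subprocess.run(command, check=True) to actually restore.
    print(shlex.join(command))
```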

randy

gopa-noaa commented 11 months ago

@randytpierce could you copy and paste the location of your index definition code here, please?

randytpierce commented 11 months ago

This is the index creation code: index_creation_scripts. These CREATE statements appear to be hand-generated, but there is a way to capture them all (I have to look that up).

bonnystrong commented 11 months ago

Is this related to issue #263? Should it be a new issue?

On Wed, Nov 29, 2023 at 5:36 PM randytpierce @.***> wrote:

So I copied the archives over to adb-cb1 and reimported them. There are no errors but the data did not get into the database. So the import is not working appropriately.

randytpierce commented 11 months ago

That issue became a sort of "stream of consciousness" that recorded how to diagnose a failure between the ingest processing and the import process. It's really important to be able to discern whether the problem is in the ingest processing or the import processing, and that issue records how to do it. I probably should make a document about this. (And ruin some of the fun :) ) Randy

ian-noaa commented 11 months ago

@bonnystrong a lot of the email comments were intended for #264 and accidentally got added to this issue. I'd recommend checking this issue out on GitHub as Randy cleaned the history up when he shifted those comments over.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 90 days with no activity.