Migrate existing pieces to new pluggable component architecture, part 3 the rest of the birds and twitcher/magpie into one big component

bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform

https://birdhouse-deploy.readthedocs.io/en/latest/

Apache License 2.0

Migrate existing pieces to new pluggable component architecture, part 3 the rest of the birds and twitcher/magpie into one big component #215

Open tlvu opened 2 years ago

tlvu commented 2 years ago

There are interdependencies between the birds and other components, e.g.:

- a postgres DB table needs to be created when a bird is activated
- a magpie provider/permission needs to be configured when a bird is activated

To avoid having to solve these interdependencies right now, all the birds, magpie/twitcher, and postgres will go into one big component for now.

tlvu commented 2 years ago

@dbyrns @huard @fmigneault @matprov updated issue description to create one big component to not have to deal with component dependencies, as per our discussion.

tlvu commented 2 years ago

@fmigneault currently Mongodb is used by Phoenix. Weaver also uses the same Mongodb instance, which would pull Phoenix and Mongodb into this big component for Weaver to depend on.

Would you be open to the Weaver component having its own Mongodb, so that the current Mongodb and Phoenix can be moved into their own standalone component?

fmigneault commented 2 years ago

No. I would rather keep the same mongodb instance and have a note in the component README that says it is required when either Phoenix or Weaver is employed. If we want to be more robust, each component could extend EXTRA_CONF_DIRS with its own component dependencies in case they were omitted. Since data and tables are already being created in the same mongodb, they will not be easily migrated into distinct instances. The same issue applies to PostgreSQL.
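To illustrate the "extend EXTRA_CONF_DIRS with dependencies" idea, here is a minimal Python sketch. It is not how birdhouse-deploy resolves components: the `COMPONENT_DEPENDS` map, the `extend_conf_dirs` helper, and the `component/...` names are hypothetical, and appending omitted dependencies at the end of the list is an assumption about ordering.

```python
# Hypothetical dependency map for illustration; names follow this thread,
# not necessarily the directories in the repository.
COMPONENT_DEPENDS = {
    "component/weaver": ["component/mongodb"],
    "component/phoenix": ["component/mongodb"],
}


def extend_conf_dirs(conf_dirs, depends=COMPONENT_DEPENDS):
    """Return conf_dirs with any omitted dependencies appended.

    The user's original order is preserved; missing dependencies
    (including transitive ones) are added at the end, once each.
    """
    result = list(conf_dirs)
    queue = list(conf_dirs)
    while queue:
        current = queue.pop(0)
        for dep in depends.get(current, []):
            if dep not in result:
                result.append(dep)
                queue.append(dep)  # also resolve the dependency's own dependencies
    return result


if __name__ == "__main__":
    print(extend_conf_dirs(["component/weaver"]))
```

With this approach a user who enables only Weaver would still get the shared mongodb pulled in, without moving any existing data.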

tlvu commented 2 years ago

> Since data and tables are already being created in the same mongodb, they will not be easily migrated into distinct instances.

You have a point here. To avoid a complex migration procedure, I guess we are stuck with a fairly big "default" component.

dbyrns commented 2 years ago

But @tlvu, I thought the solution proposed by @fmigneault had been accepted: every component is isolated, but we provide a hand-crafted component list that we know works. It stays 100% backward-compatible while still allowing new stacks to emerge without Phoenix or without Weaver, as long as Mongodb is included whenever either of them is used.

tlvu commented 2 years ago

> But @tlvu, I thought the solution proposed by @fmigneault had been accepted: every component is isolated, but we provide a hand-crafted component list that we know works. It stays 100% backward-compatible while still allowing new stacks to emerge without Phoenix or without Weaver, as long as Mongodb is included whenever either of them is used.

@dbyrns Yes, so this "hand-crafted component list" will basically include everything that is currently deployed by default, to keep 100% backward compatibility.

Inside that "hand-crafted component list" there will be a subset that absolutely has to go together: all the birds + postgres + Magpie. This is because Weaver and Magpie currently hardcode the list of birds, and because all the birds have their existing data in the same postgres, so breaking them out means a postgres data migration for each bird. I was trying to make this subset as small as possible. Code-wise it is doable, but the data migration is not, as @fmigneault pointed out. So this subset will have to stay pretty large.

fmigneault commented 2 years ago

In my opinion, birds can be separated (e.g. component/hummingbird, component/flyingpigeon, and so on); it's just that all of those will require component/postgres. Similarly, component/phoenix and component/weaver will require component/mongodb. Weaver might need a small update to auto-populate the WPS birds list based on enabled components, but nothing more.

The "default setup" will include all currently active components. Users can then decide to override this default setup to remove some components as desired, but it's up to them to make sure, for example, that component/postgres is still provided for birds that need it. This should be fairly easy to debug, since docker-compose would complain about a missing link or depends_on service if a required component was omitted.
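As a rough illustration of the "it's up to the user" model, a pre-flight check could report omitted requirements before docker-compose does. This is only a sketch: the `REQUIRES` map and the `missing_requirements` helper are hypothetical and simply encode the relationships stated above.

```python
# Hypothetical requirement map built from the relationships described above.
REQUIRES = {
    "component/hummingbird": {"component/postgres"},
    "component/flyingpigeon": {"component/postgres"},
    "component/phoenix": {"component/mongodb"},
    "component/weaver": {"component/mongodb"},
}


def missing_requirements(enabled, requires=REQUIRES):
    """Map each enabled component to the required components that are not enabled."""
    enabled_set = set(enabled)
    missing = {}
    for component in enabled:
        absent = requires.get(component, set()) - enabled_set
        if absent:
            missing[component] = sorted(absent)
    return missing


if __name__ == "__main__":
    setup = ["component/hummingbird", "component/weaver", "component/mongodb"]
    for component, absent in missing_requirements(setup).items():
        print(f"{component} requires: {', '.join(absent)}")
```

Unlike automatically adding dependencies, this leaves the component list exactly as the user wrote it and only points out what is missing.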

fmigneault commented 2 years ago

@tlvu I came across a use-case where I would like to have a more recent version of mongodb for Weaver (the 3.4 version currently available on the server is quite old; 5.0 is now available and 3.x is not even supported anymore). So if you get around to this task, you can consider renaming mongodb to phoenix-mongodb (left as is), and I would add a more recent weaver-mongodb.

mishaschwartz commented 1 year ago

> each service should have its own user to query the database with limited table/collections accessible.

(from https://github.com/bird-house/birdhouse-deploy/pull/296#issuecomment-1441085287)

~The database in this case refers to the db containing the magpie data. We could even use magpie/twitcher itself to enforce these policies and provide this data through magpie's API~

fmigneault commented 1 year ago

> magpie/twitcher itself to enforce these policies and provide this data through magpie's API

I don't think this is feasible. They are strongly expecting HTTP requests. Either way, I don't think we should mix the concepts of "platform users" and "service users". In https://github.com/bird-house/birdhouse-deploy/pull/296#issuecomment-1441085287, I referred to "users" in the sense of the postgresql/mongodb credentials to connect to the databases. What I would expect is that a query from e.g. finch doesn't allow reading magpie's database and vice-versa.

mishaschwartz commented 1 year ago

@fmigneault

I think I misunderstood your original comment. You meant that multiple services should be able to access the shared database service that hosts their own data, not that multiple services should be able to access magpie's database. Is that correct?

In that case, I think that we should make a distinction between a database service and a database:

For example, a postgresql cluster is running as a service in a container but we can define multiple databases run by that service, each one accessible by a different postgres user. So magpie, finch, etc. can all use the same postgres service but then we have a database named "magpie" which is accessible by the user "magpie", the database "finch" which is accessible by the user "finch", etc.

Is that what you mean?

fmigneault commented 1 year ago

@mishaschwartz Yes, this is what I had in mind. Only one docker service, but each "bird" has its own database and user in it.
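For reference, the layout agreed on here (one postgres service, one database and one user per service) could be bootstrapped with statements like the ones generated below. This is a sketch under assumptions: the `per_service_sql` helper is hypothetical, passwords are deliberately left out and would have to be set separately, and superusers can still connect to every database.

```python
def per_service_sql(services):
    """Build PostgreSQL statements giving each service its own role and database."""
    statements = []
    for name in services:
        # Names are interpolated into SQL, so only accept plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"unsafe service name: {name!r}")
        statements += [
            f"CREATE ROLE {name} LOGIN;",
            f"CREATE DATABASE {name} OWNER {name};",
            # Drop the default right that lets any role connect to this database;
            # the owner keeps access.
            f"REVOKE CONNECT ON DATABASE {name} FROM PUBLIC;",
        ]
    return statements


if __name__ == "__main__":
    for statement in per_service_sql(["magpie", "finch"]):
        print(statement)
```

With this in place, a connection made with the "finch" credentials can reach the "finch" database but is refused on "magpie", and vice versa.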