bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0
4 stars 6 forks source link

:bug: [BUG]: Regression handling `DATA_PERSIST_ROOT` setting #270

Closed fmigneault closed 1 year ago

fmigneault commented 1 year ago

Summary

Regression in handling DATA_PERSIST_ROOT setting does not produce the intended effect if anything other than defaults are used.

Details

Regression introduced in https://github.com/bird-house/birdhouse-deploy/commit/d4f2f4401b6535c7401a170c0c0794e2a19e479a regarding the use of DATA_PERSIST_ROOT.

New variable MAGPIE_PERSIST_DIR ~(there might be others doing similar things in other commits [?] )~ have been defined in default.env making use of the (up to that point) default value of DATA_PERSIST_ROOT=/data.

edit @fmigneault I can confirm that all modified directories for jupyter images, geoserver data and magpie's postgres and misbehaving in the data direcotory paths using a quick local test. I personnally use DATA_PERSIST_ROOT=/data/daccs and they have all been generated as /data/magpie_persists, /data/geoserver and /data/jupyterhub_user_data instead of their usual location nested under /data/daccs.

If the user defined instead their own DATA_PERSIST_ROOT in the env.local for an alternate storage location, all those derived variables do not consider the new value because they are already set.

This produces inconsistent directory storage locations and does not respect the expected behavior defined by the user.

To Reproduce

Steps to reproduce the behavior:

  1. Define DATA_PERSIST_ROOT=/my-data
  2. pavics-compose up -d

The volumes that make direct reference to ${DATA_PERSIST_ROOT} in docker-compose,yml (and component overrides) will have the correct path. When using derived directory variables, they will still be under /data/....

Environment

Information Value
Server/Platform URL all that define DATA_PERSIST_ROOT with another value
Version Tag/Commit since 1.19.1
Related issues/PR https://github.com/bird-house/birdhouse-deploy/pull/249
Related components magpie, postgres-magpie
Custom configuration any with custom DATA_PERSIST_ROOT and no explicit MAGPIE_PERSIST_DIR

Concerned Organizations

@matprov @tlvu

tlvu commented 1 year ago

In this case, I can only see 2 solutions:

1- keep DATA_PERSIST_ROOT but then have to remove GEOSERVER_DATA_DIR, JUPYTERHUB_USER_DATA_DIR, MAGPIE_PERSIST_DIR and just manually repeat their values as $DATA_PERSIST_ROOT/magpie_persist and so on with the risk of copy/paste typo as we keep retyping the same string

2- remove DATA_PERSIST_ROOT and keep the other 3, which opens the door to other apps to also put their data elsewhere without copy/paste error

In all cases we can document that to "move" /data, put a symlink on disk to where you want, avoid using the variable to move /data because it is error prone.

I have a préférence for option 2, what do you guys think?

fmigneault commented 1 year ago

From these propositions, I personally prefer option 1 over 2 (though not optimal) because it preserves the original behavior (before the bug was introduced). The current changes unexpectedly broke my setup, and I don't want this to happen every time a new variable is added. Users should not have to worry about always adding new variables to support new features, as it makes feature integration and migration harder for them. When adding new variables, omitting them should preserve the original behavior.

I do not like the symlink fix approach. The target /data directory in my case happens to have a lot of other projects. It cannot be hacked around just for the needs of birdhouse. For the same reason, option 2 is a no-go for me, because I must be able to provide an alternate location.

I have 2 possible alternative solutions to propose:

using a post.env

There could be a post.env file that simply defines variables that depend on another variable in this fashion:

# post.env
export MAGPIE_PERSIST_DIR=${MAGPIE_PERSIST_DIR:-${DATA_PERSIST_DIR}/magpie_persist}

Basically, they define similar items to defaults.env only if not predefined by some env.local override. The default.env would be reserved only for the case of hard-coded values (like it previously was).

The drawback is that it makes "defaults" lookup harder because there are 2 files to look for. It might not be intuitive, and we must remember to define the variables in the right location based on context.

using eval and specific quotes syntax

Defining the following files:

# defaults.env
export TEST1=1
export TEST2=2
export TEST3='test-$TEST1-$TEST2'
# local.env
export TEST2=222
# pseudo-pavics-compose.sh ; simulates operations defined in pavics-compose.sh
. /tmp/daccs-test/defaults.env

echo Before [local.env]
echo TEST1: [${TEST1}]
echo TEST2: [${TEST2}]
echo TEST3: [${TEST3}]

. /tmp/daccs-test/local.env

export TEST1=$(eval "echo $TEST1")
export TEST2=$(eval "echo $TEST2")
export TEST3=$(eval "echo $TEST3")

echo Loaded [local.env]
echo TEST1: [${TEST1}]
echo TEST2: [${TEST2}]
echo TEST3: [${TEST3}]

Produces:

❯ ./pseudo-pavics-compose.sh
Before [local.env]
TEST1: [1]
TEST2: [2]
TEST3: [test-$TEST1-$TEST2]
Loaded [local.env]
TEST1: [1]
TEST2: [222]
TEST3: [test-1-222]

Essentially, defaults.env "looks and feels" the same, though the use of single quotes must be explicit to avoid interpretation of the nested variables at that point. The values of defaults.env are evaluated only after loading.

I think that would be a fair approach because defaults.env reads more naturally, and the single quote notation is already used in this manner (https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/pavics-compose.sh#L10-L79) to load them only when needed (https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/pavics-compose.sh#L111-L112).

A "warning" note could be added to defaults.env to indicate the use of those single quotes to allow this mechanism to take effect.