Open dotloadmovie opened 1 year ago
The following options come to mind:
#!/bin/bash
DB_PASSWORD=$(aws ssm get-parameter --name /myapp/DB_PASSWORD --with-decryption --query "Parameter.Value" --output text)
export DB_PASSWORD
# You can now use $DB_PASSWORD as an environment variable in your application
The advantage of this approach is that it can be replicated fairly easily in local and development environments using .env files.
The disadvantage is that the above bash script would need to run on the code server whenever it spins up, so that the values get linked to the right environment variables, though simpler methods may exist.
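For the local/.env replication mentioned above, a minimal sketch might look like the following, assuming python-dotenv is available; the variable name matches the SSM script, and the .env contents are illustrative only:

```python
# local_env.py -- minimal sketch, assuming python-dotenv is installed and a
# .env file sits alongside the code, e.g. containing:
#   DB_PASSWORD=local-dev-only-password
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env into the process environment
db_password = os.environ["DB_PASSWORD"]  # same variable the SSM script exports
```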
If we have a JSON or YAML (or similar) structure for configuration in a secure S3 bucket, or even as an AWS parameter, we can instruct Dagster to load this value first, before any processing is done. It can then set these values in a way that the rest of the pipeline can use.
The advantage of this is that it puts the configuration into more easily version-controlled and audited territory. We can run checks on it more easily, and there isn't much added complexity to the implementation.
The disadvantage is that this would likely mean the configuration could not be defined through the environment, and some changes may be needed to make sure the values are passed throughout the pipeline once the configuration is loaded, which may make development more complex.
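A minimal sketch of this second option, assuming the config lives as JSON in an S3 bucket (the bucket and key names below are hypothetical placeholders, not the project's real locations):

```python
# fetch_config.py -- minimal sketch of loading a JSON config from S3 with boto3.
import json
import boto3

def load_pipeline_config(bucket: str = "myapp-config", key: str = "pipeline-config.json") -> dict:
    # Fetch the object and parse it; IAM permissions on the bucket handle access control
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)

# The returned dict would then be handed to the pipeline (e.g. as Dagster run
# config or resource config) before any processing is done.
config = load_pipeline_config()
```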
London has in principle accepted the concept. Conceptually, uploading a file and giving instructions could be seen as two different things; it is managed by the interface. As the user interface stands, the upload IS the instruction (TR). Is that true for multiple pipelines? When it gets more complex they will be separate, but for right now they are one and the same. Not sure what that means for future interface dev. We could have a bunch of pipelines queued up, and the only thing stopping the user from running a pipeline on data that should not be run through it is the user interface, and it feels like it should be more than that. Arguably we already have this: one set of data, two pipelines, and the user has to explicitly tell us whether they want one or the other or both run?
Environment variable: could be an instruction from the LA / London Councils listing X LA and Y data run under Z pipelines, etc. For security we want the instructions to sit with London Councils and to line up with the DSA / processing doc. An Excel file if we have to (to get the list).
We have a pipeline that splits already: 903 – PanAgg to London Councils; 903 – further descoped PanAgg to CA.
Task: spike on effort to assess what would need doing, whether it works with the DSA, and how and when we will do it. Schema for how they provide the instructions: MH has already seen his draft, and it just needs the finer details, plus a specific change management process. There are grey areas re the specificity of instructions. MH to RAG rate the bits we know, produce a concrete proposal to react to, and tag the folks to input, so that we could consider it for next sprint (the one after this one, 21st Feb).
Matthew can deploy an EV in the client instance, but that is not scalable.
Requirements need to be completed by MH and co., then split into urgent and non-urgent dev.
I think that for any requirement we need someone who understands the users and the business need to take responsibility for requirement definition, especially because this is often iterative.
I suggest this to avoid situations where we can't move forward quickly because the requirement is unclear, and it isn't clear either who has the info to fix that or who has the authority to decide that this version is the thing to develop a solution against.
And I think Michael is the right person to do that here, so I have taken myself off of the issue.
When the pan-agg has been created, run pipeline processes based on the processing instructions file, using @asset decorators to load the data from the CSV and pass it into @op functions.
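A minimal sketch of the loading side, assuming the instructions arrive as a CSV with one row per instruction; the file name, column names, and filtering logic are hypothetical, and for brevity the downstream step is shown as a second asset rather than the @op functions mentioned above:

```python
# pan_agg_instructions.py -- minimal sketch, not the project's actual code.
import pandas as pd
from dagster import asset

@asset
def processing_instructions() -> pd.DataFrame:
    # Load the processing instructions provided by the processor (hypothetical path)
    return pd.read_csv("processing_instructions.csv")

@asset
def pan_agg_output(processing_instructions: pd.DataFrame) -> pd.DataFrame:
    # Apply each instruction row to decide what is in scope for the pan-agg output
    return processing_instructions[processing_instructions["include"] == "yes"]
```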
Priority tasks within this:
@patrick-troy As discussed, this is now being handled entirely through YAML files within the pipeline. There are two points in the flow where this intervention is made (see the sketch after this list):
- cleanfile
- use case processing using la-agg files. This is where the config specific to a use case is applied, so it can correspond directly with the instructions passed to us from the processor.
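As an illustration only (the file name, keys, and values below are hypothetical, not the project's real config), a per-use-case YAML file and the code that reads it might look like:

```python
# apply_use_case_config.py -- minimal sketch of loading a per-use-case YAML
# config with PyYAML.
#
# Example config (e.g. config/903_la_agg.yaml):
#   use_case: "903 PanAgg to London Councils"
#   columns_to_drop: ["CHILD_ID"]
#   minimise_to_years: 6
import yaml

def load_use_case_config(path: str) -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

config = load_use_case_config("config/903_la_agg.yaml")
# The cleanfile and la-agg steps would then read keys such as
# config["columns_to_drop"] when processing the use case.
```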
We'll manage this through a documented change management process that will contain all of the information required here. Our job, each time a change is accepted from the processor, will be to take that information and amend the config files accordingly.
@cyramic and @patrick-troy to deploy config files as JSON rather than YAML.
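For what it's worth, a one-off conversion of the existing YAML configs to JSON is straightforward; a minimal sketch, reusing the hypothetical file from the example above:

```python
# yaml_to_json.py -- minimal sketch of converting an existing YAML config to JSON.
import json
import yaml

with open("config/903_la_agg.yaml") as f:
    config = yaml.safe_load(f)

with open("config/903_la_agg.json", "w") as f:
    json.dump(config, f, indent=2)
```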
The easiest way to expose configuration options to the platform is through the use of environment variables. Working backwards from that, how could a given configuration best be version controlled and pushed to the dagster pipelines?
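For reference, one pattern recent Dagster versions support for the environment-variable route is EnvVar on a configurable resource. A minimal sketch, where the resource and asset names are hypothetical:

```python
# resources.py -- minimal sketch of wiring an environment variable into Dagster,
# assuming a recent Dagster version with ConfigurableResource and EnvVar.
from dagster import ConfigurableResource, Definitions, EnvVar, asset

class DatabaseResource(ConfigurableResource):
    password: str

@asset
def example_table(db: DatabaseResource) -> None:
    # Hypothetical asset: connect to the database using db.password here
    ...

defs = Definitions(
    assets=[example_table],
    # DB_PASSWORD is read from the environment at run time, so only the variable
    # name (not the value) needs to live in version control.
    resources={"db": DatabaseResource(password=EnvVar("DB_PASSWORD"))},
)
```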