Closed chongr closed 2 years ago
I want to be careful about this task - we shouldn't create a centralised "settings" struct that's passed into every constructor. Rather, only inject specific dependencies that the constructor uses.
I recently re-did hedwig-go like so, take a look at that.
Additionally, we should not need a config file just read from env vars since config file would mean berglas and the likes which we don't want to impose. So perhaps a config struct that's created in main.go and then pick fields from that struct for every constructor as objects are constructed.
ah okay yeah I was just looking at hedwig go - was planning to just set up something similar to this? https://github.com/cloudchacho/hedwig-go/blob/e5ab05b5a6ca8c5f2f82955464c919cb2ba2c4c1/gcp/gcp.go#L265
Yep so that links to the settings specific for gcp backend. And it's a struct rather than separate parameters since that's easier to update in the future. I think we shouldn't need a struct here since the app is going to be a deployable and the "external API" is just env vars. So we can get away with just parameters everywhere (assuming it doesn't balloon to like 5 params 😛).
If it's more than like 3 config parameters definitely a struct.
ah okay so maybe we don't need a struct and just always read from ENV for this info?
oh but I guess we still need similir to how its set up in hedwig GO for consuming from the queue since it should be cloud agnostic
Well ideally the only thing that reads from env is the entry point or something close to that - but still only in one place. So a config struct might be okay and should be the only thing that reads from env. But a function or set of functions in main.go might work also.
Also yes we'll want to make it cloud agnostic side we'd have to provide env vars for configuring gcp as well. Would that lead to too many vars and maybe a file is better just to keep things in one place?
I was thinking we could pass all the configurations as command-line arguments. We can inject those environment variables with defaults using [Shell Parameter Expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html. Or may be pass a file path via command-line args to load the configs from
settings set up in https://github.com/cloudchacho/hedwig-firehose/pull/9, but https://github.com/cloudchacho/hedwig-firehose/issues/4#issuecomment-1094698945 will still have to be addressed as to how they are added on start up
Subscribtions
Metadata Bucket (only name, no path)
Staging Bucket (only name, no path)
Output Bucket (name and path)
“Every configurable number of minutes, wake up, read all the worker’s files that exist”
“Write to a temporary path every configurable duration (say 30 seconds), or until a configurable content size is reached (say 5MB)”