Open ashmrtn opened 1 year ago
Regarding refactoring of config, I believe we should streamline config management. I am proposing code skeleton mentioned below. Let me know if issues can be forseen.
we can make use of validations to get independent of external/utility functions like requireProps
similarly we can move away from individual ApplyOverrides
and manage the logic at a single place
package config
import (
"io"
)
// Define flags
// var (
// configFileFlag string
// bucketFlag string
// )
// flag.StringVar(&configFileFlag, "--config-file", "", ".toml config file")
// flag.StringVar(&bucketFlag, "--bucket", "", "S3 bucket name")
type RepoConfig struct {
StorageConfig StorageConfig
AccountConfig AccountConfig
RepoID string
RepoUser string
RepoHost string
}
func (repoCnf *RepoConfig) Validate() error {
if err := repoCnf.StorageConfig.Validate(); err != nil {
return err
}
if err := repoCnf.AccountConfig.Validate(); err != nil {
return err
}
return nil
}
func ProvideRepoConfig() (RepoConfig, error) {
// reads/parses config file flag `--config-file` as well as other flags that are defined
// opens and reads the content of config file
// invokes LoadRepoConfig
return RepoConfig{}, nil
}
func LoadRepoConfig(reader io.Reader) (RepoConfig, error) {
// initiallizes viper
// viper.AutomaticEnv()
// viper.SetConfigType("toml")
// creates a map of environment variables and its corresponding struct path and stores in a variable say `keysToEnvironmentVariables`
// keysToEnvironmentVariables := map[string]string{
// "Storage.S3Config.AccessKey": "AWS_ACCESS_KEY_ID",
// "Account.M365Config.AzureClientID": "AZURE_CLIENT_ID",
// }
// bind `keysToEnvironmentVariables` to viper instance
// reads `reader`
// unmarshalls to RepoConfig instance
// Override configuration values with flag values
// if bucketFlag != "" {
// RepoConfigInstance.StorageConfig.S3Config.Bucket = bucketFlag
// }
// [Question]: We can handle environment variables, flags and config file up untill here. Are there any other sources?
return RepoConfig{}, nil
}
func WriteRepoConfig(configFilePath string, cnf RepoConfig) error {
// generate content from `cnf`
// write to file
return nil
}
// This can reside in storage package, does not have to reside in config package
type StorageConfig struct {
S3Config S3Config
FileSystemConfig FileSystemConfig
}
func (storageCnf *StorageConfig) Validate() error {
if err := storageCnf.S3Config.Validate(); err != nil {
return err
}
if err := storageCnf.FileSystemConfig.Validate(); err != nil {
return err
}
return nil
}
// This can reside in account package, does not have to reside in config package
type AccountConfig struct {
M365Config M365Config
}
func (accountCnf *AccountConfig) Validate() error {
return accountCnf.M365Config.Validate()
}
// This can reside in storage package, does not have to reside in config package
type S3Config struct {
AccessKey string
SecretKey string
SessionToken string
Bucket string
Endpoint string
Prefix string
DoNotUseTLS bool
DoNotVerifyTLS bool
SessionTags map[string]string
Role string
SessionName string
SessionDuration string
}
func (s3cnf *S3Config) Validate() error {
return nil
}
// This can reside in storage package, does not have to reside in config package
type FileSystemConfig struct {
Path string
}
func (fscnf *FileSystemConfig) Validate() error {
return nil
}
// This can reside in account package, does not have to reside in config package
type M365Config struct {
AzureClientID string
AzureClientSecret string
AzureTenantID string
}
func (m365cnf *M365Config) Validate() error {
return nil
}
// We can load config as the application starts
// We can then pass respective configs from RepoConfig
// [HELP NEEDED], identify locations where we need to pass configs
taking a stab at some of the questions you posted:
Config info is generally passed around as the account.Account
struct and the storage.Storage
struct. There's a few places that's done. The big usage is getting a handle to the repository struct here. The other usages of these structs will be in some of our tools and tests
The tools are much less structured in how they get config info and will probably be the most difficult to tackle. They can be found in this directory in the codebase
I think tests exclusively use functions in this file for account config and this file for storage config
Unfortunately, there's also a lot of M365 ID values hard-coded into the codebase for running integration tests. Those have been handy to make things work for us out of the box, but I'm not sure what folks' thoughts are on them in the long run
many CLI commands use helper functions like
AccountConnectAndWriteRepoConfig
to get a repo handle. However, part of this logic also creates a graph client and configures the concurrency limiter. A connection to graph is not required for all operations (e.x. repo maintenance). Furthermore, giving an invalid value toAccountConnectAndWriteConfig
results in an error. This forces unrelated functions to specify a valid service to easily get a repo handle even though they don't need all the results from the callWe should refactor the CLI layer to make it easier to