Refactor config and repo connect logic in CLI layer

ashmrtn commented 1 year ago

many CLI commands use helper functions like AccountConnectAndWriteRepoConfig to get a repo handle. However, part of this logic also creates a graph client and configures the concurrency limiter. A connection to graph is not required for all operations (e.x. repo maintenance). Furthermore, giving an invalid value to AccountConnectAndWriteConfig results in an error. This forces unrelated functions to specify a valid service to easily get a repo handle even though they don't need all the results from the call

We should refactor the CLI layer to make it easier to

get a repo connection
get and write config values
connect to graph as needed

HiteshRepo commented 10 months ago

Regarding refactoring of config, I believe we should streamline config management. I am proposing code skeleton mentioned below. Let me know if issues can be forseen.

we can make use of validations to get independent of external/utility functions like requireProps similarly we can move away from individual ApplyOverrides and manage the logic at a single place

package config

import (
    "io"
)

// Define flags
// var (
//  configFileFlag  string
//  bucketFlag string
// )
// flag.StringVar(&configFileFlag, "--config-file", "", ".toml config file")
// flag.StringVar(&bucketFlag, "--bucket", "", "S3 bucket name")

type RepoConfig struct {
    StorageConfig StorageConfig
    AccountConfig AccountConfig
    RepoID        string
    RepoUser      string
    RepoHost      string
}

func (repoCnf *RepoConfig) Validate() error {
    if err := repoCnf.StorageConfig.Validate(); err != nil {
        return err
    }

    if err := repoCnf.AccountConfig.Validate(); err != nil {
        return err
    }

    return nil
}

func ProvideRepoConfig() (RepoConfig, error) {
    // reads/parses config file flag `--config-file` as well as other flags that are defined
    // opens and reads the content of config file
    // invokes LoadRepoConfig
    return RepoConfig{}, nil
}

func LoadRepoConfig(reader io.Reader) (RepoConfig, error) {
    // initiallizes viper
    // viper.AutomaticEnv()
    // viper.SetConfigType("toml")

    // creates a map of environment variables and its corresponding struct path and stores in a variable say `keysToEnvironmentVariables`
    // keysToEnvironmentVariables := map[string]string{
    //  "Storage.S3Config.AccessKey":       "AWS_ACCESS_KEY_ID",
    //  "Account.M365Config.AzureClientID": "AZURE_CLIENT_ID",
    // }

    // bind `keysToEnvironmentVariables` to viper instance

    // reads `reader`

    // unmarshalls to RepoConfig instance

    // Override configuration values with flag values
    // if bucketFlag != "" {
    //  RepoConfigInstance.StorageConfig.S3Config.Bucket = bucketFlag
    // }

    // [Question]: We can handle environment variables, flags and config file up untill here. Are there any other sources?

    return RepoConfig{}, nil
}

func WriteRepoConfig(configFilePath string, cnf RepoConfig) error {
        // generate content from `cnf`
        // write to file
    return nil
}

// This can reside in storage package, does not have to reside in config package
type StorageConfig struct {
    S3Config         S3Config
    FileSystemConfig FileSystemConfig
}

func (storageCnf *StorageConfig) Validate() error {
    if err := storageCnf.S3Config.Validate(); err != nil {
        return err
    }

    if err := storageCnf.FileSystemConfig.Validate(); err != nil {
        return err
    }

    return nil
}

// This can reside in account package, does not have to reside in config package
type AccountConfig struct {
    M365Config M365Config
}

func (accountCnf *AccountConfig) Validate() error {
    return accountCnf.M365Config.Validate()
}

// This can reside in storage package, does not have to reside in config package
type S3Config struct {
    AccessKey       string
    SecretKey       string
    SessionToken    string
    Bucket          string
    Endpoint        string
    Prefix          string
    DoNotUseTLS     bool
    DoNotVerifyTLS  bool
    SessionTags     map[string]string
    Role            string
    SessionName     string
    SessionDuration string
}

func (s3cnf *S3Config) Validate() error {
    return nil
}

// This can reside in storage package, does not have to reside in config package
type FileSystemConfig struct {
    Path string
}

func (fscnf *FileSystemConfig) Validate() error {
    return nil
}

// This can reside in account package, does not have to reside in config package
type M365Config struct {
    AzureClientID     string
    AzureClientSecret string
    AzureTenantID     string
}

func (m365cnf *M365Config) Validate() error {
    return nil
}

// We can load config as the application starts
// We can then pass respective configs from RepoConfig
// [HELP NEEDED], identify locations where we need to pass configs

ashmrtn commented 10 months ago

taking a stab at some of the questions you posted:

Config info is generally passed around as the account.Account struct and the storage.Storage struct. There's a few places that's done. The big usage is getting a handle to the repository struct here. The other usages of these structs will be in some of our tools and tests

The tools are much less structured in how they get config info and will probably be the most difficult to tackle. They can be found in this directory in the codebase

I think tests exclusively use functions in this file for account config and this file for storage config

Unfortunately, there's also a lot of M365 ID values hard-coded into the codebase for running integration tests. Those have been handy to make things work for us out of the box, but I'm not sure what folks' thoughts are on them in the long run

alcionai / corso

Refactor config and repo connect logic in CLI layer #4239