Qovery / Replibyte

Seed your development database with real data ⚡️
https://www.replibyte.com
GNU General Public License v3.0

Create `replibyte debug` command #159

Open wtait1-ff opened 2 years ago

wtait1-ff commented 2 years ago

Proposal

It seems to be a common theme that when bug reports / help issues are filed, the author is asked for certain info like

It would be helpful for both the people authoring issues and those triaging them if a user could run a command like replibyte debug that collects all that useful info automatically. Then they would just need to copy that output and paste it into the issue.

Other thoughts

evoxmusic commented 2 years ago

It makes total sense to me and will help troubleshoot issues and even build a reproducible environment. Do you think you can propose a PR for this?

wtait1-ff commented 2 years ago

I'm happy to give it a go yes 👍

Oh, I also forgot to mention this in the original issue, but seeing as the config file obviously contains sensitive data like database credentials and cloud account credentials:

  1. this command would have to anonymize some fields in the config file before creating the debug output
  2. since replibyte specializes in anonymizing data, it would be ideal if the existing transformers could be reused (I might need a bit of guidance on that part)
evoxmusic commented 2 years ago

💯, you can take inspiration from (or even reuse) the part of telemetry.rs that anonymizes sensitive data from conf.yaml.

evoxmusic commented 2 years ago

Hi @wtait1-ff , let me know if you need any help 👍🏽

thomasgouveia commented 1 year ago

Hi @evoxmusic,

It seems that there has been no activity on this issue for a while, and I'm interested in working on it as my first contribution to the project. Is that possible? If so, can we agree on the information that the debug command should return to the caller?

In my mind, to avoid a big PR, it would be better to split the issue into two more atomic issues:

In its simplest form, the debug command could display something like this to the caller:

```
replibyte debug

# Output

Replibyte $VERSION (running on $OS/$ARCH)
config:
  # here the config file with all sensitive data redacted
  $CONFIG
```
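As a rough sketch of the output format above, assuming the config has already been redacted and serialized to text, the header can be built from the standard library's `std::env::consts::OS` and `ARCH`, with the config indented underneath. The function name `render_debug_output` is hypothetical and not part of Replibyte:

```rust
use std::env::consts::{ARCH, OS};

/// Hypothetical helper: renders the text a `replibyte debug` command could print.
/// `redacted_config` is assumed to be the config file with secrets already replaced.
fn render_debug_output(version: &str, redacted_config: &str) -> String {
    // Header line with the version and the platform Replibyte is running on.
    let mut out = format!("Replibyte {} (running on {}/{})\nconfig:\n", version, OS, ARCH);
    // Indent every config line by two spaces so it nests under `config:`.
    for line in redacted_config.lines() {
        out.push_str("  ");
        out.push_str(line);
        out.push('\n');
    }
    out
}
```

The version is passed in as a parameter here; inside a Cargo build it would typically come from `env!("CARGO_PKG_VERSION")`.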

I think we should display at least the following 4 pieces of information to help reproduce a bug:

What do you think about it?

Thanks !

evoxmusic commented 1 year ago

Hi @thomasgouveia , thank you for your help. I'm happy to speak with you to see how we can add this because it would be super helpful. I am working on adding some benchmarking to Replibyte to improve the overall performance, which could also be added to the debug part in some way.

The $CONFIG element is one of the most important pieces of information that is often missing when debugging. Something else that would be helpful is a stack trace of what happened, like a profiling file but without the data. Do you have any ideas here?

thomasgouveia commented 1 year ago

Happy to help! It definitely makes sense to build something that helps debug issues or provides context for an issue.

Just to make sure we agree on what we want to achieve: initially, the idea was to provide a replibyte debug command that outputs information about the environment where replibyte is executed. I think this command should be independent and simply output something like the example I gave in my previous message.

You said that it would be nice to have a kind of "profiling" file with a stack trace. I totally agree, because it is clearly better for analyzing the software's behavior. I suppose you want to be able to trace what happens during the execution of any command, for example a dump. If so, to me it sounds more like a --debug (or --verbose) global flag that provides additional context (whether or not an error occurs during the command execution). We could probably build something with the tracing crates (or create our own) to register events at different code levels, and at the end of the execution, create the profiling file from those events along with additional data such as the Replibyte version, the OS/arch, etc.
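A minimal, dependency-free sketch of the event-recording idea, assuming a hypothetical `DebugRecorder` that a global `--debug` flag would enable. It stands in for what the tracing crates would provide: it records only stage names, messages, and elapsed time (never row data), then writes them below an environment header:

```rust
use std::env::consts::{ARCH, OS};
use std::time::Instant;

/// One event recorded during a command (hypothetical structure).
struct DebugEvent {
    elapsed_ms: u128,
    stage: String,
    message: String,
}

/// Hypothetical recorder a global `--debug` flag could enable.
/// It stores stage names and messages only, never the data being dumped.
struct DebugRecorder {
    start: Instant,
    events: Vec<DebugEvent>,
}

impl DebugRecorder {
    fn new() -> Self {
        DebugRecorder { start: Instant::now(), events: Vec::new() }
    }

    /// Records an event with the time elapsed since the recorder was created.
    fn record(&mut self, stage: &str, message: &str) {
        self.events.push(DebugEvent {
            elapsed_ms: self.start.elapsed().as_millis(),
            stage: stage.to_string(),
            message: message.to_string(),
        });
    }

    /// Renders the profiling file: environment header, then events in recording order.
    fn to_profile(&self, version: &str) -> String {
        let mut out = format!("Replibyte {} (running on {}/{})\n", version, OS, ARCH);
        for e in &self.events {
            out.push_str(&format!("[+{}ms] {}: {}\n", e.elapsed_ms, e.stage, e.message));
        }
        out
    }
}
```

In practice a tracing subscriber would replace the hand-rolled `Vec`; the point is only that the profile is assembled from events collected during the run, whether or not the command fails.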

Do you agree with that? Do you have any ideas for this profiling file? I would be pleased to give it a try!

Another point, about redacting the configuration: I saw that you use transformers in telemetry.rs to hide sensitive data coming from the configuration file. I have another approach for that, using traits. I tried something with the current configuration:

```rust
trait Redact {
    fn redact(&self) -> Self;
}
```

This trait would need to be implemented by each configuration-related struct, so that each block can redact its own sensitive data:

```rust
impl Redact for Config {
    fn redact(&self) -> Self {
        // We create a mutable deep copy of the current element
        // as we don't want to alter our base configuration.
        // `REDACTED` is a placeholder string constant assumed to be defined elsewhere.
        let mut copy = self.clone();

        // Replace the encryption key only if one is set.
        copy.encryption_key = copy.encryption_key.map(|_| REDACTED.to_string());

        // Nested blocks redact their own sensitive fields.
        copy.source = copy.source.map(|source| source.redact());
        copy.destination = copy.destination.map(|destination| destination.redact());

        copy.datastore = match copy.datastore {
            DatastoreConfig::AWS(cfg) => DatastoreConfig::AWS(cfg.redact()),
            DatastoreConfig::GCP(cfg) => DatastoreConfig::GCP(cfg.redact()),
            // We don't need to redact this piece of configuration, it does not contain any sensitive information
            DatastoreConfig::LocalDisk(cfg) => DatastoreConfig::LocalDisk(cfg),
        };

        copy
    }
}
```

In the end, we can simply call config.redact() to get a copy of our configuration with all sensitive data wiped. We can also easily test each block independently to check that our logic for hiding sensitive data is correct. Is that OK with you?
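To illustrate testing each block independently, here is a self-contained sketch of the `Redact` approach. The structs, fields, variant names, and the `[REDACTED]` placeholder value are simplified, hypothetical stand-ins, not Replibyte's real configuration types:

```rust
/// Placeholder written in place of secrets (value assumed for this sketch).
const REDACTED: &str = "[REDACTED]";

trait Redact {
    fn redact(&self) -> Self;
}

/// Simplified, hypothetical stand-ins for the configuration structs.
#[derive(Clone, Debug, PartialEq)]
struct SourceConfig {
    connection_uri: Option<String>,
}

#[derive(Clone, Debug, PartialEq)]
enum DatastoreConfig {
    Aws { secret_access_key: String },
    LocalDisk { dir: String },
}

#[derive(Clone, Debug, PartialEq)]
struct Config {
    encryption_key: Option<String>,
    source: Option<SourceConfig>,
    datastore: DatastoreConfig,
}

impl Redact for SourceConfig {
    fn redact(&self) -> Self {
        SourceConfig {
            connection_uri: self.connection_uri.as_ref().map(|_| REDACTED.to_string()),
        }
    }
}

impl Redact for DatastoreConfig {
    fn redact(&self) -> Self {
        match self {
            DatastoreConfig::Aws { .. } => DatastoreConfig::Aws {
                secret_access_key: REDACTED.to_string(),
            },
            // Nothing sensitive here, so it is copied unchanged.
            DatastoreConfig::LocalDisk { dir } => DatastoreConfig::LocalDisk { dir: dir.clone() },
        }
    }
}

impl Redact for Config {
    fn redact(&self) -> Self {
        Config {
            // Only replace the key when one is actually set.
            encryption_key: self.encryption_key.as_ref().map(|_| REDACTED.to_string()),
            // Each nested block owns its own redaction logic.
            source: self.source.as_ref().map(|s| s.redact()),
            datastore: self.datastore.redact(),
        }
    }
}
```

Building a fresh value field by field, rather than cloning and mutating, means the compiler rejects the struct literal if a new field is added without a redaction decision.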