Improve docker support - Githubissues

rothnic commented 1 year ago

The core change of this PR is to move the app data outside of the app folder. This is best practice for git projects and is essential for using docker. By moving data outside of the app folder, we can update the app much more easily, whether it is run directly or with docker.

Changes:

Adds a Dockerfile to the project, which could then be built automatically on each commit to the project
Makes the app store its data outside the project folder, in ~/.cassandra, by default
Makes the entire app execution use command-line options, so that we can debug it, change where the data is stored, change the port or IP address, or change the log level without editing app.py
Updates assets folder to better handle how app.py is executed
Updates mqtt connection to avoid connection issues when debugging

Closes #42 and #57

EinEinfach commented 1 year ago

Do we need a how-to explaining what to do with existing data after updating to this version?

rothnic commented 1 year ago

Do we need a how-to explaining what to do with existing data after updating to this version?

So, what it does is copy the files from /src/data to ~/.cassandra on startup. If you update the app like you normally do, it should work. However, users will need to know that the files are now located in that folder, or how to change that to a different location.
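As a rough illustration of the copy-on-startup behaviour described above, here is a minimal sketch. The helper name `ensure_data_dir` and the rule "copy only when the target directory does not exist yet" are assumptions made for illustration, not the PR's actual code.

```python
import shutil
from pathlib import Path


def ensure_data_dir(initial_data: Path, data_path: Path) -> bool:
    """Copy the bundled initial data to data_path on first start.

    Returns True if files were copied, False if data_path already existed.
    Hypothetical helper: the name and the copy-only-when-missing rule are
    assumptions, not the PR's actual implementation.
    """
    if data_path.exists():
        # Existing data is left untouched so user settings survive updates.
        return False
    # copytree creates data_path (and any missing parent directories) itself.
    shutil.copytree(initial_data, data_path)
    return True
```

With the defaults from this thread, a call could look like `ensure_data_dir(Path("src/data"), Path.home() / ".cassandra")`.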

The one thing missing from this PR is an update to the readme. I was unsure how to update it, given that it is in German at the moment. I'd be happy to update the relevant sections in English if that works.

rothnic commented 1 year ago

Some examples of executing the app:

Run the app with default configuration: python app.py start

See the input options for starting the app:

> python app.py start --help
Usage: app.py start [OPTIONS]

  Start the CaSSAndRA Server

  Only some Dash server options are handled as command-line options. All other
  options should use environment variables. Find supported environment
  variables here: https://dash.plotly.com/reference#app.run

Options:
  -h, --host TEXT                 [default: 0.0.0.0]
  -p, --port INTEGER              [default: 8050]
  --proxy TEXT                    format={{input}}::{{output}} example=http://
                                  0.0.0.0:8050::https://my.domain.com
  --data_path TEXT                [default: /Users/nroth/.cassandra]
  --debug                         Enables debug mode for dash application
  --app_log_level [DEBUG|INFO|WARN|ERROR|CRITICAL]
                                  [default: DEBUG]
  --app_log_file_level [DEBUG|INFO|WARN|ERROR|CRITICAL]
                                  [default: DEBUG]
  --server_log_level [DEBUG|INFO|WARN|ERROR|CRITICAL]
                                  [default: ERROR]
  --pil_log_level [DEBUG|INFO|WARN|ERROR|CRITICAL]
                                  [default: WARN]
  --help                          Show this message and exit.
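The --proxy option in the help output above takes a single string of the form input::output. As a small sketch of how such a value breaks down into its two URLs (the function name `parse_proxy` is hypothetical, and this is not a claim about how Dash parses the value internally):

```python
from typing import Tuple


def parse_proxy(value: str) -> Tuple[str, str]:
    """Split an 'input::output' proxy string into its two URLs.

    Hypothetical helper for illustration only.
    """
    # partition() splits on the first '::'; the '://' inside a URL does not match.
    local_url, sep, public_url = value.partition("::")
    if not sep or not local_url or not public_url:
        raise ValueError(f"expected '<input>::<output>', got {value!r}")
    return local_url, public_url
```

For the example from the help text, `parse_proxy("http://0.0.0.0:8050::https://my.domain.com")` gives the local address first and the public address second.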

Build docker image: docker build . -t cassandra

Run the docker image (simple example, OK for macOS, which handles file permissions through Docker Desktop):

docker run -it -v /Users/nroth/.cassandra:/home/cassandra/.cassandra cassandra start --help
docker run -it -v /Users/nroth/.cassandra:/home/cassandra/.cassandra cassandra start

Run the docker image (with user id mapping, needed for linux machines)

export HOST_UID=$(id -u)
export HOST_GID=$(id -g)
docker run -it --rm -e HOST_UID=$HOST_UID -e HOST_GID=$HOST_GID -v /Users/nroth/.cassandra:/home/cassandra/.cassandra cassandra start

rothnic commented 1 year ago

Just realized that I forgot to set the default log level for the cassandra.log file to the same level as before. Going to fix that and push that change in.

EinEinfach commented 1 year ago

Some questions about the changes:

After merging, the app can only be started with the "start" argument, correct? If so, we definitely need a README update before merging.
How can I start the app in VSCode? Only from the command line with the start argument? The play button in VSCode leads to the default printout, correct? How can I use VSCode's built-in debug mode?
I didn't understand what happened with the data directory. Is it automatically moved to the new location (/home/user/.cassandra/data...)? ---> Edit: I think I know what happens. The magic is in /src/backend/data/utils.py. Does that mean I can remove the /src/data directory from the repository?
I know at least one user is running multiple instances of cassandra on one machine. What will happen then? Should he start the second instance with the directory and port options? ./app.py --data_path '/home/...' --port XXXX

If we make a small change in app.py: if __name__ == "__main__": start() the behavior of cassandra is more familiar. And app.py --help still works. And I can use the built-in VSCode debugger. Why do you use the cli() function?

rothnic commented 1 year ago

After merging, the app can only be started with the "start" argument, correct? If so, we definitely need a README update before merging.

Yeah, so I set it up that way originally in case we had other commands we wanted to add in the future, or to create other variations of the start command. For example, a start_debug command could be set up that passes in some of the values you'd want when debugging. Or, if we wanted to run a test suite, that could be handled with python app.py test.

How can I start the app in VSCode? Only from the command line with the start argument? The play button in VSCode leads to the default printout, correct? How can I use VSCode's built-in debug mode?

I held back on committing my vscode commands, but that might be useful. Here is my launch.json file that I could commit:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "App",
            "type": "python",
            "request": "launch",
            "program": "app.py",
            "args": ["start"],
            "console": "integratedTerminal",
            "justMyCode": true,
            "cwd": "${workspaceFolder}/CaSSAndRA/",
            "python": "${command:python.interpreterPath}"
        },
        {
            "name": "Debug App",
            "type": "python",
            "request": "launch",
            "program": "app.py",
            "args": ["start", "--debug"],
            "console": "integratedTerminal",
            "justMyCode": true,
            "cwd": "${workspaceFolder}/CaSSAndRA/",
            "python": "${command:python.interpreterPath}"
        },
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": true,
            "python": "${command:python.interpreterPath}"
        }
    ]
}

I didn't understand what happened with the data directory. Is it automatically moved to the new location (/home/user/.cassandra/data...)? ---> Edit: I think I know what happens. The magic is in /src/backend/data/utils.py. Does that mean I can remove the /src/data directory from the repository?

Correct, .cassandra is now the data directory. However, /src/data does still need to be there for now because it is essentially the initial data for the .cassandra directory. Without it, a brand new run won't have the data required (I think). I'm not sure whether the app depends on that initial data being there.

I know at least one user is running multiple instances of cassandra on one machine. What will happen then? Should he start the second instance with the directory and port options? ./app.py --data_path '/home/...' --port XXXX

Correct. Instead of having to sync two copies of the cassandra git project, they would sync one project, then point the second execution to another directory with another port.

rothnic commented 1 year ago

BTW, I am good with the changes you made to just default directly into starting the app for now until we need some other command. Just wanted to explain the thought behind how it was setup. If you want to just default to starting the app when you do python app.py, I can update the readme with some examples to describe different ways of executing the app, including an example running two instances of it.

rothnic commented 1 year ago

After thinking about this a bit more, I do think I could improve the command-line output when the user first runs the app. So, we could check to see if data_path exists when you run python app.py, then if it doesn't exist, output some information to the command line to tell the user what is going to happen. This could also give them the chance to change the data_path from the default setting if they want.

Something like this (a rough sketch using click; the function and constant names are placeholders):

import os
import click

DEFAULT_DATA_PATH = os.path.expanduser("~/.cassandra")  # placeholder name

def confirm_first_start(data_path):  # placeholder name
    if data_path == DEFAULT_DATA_PATH and not os.path.exists(data_path):
        # we know this is a first-time start up, so tell the user what will happen
        click.echo("You are starting cassandra for the first time. We will copy the initial data files from /src/data to ~/.cassandra, where all settings, logs, and runtime data will be stored.")
        # click collects the yes/no response from the command line at this point
        if click.confirm(f"Do you want to continue starting the server using {data_path}?"):
            return True  # user wants to continue, so we continue starting the server
        # do not start the server; tell the user how to see the options and pass in an alternate data_path
        click.echo("Run with --help to see the start options, including --data_path.")
        return False
    return True  # not a first-time start with the default path, so there is nothing to do

rothnic commented 1 year ago

@EinEinfach I updated the readme, made the suggested changes, then added some command-line output for people who might just update and run the new version without looking at the docs. Here is what the output looks like if you just run python app.py without having ~/.cassandra. Afterwards, it will start up like normal.

EinEinfach commented 1 year ago

thx

EinEinfach commented 1 year ago

file_paths is also needed in backendserver.py in stop() function. Should file_paths be a kind of global variable?

rothnic commented 1 year ago

file_paths is also needed in backendserver.py in stop() function. Should file_paths be a kind of global variable?

I think backendserver in the end should be a class, but I was trying to avoid doing major refactoring. Essentially, we should create a BackendServer(filepaths=filepaths) or something along those lines, then there would be start/stop methods on the class. That would provide kind of a global access to the filepaths object for the backendserver, without it being a true global variable, which is something I think most people try to avoid.
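To make the class idea above a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: `FilePaths` is a stand-in for whatever the app's file_paths object really is, and the start/stop bodies only mark where the real work would go.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FilePaths:
    """Hypothetical stand-in for the app's file_paths object."""
    data: Path


class BackendServer:
    """Keeps file_paths on the instance so start() and stop() can share it."""

    def __init__(self, file_paths: FilePaths) -> None:
        self.file_paths = file_paths
        self.running = False

    def start(self) -> None:
        # A real implementation would read from self.file_paths here.
        self.running = True

    def stop(self) -> None:
        # A real implementation would write to self.file_paths here,
        # without needing a module-level global.
        self.running = False
```

The point of the design is that both methods reach the same paths through `self`, so nothing outside the instance needs to hold them.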

EinEinfach commented 1 year ago

I was trying to avoid doing major refactoring.

Yes, it will be a major refactoring. At some point we have to change that, maybe not in the next few months, but the time will come. I fixed the stop() function for the moment. It wasn't too difficult, since file_paths was already a global variable.

By the way, I got a question in the forum: how do you update cassandra now if the app is running in a docker container?

rothnic commented 1 year ago

By the way, I got a question in the forum: how do you update cassandra now if the app is running in a docker container?

There are instructions in the readme that should still apply, even if coming from a previous version. The difference is just where their host directory maps into the container (/home/cassandra/.cassandra).

The next step will be to set up automatic builds with version tagging, so that most people won't ever have to check out the git repo unless they are developing the app. That is more complicated now that Docker Hub no longer does that automatically for free, but it looks like GitHub workflows can orchestrate it.

EinEinfach / CaSSAndRA

Improve docker support #70