entralliance / entr_runtime

ENTR Alliance is an owner-operator led initiative to create open data stack within the clean energy sector. ENTR is a distribution of existing open-source tools, frameworks, and standards, packaged together to accelerate the transition to clean energy. Join us!
http://www.entralliance.com
MIT License
6 stars 3 forks source link

Feature/docker stability #12

Closed lewisarmistead closed 2 years ago

lewisarmistead commented 2 years ago

This PR refactors the Docker build. Some features implemented:

There's still quite a bit of work to do, but hopefully this gets us to a more stable foundation to allow further work.

lewisarmistead commented 2 years ago

@jordanperr - does the pattern work for you of modifying the .env file here to map to your clone of OpenOA? I'm open to any suggestions for a workflow to enable local development of Jupyter notebooks and dbt models - this iteration felt a bit cleaner to me, and it's similar to an implementation you had tried previously of cloning this repo in the same dir that OpenOA and entr_runtime exist. Note, you'll need to switch entr_warehouse to the branch with this PR https://github.com/entralliance/entr_warehouse/pull/2 for your dbt commands to work since this branch upgrades to dbt 1.0.

jordanperr commented 2 years ago

Hi lewis - After Charlie and our meeting last week, it looks like we're settled on using Docker Compose and also deleting Postgres from the image. After that, just some small changes:

  1. Please remove the "container_name" field from line 8 of the docker-compose yaml file. It will read the YAML key for the service and use that as the name, allowing use of the "-p" prefix flag.
  2. Please move the port numbers from docker-compose and the Dockerfile to environment variables. Perhaps "ENTR_PORT_SPARKUI" "ENTR_PORT_JUPYTER" and "ENTR_PORT_DBT"?
lewisarmistead commented 2 years ago

Hi lewis - After Charlie and our meeting last week, it looks like we're settled on using Docker Compose and also deleting Postgres from the image. After that, just some small changes:

1. Please remove the "container_name" field from line 8 of the docker-compose yaml file. It will read the YAML key for the service and use that as the name, allowing use of the "-p" prefix flag.

2. Please move the port numbers from docker-compose and the Dockerfile to environment variables. Perhaps "ENTR_PORT_SPARKUI" "ENTR_PORT_JUPYTER" and "ENTR_PORT_DBT"?

@jordanperr - I added variables to the .env file after speaking today but before I saw your comment here. The changes were made in this commit and don't align exactly with the names you proposed, but feel free to change them if you'd like

lewisarmistead commented 2 years ago

Follow-up note: the README hasn't been fully updated to reflect the most recent changes encapsulated in this PR

lewisarmistead commented 2 years ago

@jordanperr - I think I've got your notes incorporated. If satisfied, can you go ahead and merge this PR?

jordanperr commented 2 years ago

I see you have removed the submodules for OpenOA and for the entr_warehouse repositories. Is it assumed that the developer will be manually tracking these three separate repositories? It was intentional (and nice) to have the version/dependency of both components specified here in some way. In the first version of entr-runtime, these dependencies are specified in the shell scripts. In my branch (feature/10_29_2021_local), they are specified git submodules. How do you envision that dependency management happening now?

We could have entr_warehouse installed as a DBT package, and OpenOA installed from its pip package. But then we are limited to versions of these components that are in a package index.

Also, in the long term, we want OpenOA and entr_runtime to be copied into the image and pre-installed. Not mounted as a volume.

lewisarmistead commented 2 years ago

I see you have removed the submodules for OpenOA and for the entr_warehouse repositories. Is it assumed that the developer will be manually tracking these three separate repositories? It was intentional (and nice) to have the version/dependency of both components specified here in some way. In the first version of entr-runtime, these dependencies are specified in the shell scripts. In my branch (feature/10_29_2021_local), they are specified git submodules. How do you envision that dependency management happening now?

We could have entr_warehouse installed as a DBT package, and OpenOA installed from its pip package. But then we are limited to versions of these components that are in a package index.

Also, in the long term, we want OpenOA and entr_runtime to be copied into the image and pre-installed. Not mounted as a volume.

@jordanperr - thanks for the feedback! My changes here were just a start, and I agree that we'll want to have these package dependencies included in the build step once we need to enforce versioning, which I don't think we need to at this point.

I couldn't figure out a viable workflow for managing and committing changes to the submodules' repos/branches, e.g., adding a notebook in the OpenOA examples on a different OpenOA branch. I'm sure I don't fully understand how to use submodules, but I found it easier to develop on those repos by optionally volume-mounting them in Docker - it seemed like a simpler workflow not to have to map the submodules to the commit with the most recent changes in the other repos where you'd be independently managing commits/changes (I also had trouble getting pip install -e to work with the submodules) rather than being able to switch around branches in the other repos and commit changes made directly from there while also having those changes reflected in the runtime where you were working.

If you have a good development workflow involving submodules, I'd be happy to discuss. I was also having some trouble outlining thoughts in written format here, so it may just be better to discuss further on a call anyway.

jordanperr commented 2 years ago

Fair enough. I will merge this now and we can discuss further on today's call.