adrianmrozo / ds

Data Science Toolkits and Architectures group Ludwig & Adrian

Build docker image #4

Closed kraftl-UL closed 3 years ago

kraftl-UL commented 3 years ago

Howdy guys,

I'm still struggling with dockerizing the code. Please find my questions under the respective files. I'd be glad if you could help me with at least one topic. If you have tackled one of these subjects in your project report, please let me know; I'd gladly check it out.

As I understand the task, we have to create four files, all of which need to be stored inside the same folder as our modularized code:

With the instructions from the Docker documentation we can build these files accordingly.

1.) requirements.txt

Let's start with the simplest one. For our code to work, we simply need to list all packages that we require to execute it. Here it is:

flask
redis

Well, not so simple, as it turned out. Installing cv2 proves to be problematic. I tried adding it to the requirements file with version 4.10.0; cv2 is the module provided by the opencv-python package. However, it is not enough to only install opencv-python, which works easily. I also tried commenting it out. Then the build works at first, but there is an error when I want to start the container from the image. Always the same error:

from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

I tried importing cv2 in the modules via from cv2 import * and import cv2. I also tried installing cv2 inside the Dockerfile via RUN pip install cv2. No success yet.
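For what it's worth, the libGL.so.1 error usually means the opencv-python wheel dynamically links against native libraries that the base image does not ship. A sketch of a possible fix, assuming a Debian-based python image (package names may vary by distribution):

```dockerfile
FROM python:3.8.5
# opencv-python's wheel links against native libraries (libGL among
# them) that the Debian-based python images do not preinstall:
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
```

Alternatively, the opencv-python-headless package avoids the GL dependency entirely, as long as no GUI functions (cv2.imshow etc.) are used.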

However, at the moment my requirements file looks like this:

numpy == 1.18.5
opencv-python == 4.4.0.44
imutils == 0.5.3
#cv2 == 4.4.0.44
keras == 2.4.3
tensorflow == 2.3.1
h5py == 2.10.0

2.) Dockerfile

The Dockerfile from the tutorial more or less looks like this:

FROM python:3.8.3
WORKDIR /code
ENV FLASK_APP=app.py
ENV FLASK_RUN_HOST=0.0.0.0
RUN apk add --no-cache gcc musl-dev linux-headers
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
EXPOSE 5000
COPY . .
CMD ["flask", "run"]
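One thing worth noting: apk is the package manager of Alpine Linux, and if I recall correctly the tutorial's Dockerfile starts from python:3.7-alpine. With a Debian-based image such as python:3.8.3, the RUN apk ... line will fail; a rough apt-get equivalent (a sketch, not verified against the tutorial) would be:

```dockerfile
FROM python:3.8.3
# Debian-based image, so apt-get replaces Alpine's apk:
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
```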

Question Set 1:

My current Dockerfile looks like this:

FROM python:3.8.5

WORKDIR /usr/src/app

COPY main.py .
COPY load_and_test.py .
COPY output.py .
COPY prep_cifar10.py .
COPY set_initials.py .
COPY shape_model.py .
COPY test.py .
COPY training.py .
COPY requirements.txt .

RUN pip install --upgrade pip
RUN pip3 install --no-cache-dir -r requirements.txt

CMD ["python3", "./main.py"]

3.) docker-compose.yml

The docker-compose file looks like this:

version: "3.8"
services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - .:/code
    environment:
      FLASK_ENV: development
  redis:
    image: "redis:alpine"

Question Set 2:

4.) app.py

And a supposedly easy task for the end. The file consists of the following code:

import time
import redis
from flask import Flask
app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)

def get_hit_count():
    retries = 5
    while True:
        try:
            return cache.incr('hits')
        except redis.exceptions.ConnectionError as exc:
            if retries == 0:
                raise exc
            retries -= 1
            time.sleep(0.5)

@app.route('/')
def hello():
    count = get_hit_count()
    return 'Hello World! I have been seen {} times.\n'.format(count)

This is the Python code that we want to execute inside our Docker container. To build our own container, we only need to replace this file with our own code.
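The retry loop in get_hit_count is a generic pattern worth understanding; a minimal self-contained sketch without Redis (the flaky_call function is made up purely for illustration):

```python
import time

def with_retries(func, retries=5, delay=0.5):
    # Retry func until it succeeds or the retry budget is exhausted,
    # mirroring the structure of get_hit_count above.
    while True:
        try:
            return func()
        except ConnectionError:
            if retries == 0:
                raise
            retries -= 1
            time.sleep(delay)

# A made-up callable that fails twice before succeeding:
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("not ready yet")
    return "ok"

print(with_retries(flaky_call, delay=0.01))  # prints "ok" after two retries
```

In the Redis example the same structure simply swaps ConnectionError for redis.exceptions.ConnectionError.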

You can also find the discussed files zipped here.

I do look forward to hearing from you.

Happy coding and greez,

Ludwig

Benemrxr commented 3 years ago

I'm afraid I can't help you out all too much right now, but for your first question set this documentation page might be helpful: docs.docker.com/engine/reference/builder/.

Is the EXPOSE parameter defining the port number?

Yes, as I understand it from the documentation.

Hope this is somewhat helpful :v:

habichta commented 3 years ago

Just to clarify the task, from skimming over your issue: you are not required to make your code work with Flask and Redis. This was just meant as a first tutorial on docker-compose; Flask etc. will come later in the course. Your code should first run independently in a Docker container (as of milestone 2). In milestone 3 you are asked to make your code run in docker-compose together with a PostgreSQL database. Flask and Redis do not enter the equation yet.

habichta commented 3 years ago

Question Set 1:

FLASK_RUN_HOST defines under which IP address Flask will be accessible. 0.0.0.0 is a special meta-address with multiple meanings. In the context of servers (Flask builds on Werkzeug, which contains a simple web server for development purposes only) it means that the server listens on all of the host's IP addresses (in case the host has more than one). Long story short, leave it at 0.0.0.0.

RUN is a directive for a Dockerfile to run a specific shell command.

EXPOSE says which port needs to be exposed so that your app within the container can be accessed from the outside. However, it does not actually do anything; it is for documentation purposes only. The actual exposing happens in docker-compose or when you run the Docker image using docker run -p 5000:5000 ...

CMD is the actual command you want to run within a Docker container. The difference to RUN is that RUN creates a new "layer" in the Docker image, while CMD can be overwritten when you do docker run ...
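A minimal sketch of the difference (the image and file names are made up):

```dockerfile
FROM python:3.8.5
# RUN executes at build time; its result is baked into an image layer:
RUN pip install flask
# CMD only records the default start command; it runs when the container
# starts and can be overridden, e.g. `docker run my_image python3 other.py`:
CMD ["python3", "main.py"]
```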

Yes. If you use Flask, you need to tell it on which IP address it should listen; leave it at 0.0.0.0. Since you are using Flask, you will probably have an app.py somewhere. This defines "routes" which you can open as a URL in your web browser. Accessing such a route will kick off execution of a function:

@app.route('/')
def hello():
    count = get_hit_count()
    return 'Hello World! I have been seen {} times.\n'.format(count)

means that when you open localhost:5000/ in your browser, the function hello() will be executed. If you want to start your training, you could create a route @app.route('/training') and put your Keras training code there.
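A minimal sketch of such a route (the run_training function is just a placeholder, not the actual milestone code):

```python
from flask import Flask

app = Flask(__name__)

def run_training():
    # Placeholder: the real Keras training code would go here.
    return "training finished"

@app.route('/training')
def training():
    # Opening localhost:5000/training in the browser triggers this.
    return run_training() + "\n"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```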

If you do not want to use Flask (which I seriously recommend at this stage, since it is not part of any milestone so far): your Dockerfile above is not too bad, but CMD ["python3", "./main.py"] strikes me as a bit weird. Maybe CMD ["python3", "main.py"]? Also, you are upgrading pip but using pip3. Not quite sure if that makes sense. But it might (I would have to check).

Is the EXPOSE parameter defining the port number? Yes. Flask by default uses port 5000, so you need to expose that port to the outside of the container. But as mentioned above, EXPOSE will do nothing; it is just documentation for developers.

Redis is a high-performance in-memory store, used for (mostly) non-persistent data structures. In the example you copied, it is used to count how many times the page has been visited (on route '/'). Neither Redis nor Flask is necessary at all for this milestone.

If you want to build from your own Dockerfile. There is a syntax for that in docker-compose

 your_service_name:
    build: .

Note that in docker-compose the service names (like "redis:" or "web:" in your example) can be chosen by you. Within Python code, you can use such a name as a hostname. E.g. if you want to connect to a database and you called the service "database:", you can use the hostname string "database" in your Python code to refer to the IP address that Docker assigns to the container running that service.
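A minimal sketch of that idea (the service names and the connection call are made up for illustration):

```yaml
services:
  database:            # service name chosen by you ...
    image: postgres:12.4
  web:
    build: .
    # ... inside the "web" container, "database" resolves as a hostname,
    # e.g. in Python: psycopg2.connect(host="database", ...)
```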

build: . means "find the Dockerfile in the same directory as the docker-compose.yml file (that is the meaning of the dot), build it, and use the built image". It does the same job as "image:", but the latter will first look in your local Docker image registry and then search DockerHub for an image called, for example, "redis:alpine" in the case of your "redis:" service.

So you could first build your Dockerfile separately using docker build . and use the ID of the built image for image:. However, I do not recommend that, because then you always have to change that ID when you rebuild the image. You can also give the image a fixed name like "my_image:latest" and then use "image: my_image:latest". The problem with that is that you have to manually rebuild the image on all the machines you are using, because the image only exists on your local machine, not on DockerHub.

You could then upload (docker push) that image to DockerHub. Then it would be universally accessible for everybody, like the "redis:alpine" image. But I would recommend you just use the build: . syntax I highlighted above.
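For completeness, the push workflow mentioned above would look roughly like this (the user and image names are made up, and the commands assume a DockerHub account and a prior docker login):

```
docker build -t myuser/my_image:latest .
docker push myuser/my_image:latest
```

Afterwards a docker-compose.yml on any machine could reference it with image: myuser/my_image:latest.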

kraftl-UL commented 3 years ago

Thank you, that's great help.

vitorkrasniqi commented 3 years ago

Hi there :D

Maybe I don't quite understand your question (maybe because it is late and I only had 2 coffees today)

But I am trying to help here.

As Arthur mentioned above, I think we simply have to take inspiration from the first example: just run it and try to understand what is going on, or at least that is what I have tried to do.

So now that we have understood the basics of exercise 1, we can try to do number 2 using this example.

So I would recommend creating a YAML file with the following structure:

version: "3.7"

(Version 2.x is core docker-compose; version 3.x adds support for Docker Swarm, which can scale across one or more servers, whereas with plain docker-compose your web application simply runs on a single Docker host. Docker Swarm and related subcommands like docker swarm and docker stack are integrated into the Docker CLI itself; they are all part of the Docker binary that you call from your terminal. Docker-Compose is an independent binary in and of itself, and therefore a bit harder to learn.)

Then declare your database service. So it could look like this:

version: "3.7"
services:
  db:
    image: postgres:12.4
    restart: always
    environment:
      POSTGRES_DB: Postgres
      POSTGRES_USER: Administration
      POSTGRES_PASSWORD: Password1234
      PGDATA: /var/lib/postgresql/data
    volumes:

And then save it and execute this command: sudo docker-compose up -d

Then check your addresses:

ip a

With this information you should be able to see the IP address used by the Docker host, and connect to it from Python using packages like sqldf, Pandas, py-postgresql, PyGreSQL, ocpgdb, bpgsql, SQLAlchemy or anything else.

And then you use some SQL commands to create a table and so on.

I'm still trying to figure things out, but I think I'll find an SQL cheatsheet where we can take a look at the most important commands.
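Until then, a minimal sketch of the basic SQL commands (using sqlite3 from the Python standard library purely to demonstrate the syntax; the table name and rows are made up, and against the PostgreSQL service above you would issue the same statements through your chosen driver):

```python
import sqlite3

# In-memory database, just to demonstrate basic SQL commands.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, label TEXT)")
cur.execute("INSERT INTO predictions (label) VALUES (?)", ("cat",))
cur.execute("INSERT INTO predictions (label) VALUES (?)", ("dog",))
conn.commit()

rows = cur.execute("SELECT id, label FROM predictions ORDER BY id").fetchall()
print(rows)  # [(1, 'cat'), (2, 'dog')]
```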

I hope I could help :D

adrianmrozo commented 3 years ago

Thank you very much Vitor, very much appreciated!

habichta commented 3 years ago

Just to be clear: