exasol / ai-lab

Development environment for data science developers
MIT License

Investigate reuse strategies for docker image #59

Closed: ckunki closed this issue 1 month ago

ckunki commented 11 months ago

Potential use cases

UC-1

UC-2

UC-3

ckunki commented 11 months ago

Using a stripped-down playbook, I measured a total duration of 35 seconds. Most of the runtime (> 30 s) is spent copying ~1 MB of files into the Docker container.
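To get a breakdown like this (total duration versus time spent in the copy step), each phase can be timed separately. The following is a minimal Python sketch using `time.perf_counter`. The helper name `timed`, the phase labels, and the `sleep` placeholders are hypothetical and not part of the ai-lab code.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed seconds even if the enclosed block raises.
        results[label] = time.perf_counter() - start


durations: Dict[str, float] = {}
with timed("total", durations):
    with timed("copy notebooks", durations):
        time.sleep(0.05)  # placeholder for the actual copy step
    time.sleep(0.01)  # placeholder for the remaining playbook tasks

for label, seconds in durations.items():
    print(f"{label}: {seconds:.2f} s")
```

For timing the Ansible tasks themselves, the `ansible.posix.profile_tasks` callback plugin, which reports per-task durations, may be a more direct option.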

When using the full playbook the

ckunki commented 11 months ago

Comments from @tkilias

ckunki commented 11 months ago

See https://stackoverflow.com/questions/27985334/why-is-copying-a-directory-with-ansible-so-slow. TL;DR: use synchronize instead of copy.

ckunki commented 11 months ago

See also https://docs.pytest.org/en/7.4.x/how-to/capture-stdout-stderr.html

def test_disabling_capturing(capsys):
    print("this output is captured")
    with capsys.disabled():
        print("output not captured, going directly to sys.stdout")
    print("this output is also captured")

Requesting capsys from a session-scoped fixture, however, fails with:

You tried to access the function scoped fixture capsys with a session scoped request object, involved factories:
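Because capsys is function-scoped, a session-scoped fixture cannot request it. One possible workaround, sketched below, is to redirect sys.stdout with the standard library's `contextlib.redirect_stdout` inside the fixture instead. This assumes the output of interest is written through Python's sys.stdout; output that a subprocess writes directly to file descriptor 1 is not captured this way. The names `run_captured` and `build_image` are hypothetical.

```python
import io
from contextlib import redirect_stdout
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_captured(func: Callable[[], T]) -> Tuple[T, str]:
    """Call `func` while collecting everything it prints to sys.stdout.

    Unlike capsys, this does not depend on a fixture scope, so it can be
    called from a fixture declared with @pytest.fixture(scope="session").
    """
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func()
    return result, buffer.getvalue()


def build_image() -> str:
    # Hypothetical stand-in for the long-running image build step.
    print("building image ...")
    return "image-id"


result, output = run_captured(build_image)
print(f"result={result!r}, captured={output!r}")
```

Inside a session-scoped fixture, the call would look the same, for example `image, log = run_captured(build_image)`.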

ckunki commented 11 months ago

I used

- name: Copy notebook content
  ansible.builtin.synchronize:
    src: "roles/jupyter/files/notebook/"
    dest: /root/notebooks
    rsync_opts:
      - "--chmod=0644"

And got error message

protocol version mismatch -- is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(178) [sender=3.1.3]

After adding rsync to the Docker container and replacing ansible.builtin.copy with ansible.builtin.synchronize, the duration of the copy step decreased from ~24 seconds to < 1 second! Great! :tada:

Unfortunately, installing rsync itself takes 31 seconds, which outweighs the ~23 seconds saved on copying. :slightly_frowning_face:

ckunki commented 11 months ago

Capturing stdout in a fixture declared with @pytest.fixture(scope="session") only works when pytest is called with -o log_cli=true -o log_cli_level=INFO.

However, when these CLI options are provided, capturing is no longer required, as pytest logs the Ansible output anyway.
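With log_cli enabled, pytest prints records emitted through Python's logging module at or above log_cli_level as they occur, including records emitted from fixtures. A minimal sketch of the pattern follows. It assumes the Ansible output comes from an external command and streams each line of that command's output into a logger at INFO level. The function name `run_and_log` and the logger name are hypothetical, and a small Python command stands in for the real playbook invocation.

```python
import logging
import subprocess
import sys
from typing import List

logger = logging.getLogger("ai_lab.build")  # hypothetical logger name


def run_and_log(cmd: List[str]) -> int:
    """Run `cmd`, forwarding each line of its combined stdout/stderr to `logger` at INFO."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            logger.info(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
# Stand-in for the real command, e.g. an ansible-playbook invocation.
exit_code = run_and_log([sys.executable, "-c", "print('PLAY [all]'); print('TASK [Copy notebook content]')"])
print("exit code:", exit_code)
```

Under pytest, the basicConfig call is unnecessary: with -o log_cli=true -o log_cli_level=INFO, the INFO records appear in the console live.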

ckunki commented 11 months ago

To allow reusing a Docker image in pytest, DSS (data-science-sandbox) added support for the CLI option --dss-docker-image in https://github.com/exasol/data-science-sandbox/issues/69
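The general mechanics of such an option can be sketched with pytest's standard `pytest_addoption` hook and `request.config.getoption`. This is a minimal sketch of the pattern, not the DSS implementation. Only the option name --dss-docker-image comes from the issue above, while `resolve_image` and `build_new_image` are hypothetical. The fixture wiring is shown in comments so that the sketch runs without pytest installed.

```python
from typing import Callable, Optional


def pytest_addoption(parser) -> None:
    """pytest hook (placed in conftest.py): register the option for reusing an image."""
    parser.addoption(
        "--dss-docker-image",
        default=None,
        help="Name of an existing Docker image to reuse instead of building a new one",
    )


def resolve_image(existing: Optional[str], build: Callable[[], str]) -> str:
    """Return `existing` if given; otherwise build a fresh image and return its name."""
    if existing:
        return existing
    return build()


# In conftest.py, a session-scoped fixture could combine both pieces:
#
#   @pytest.fixture(scope="session")
#   def dss_docker_image(request):
#       existing = request.config.getoption("--dss-docker-image")
#       return resolve_image(existing, build=build_new_image)  # build_new_image: hypothetical
```

With this pattern, the expensive build runs once per test session when the option is omitted. Passing an existing image name skips the build entirely, which is what makes repeated local test runs fast.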