DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

can we use pip install when running dsub? #262

Closed jqian2015 closed 1 year ago

jqian2015 commented 1 year ago

Does this work with dsub? Basically, I wonder whether we can install some tools after the image is launched. Thanks.

--image us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou:2.1.19 \
--command 'set -o errexit && \
  set -o xtrace && \
  pip install -U pytest && \
  cp ${bam} ${bai}'

I had errors like this:

script: |-
  #!/usr/bin/env bash

  set -o errexit && \
    set -o xtrace && \
    pip install -U pytest && \
    cp ${bam} ${bai}

script-name: set
start-time: '2023-05-20 00:19:11.765508'
status: FAILURE
status-detail: |
  )')': /simple/pytest/
  WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a2ea90>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a2eed0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a36350>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  ERROR: Could not find a version that satisfies the requirement pytest (from versions: none)
  ERROR: No matching distribution found for pytest
status-message: |-
  Stopped running "user-command": exit status 1: )')': /simple/pytest/
  WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a2ea90>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a2eed0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7362f1a36350>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/pytest/
  ERROR: Could not find a version that satisfies the requirement pytest (from versions: none)
  ERROR: No matching distribution found for pytest
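
A note on reading the log above: the final "No matching distribution found" line follows from the earlier connection timeouts (pip never reached the index, hence "from versions: none"), so it most likely points to a network problem rather than a missing package. The sketch below shows one way to tell the two cases apart. The function name classify_pip_failure is hypothetical, and it assumes the status text has been saved or piped in as plain text.

```shell
# Classify a pip failure log read from stdin (hypothetical helper, not part of dsub).
# Prints "network" when the log shows connection timeouts, "package" when pip
# reached an index but found no matching distribution, and "unknown" otherwise.
classify_pip_failure() {
  log="$(cat)"
  case "$log" in
    # Timeouts take priority: they also cause the "No matching distribution" error.
    *ConnectTimeoutError*|*"Connection to "*" timed out"*) echo "network" ;;
    *"No matching distribution found"*) echo "package" ;;
    *) echo "unknown" ;;
  esac
}
```

For example, piping the status-detail text above into classify_pip_failure would print "network", because the ConnectTimeoutError lines are checked before the final ERROR line.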

wnojopra commented 1 year ago

Hi @jqian2015 ,

Yeah, that should work. I just tried it myself on the terra-jupyter-aou image and pip install ran with no issue. dsub doesn't do anything specifically that would block pip installs. There are a few things that can potentially be causing this:

  1. Was pypi down when you tried this? If you run it again do you get the same errors?
  2. Are you using the --block-external-network option in your dsub command? If set to true, this prevents the container for the user's script/command from accessing the external network.
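
Both possibilities above can be checked directly from inside the job. The following is a minimal sketch, assuming the image provides bash and the coreutils timeout command. The helper name can_reach is hypothetical, and the check only verifies that a TCP connection opens, not that pip itself would succeed.

```shell
# Return 0 if a TCP connection to host:port opens within the timeout, else non-zero.
# Uses bash's /dev/tcp pseudo-device and coreutils `timeout`; both are assumed
# to be available in the container image.
can_reach() {
  host="$1"
  port="${2:-443}"   # default: HTTPS
  secs="${3:-5}"     # default: 5-second timeout
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Inside a dsub --command, the same check can be written inline, for example: timeout 5 bash -c "exec 3<>/dev/tcp/pypi.org/443" && echo reachable || echo blocked. If this prints "blocked" in the job but "reachable" outside dsub, the worker's network access is the likely cause rather than pip or PyPI.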

jqian2015 commented 1 year ago

Hi Willy, thanks for the response. I tested again, and also tested without dsub, to make sure that 1) PyPI was not down and 2) --block-external-network was not set (null).

My code is below. Can you spot anything wrong?

Or would you mind sharing your code so I can test it as well? Thanks again.

aou_dsub \
  --image us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou:2.1.19 \
  --disk-size 512 \
  --boot-disk-size 50 \
  --logging "${WORKSPACE_BUCKET}/data/plink/logging" \
  --input bam="${WORKSPACE_BUCKET}/data/readme.txt" \
  --output bai="${WORKSPACE_BUCKET}/data/readme2.txt" \
  --command 'set -o errexit && \
    set -o xtrace && \
    pip install -U pytest && \
    cp ${bam} ${bai}'

wnojopra commented 1 year ago

So it sounds like you're running dsub on the AoU platform, which may restrict access to external networks. Please reach out to support@researchallofus.org for help working through your dsub use case(s) on the AoU workbench. In particular, I think it'd be good to show both your test without dsub and your test with dsub.

And in case you're interested, this is my exact dsub command that works (outside of AoU):

# --boot-disk-size is 100 because the terra-jupyter-aou image is large.
dsub \
  --provider google-cls-v2 \
  --project <MY_PROJECT> \
  --logging gs://<MY_BUCKET>/hello/ \
  --image us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou:2.1.19 \
  --command 'set -o errexit && set -o xtrace && pip install -U pytest && echo hello' \
  --regions us-central1 \
  --boot-disk-size 100 \
  --wait

jqian2015 commented 1 year ago

Thank you so much, Willy. I am actually part of the AoU user support team. This question was raised by a user, so I had to test whether I could replicate the error. By the way, do you know who on your team would be the best person to confirm that dsub jobs on AoU are restricted from accessing networks outside AoU? Within AoU, pip works fine without dsub, so it doesn't seem to make sense that using dsub would limit pip install. Nonetheless, I really appreciate your time and input.

wnojopra commented 1 year ago

I'm asking around to see who can look further into this. I'll connect you once I find someone.

Otherwise, I notice you're using the command aou_dsub. Is that a bash function? Are you able to share the code for aou_dsub?

jqian2015 commented 1 year ago

Maybe we should not use --network or --subnetwork?

function aou_dsub () {

  # Get a shorter username to leave more characters for the job name.
  local DSUB_USER_NAME="$(echo "${OWNER_EMAIL}" | cut -d@ -f1)"

  # For AoU RWB projects network name is "network".
  local AOU_NETWORK=network
  local AOU_SUBNETWORK=subnetwork

  dsub \
      --provider google-cls-v2 \
      --user-project "${GOOGLE_PROJECT}" \
      --project "${GOOGLE_PROJECT}" \
      --image 'marketplace.gcr.io/google/ubuntu1804:latest' \
      --network "${AOU_NETWORK}" \
      --subnetwork "${AOU_SUBNETWORK}" \
      --service-account "$(gcloud config get-value account)" \
      --user "${DSUB_USER_NAME}" \
      --regions us-central1 \
      --logging "${WORKSPACE_BUCKET}/dsub/logs/{job-name}/{user-id}/$(date +'%Y%m%d/%H%M%S')/{job-id}-{task-id}-{task-attempt}.log" \
      "$@"
}

wnojopra commented 1 year ago

@jqian2015 I've confirmed with other AoU folks that worker nodes in AoU do not have internet access. Can you please email me at willyn@google.com so I can follow up with you on next steps?
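
For readers who hit the same limitation, one possible workaround (not discussed in this thread, so treat it as an untested sketch) is to download the packages where internet access is available and install them offline inside the job. It assumes the job can read the workspace bucket, as the --input example earlier in the thread does, and that the downloaded wheels match the Python version and platform of the job image. The function name offline_install_command and the WHEELS variable name are hypothetical.

```shell
# Step 1 (run where pip can reach PyPI, e.g. the notebook environment):
#   pip download --dest wheels pytest
#   gsutil -m rsync -r wheels "${WORKSPACE_BUCKET}/wheels"
#
# Step 2 (in the dsub call): stage the wheel directory into the job with
#   --input-recursive WHEELS="${WORKSPACE_BUCKET}/wheels"
# and pass the output of the function below as --command.

# Print a command string that installs a package from the staged wheel
# directory without contacting any package index (hypothetical helper).
offline_install_command() {
  # $1: package name. ${WHEELS} is left unexpanded on purpose: dsub sets it
  # inside the job to the local path of the staged directory.
  printf 'set -o errexit && pip install --no-index --find-links "${WHEELS}" %s' "$1"
}
```

A call might then look like: aou_dsub --image us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou:2.1.19 --input-recursive WHEELS="${WORKSPACE_BUCKET}/wheels" --command "$(offline_install_command pytest)". Building the dependencies into a custom image is another option when the set of packages is stable.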

jqian2015 commented 1 year ago

Emailed. Thanks again. This is good to know; I will respond to the user.