AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data

Implement solution to DockerHub limits #118

Closed wbwakeman closed 3 years ago

wbwakeman commented 3 years ago

Assuming that the organization has paid for the DockerHub account, implement credentials to take advantage of that, so that no builds run into a DockerHub limit issue.

https://www.docker.com/increase-rate-limits

Tasks:

Validation:

djkapner commented 3 years ago

We should now be paying customers for dockerhub. To test, I got an interactive session on a node and exhausted the free pulls by:

[danielk@n69 ~]$ for i in `seq 1 200`; do SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run docker://alleninstitutepika/ophys_nway_matching:main python -c "import os; print(os.environ.get('NWAY_COMMIT_SHA', None))"; done
e6648dc92ab5217db82fa37e2ab1df2421550eb6
e6648dc92ab5217db82fa37e2ab1df2421550eb6
e6648dc92ab5217db82fa37e2ab1df2421550eb6
...

until I got this error:

FATAL:   Unable to handle docker://alleninstitutepika/ophys_nway_matching:main uri: failed to get SHA of docker://alleninstitutepika/ophys_nway_matching:main: Error reading manifest main in docker.io/alleninstitutepika/ophys_nway_matching: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Then, based on these docs I set the environment variables:

export SINGULARITY_DOCKER_USERNAME=<redacted>
export SINGULARITY_DOCKER_PASSWORD=<redacted>

After this, I was able to loop through the run command again successfully.
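
As a possible cross-check on the test above, Docker Hub reports the remaining pull allowance in response headers, so one could confirm that the credentials take effect without exhausting the limit first. The following is a minimal sketch, assuming `curl` and `jq` are available on the node; `parse_remaining` and `check_rate_limit` are hypothetical helper names, and the `ratelimitpreview/test` repository is the one Docker's documentation uses for this check.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any existing tooling.

# Read HTTP response headers on stdin and print the number from a line like
# "ratelimit-remaining: 76;w=21600". Prints nothing if the header is absent.
parse_remaining() {
    tr -d '\r' | awk -F'[:;]' 'tolower($1) == "ratelimit-remaining" { gsub(/ /, "", $2); print $2 }'
}

# Ask Docker Hub how many pulls remain. Authenticates with
# SINGULARITY_DOCKER_USERNAME / SINGULARITY_DOCKER_PASSWORD when the username
# is set, otherwise queries anonymously.
check_rate_limit() {
    local auth_url="https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull"
    local token
    if [ -n "${SINGULARITY_DOCKER_USERNAME:-}" ]; then
        token=$(curl -s --user "${SINGULARITY_DOCKER_USERNAME}:${SINGULARITY_DOCKER_PASSWORD}" "$auth_url" | jq -r .token)
    else
        token=$(curl -s "$auth_url" | jq -r .token)
    fi
    # HEAD request against the manifest; only the headers are needed.
    curl -s --head -H "Authorization: Bearer ${token}" \
        https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | parse_remaining
}
```

Running `check_rate_limit` before and after exporting the two `SINGULARITY_DOCKER_*` variables should show whether the authenticated identity is being used.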

@wbwakeman Is setting environment variables for mongrel possible? If so, we can set them, as above, to the credentials stored in 1password. This will solve our production pulls problem with dockerhub for either ophys_etl or ophys_nway. If we have multiple docker credentials, mongrel could have env variables:

PIKA_DOCKER_USERNAME
PIKA_DOCKER_PASSWORD

in which case, we would modify the LIMS executables to be something like:

SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} SINGULARITY_DOCKER_USERNAME=${PIKA_DOCKER_USERNAME} SINGULARITY_DOCKER_PASSWORD=${PIKA_DOCKER_PASSWORD} singularity run --bind /allen:/allen,/scratch/capacity/${PBS_JOBID}:/tmp docker://alleninstitutepika/ophys_etl_pipelines:main
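
If the one-line executable becomes unwieldy, the same mapping could live in a small wrapper. This is only a sketch under the assumption that `PIKA_DOCKER_USERNAME` and `PIKA_DOCKER_PASSWORD` are present in the job environment; `map_docker_creds` is a hypothetical name, and the warning message is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: map_docker_creds is an illustrative helper, not existing code.

# Copy the PIKA_* credentials into the variables Singularity reads.
# Returns non-zero (and warns on stderr) if either credential is missing,
# so the caller can decide whether an anonymous pull is acceptable.
map_docker_creds() {
    if [ -z "${PIKA_DOCKER_USERNAME:-}" ] || [ -z "${PIKA_DOCKER_PASSWORD:-}" ]; then
        echo "PIKA_DOCKER_USERNAME / PIKA_DOCKER_PASSWORD not set; pull would be anonymous" >&2
        return 1
    fi
    export SINGULARITY_DOCKER_USERNAME="${PIKA_DOCKER_USERNAME}"
    export SINGULARITY_DOCKER_PASSWORD="${PIKA_DOCKER_PASSWORD}"
}

# Example use inside a job script (commented out so the sketch has no side effects):
# map_docker_creds || exit 1
# SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run \
#     --bind /allen:/allen,/scratch/capacity/${PBS_JOBID}:/tmp \
#     docker://alleninstitutepika/ophys_etl_pipelines:main
```

Failing early when the variables are missing makes a rate-limit error easier to tell apart from a configuration problem.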

@njmei please review above proposal. We could do something more involved with openssl, but, this should work. What do you think?

djkapner commented 3 years ago

Submitted ServiceNow ticket AI_requests0059085 to try to get some advice/help from the sysadmins and the LIMS team.

njmei commented 3 years ago

@djkapner Sorry for the super delayed response. Didn't see this earlier. Yes, that looks fine to me.

djkapner commented 3 years ago

It looks like it might take some time to get those credentials in place via the Platform team. In the meantime, here is a temporary solution to run the latest nway in "production":

The existing OPHYS_NWAY_CELL_MATCHING_QUEUE has this executable (noting here so we can return to this later when docker credentials are set): TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run docker://alleninstitutepika/ophys_nway_matching:main

In an interactive node, I pulled down the latest image in a similar way:

[danielk@n69 ~]$ SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run docker://alleninstitutepika/ophys_nway_matching:main python -c "import os; print(os.environ.get('NWAY_COMMIT_SHA', None))"

<download and build .sif>

30ce5fe481a2c055757fc50bdfa0af44fa88de01

This commit hash is the latest commit to the nway cell matching main branch.

This created a cached .sif file which we can now call directly, getting docker out of the loop for the moment:

[danielk@n69 ~]$ SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run /allen/ai/hpc/singularity/oci-tmp/9540c9abd3e3a2b8db66117173df82fe31403c06fec013172803a9af1200dac0/ophys_nway_matching_main.sif python -c "import os; print(os.environ.get('NWAY_COMMIT_SHA', None))"
30ce5fe481a2c055757fc50bdfa0af44fa88de01

NOTE: I think there is no difference between using SINGULARITY_TMPDIR and TMPDIR

So the temporary executable in LIMS should be: TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run /allen/ai/hpc/singularity/oci-tmp/9540c9abd3e3a2b8db66117173df82fe31403c06fec013172803a9af1200dac0/ophys_nway_matching_main.sif

I have changed this. Note, I have not changed OPHYS_NWAY_CELL_MATCHING_REMAP_SPECIMENS_QUEUE though that could be done the same way, if necessary.
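
For when the credentials do land, one way to move back from the pinned .sif without another manual edit would be to choose the image at run time. This is a rough sketch, not what is currently in LIMS; `choose_image` is a hypothetical helper, and the rule it encodes (prefer Docker Hub when credentials are set, otherwise fall back to the cached file if it exists) is an assumption about the desired behavior.

```shell
#!/usr/bin/env bash
# Sketch only: choose_image is an illustrative helper, not existing code.

# Print the image reference to hand to `singularity run`.
#   $1: path to a cached .sif file
#   $2: docker:// URI for the same image
choose_image() {
    local sif_path="$1" docker_uri="$2"
    if [ -n "${SINGULARITY_DOCKER_USERNAME:-}" ] && [ -n "${SINGULARITY_DOCKER_PASSWORD:-}" ]; then
        # Credentials available: pull from Docker Hub so CI/CD updates are picked up.
        echo "$docker_uri"
    elif [ -f "$sif_path" ]; then
        # No credentials: use the cached image and keep Docker Hub out of the loop.
        echo "$sif_path"
    else
        # Nothing cached: fall back to an anonymous pull.
        echo "$docker_uri"
    fi
}

# Example use (commented out so the sketch has no side effects):
# image=$(choose_image /path/to/cached/ophys_nway_matching_main.sif \
#     docker://alleninstitutepika/ophys_nway_matching:main)
# TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run "$image"
```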

djkapner commented 3 years ago

Temporary solution in place. This breaks CI/CD, but, we are able to manually get the latest version in place. Permanent solution is in LIMS team hands, tracked here: http://jira.corp.alleninstitute.org/browse/PBS-2549

djkapner commented 3 years ago

Jose has developed a permanent fix for this where LIMS will submit the job like this:

qsub your_pbs_file.pbs -v "SINGULARITY_DOCKER_USERNAME=var1,SINGULARITY_DOCKER_PASSWORD=PASS"

I tested it, and this works for passing the credentials in. Once implemented, I think I'll have to get the credentials into the secrets.yml file, or something. Will update ticket when I know.
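
To illustrate the shape of that `-v` argument, here is a small sketch of how the variable list could be assembled. `build_qsub_vars` is a hypothetical helper, not Jose's implementation. Because `-v` takes a comma-separated list, the sketch refuses values that contain a comma rather than guessing at an escaping rule.

```shell
#!/usr/bin/env bash
# Sketch only: build_qsub_vars is an illustrative helper, not the LIMS code.

# Print the variable list for `qsub -v` given a username and a password/token.
# Refuses values containing ',' since the list itself is comma-separated.
build_qsub_vars() {
    local user="$1" pass="$2"
    case "${user}${pass}" in
        *,*)
            echo "credentials containing ',' cannot be passed safely via -v" >&2
            return 1
            ;;
    esac
    printf 'SINGULARITY_DOCKER_USERNAME=%s,SINGULARITY_DOCKER_PASSWORD=%s' "$user" "$pass"
}

# Example use, following the command above (commented out, needs a PBS host):
# qsub your_pbs_file.pbs -v "$(build_qsub_vars "$DOCKER_USER" "$DOCKER_TOKEN")"
```

Here `DOCKER_USER` and `DOCKER_TOKEN` are placeholder names for wherever the credentials end up being stored. Note that values passed on a command line may be visible in the process list on the submitting host.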

wbwakeman commented 3 years ago

We now have a read-only Docker Hub account. Username is instituteci. Password is in 1password or ping @wbwakeman

wbwakeman commented 3 years ago

Singularity 3.7 has been pushed out to the HPC nodes.

djkapner commented 3 years ago

I am able to see the instituteci@alleninstitute.org mailbox in Outlook online: in the upper right corner, click on my icon ("DK"), choose "open another mailbox", and type in this email address. From there I can reply to the dockerhub validation email.

djkapner commented 3 years ago

I created an access token for this read-only user that Wayne had created. I can then use this token as a password in an executable call like:

SINGULARITY_DOCKER_USERNAME=instituteci SINGULARITY_DOCKER_PASSWORD=<token> SINGULARITY_TMPDIR=/scratch/capacity/${PBS_JOBID} singularity run docker://alleninstitutepika/ophys_nway_matching:main python -c "import os; print(os.environ.get('NWAY_COMMIT_SHA', None))"

djkapner commented 3 years ago

For croissant, we'll handle this later when it comes up again.

wbwakeman commented 3 years ago

Completed in conjunction with a paid DockerHub account and http://jira.corp.alleninstitute.org/browse/PBS-2549