docker-archive / migrator

Tool to migrate Docker images from Docker Hub or v1 registry to a v2 registry
Apache License 2.0

REPO_FILTER is not working #89

Closed Brett55 closed 7 years ago

Brett55 commented 7 years ago

I have an image under my organization that I know exists. When I put its name, or part of its name, in REPO_FILTER, migrator returns zero results for migration. I can find older images under my org, so it is odd that some images are not found.

mbentley commented 7 years ago

Does the image exist on Docker Hub, and migrator is not finding it when REPO_FILTER is in use? Could you provide the command you're using for reference? Feel free to sanitize whatever info you deem necessary; just the command should help to start with.

Brett55 commented 7 years ago

The image exists on Docker Hub. When I use the repo filter with exact names, it finds some repos but not all.

docker run -it \
    -e AWS_ACCESS_KEY_ID=KEY \
    -e AWS_SECRET_ACCESS_KEY=ACCESS_KEY \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e V1_REGISTRY=docker.io \
    -e V2_REGISTRY=1234567.dkr.ecr.us-east-1.amazonaws.com \
    -e DOCKER_HUB_ORG=tdgp \
    -e V1_USERNAME=user \
    -e V1_PASSWORD=password \
    -e V1_EMAIL=user@gmail.com \
    -e V1_REPO_FILTER=lpsa \
    docker/migrator

mbentley commented 7 years ago

Hmm, you can try to use the API directly to see if the repository shows up in the list:

DOCKER_HUB_USERNAME=""
DOCKER_HUB_PASSWORD=""
NAMESPACE="tdgp"
V1_REPO_FILTER="lpsa"

TOKEN=$(curl -sf -H "Content-Type: application/json" -X POST -d '{"username": "'"${DOCKER_HUB_USERNAME}"'", "password": "'"${DOCKER_HUB_PASSWORD}"'"}' https://hub.docker.com/v2/users/login/ | jq -r .token)

NAMESPACES=$(curl -sf -H "Authorization: JWT ${TOKEN}" https://hub.docker.com/v2/repositories/namespaces/ | jq -r '.namespaces|.[]')

REPO_LIST=$(curl -sf -H "Authorization: JWT ${TOKEN}" "https://hub.docker.com/v2/repositories/${NAMESPACE}/?page_size=100000" | jq -r '.results|.[]|.name' | grep -- "${V1_REPO_FILTER}" || true)

echo $REPO_LIST

You can also check to see if it returns the repo without the filter:

DOCKER_HUB_USERNAME=""
DOCKER_HUB_PASSWORD=""
NAMESPACE="tdgp"

TOKEN=$(curl -sf -H "Content-Type: application/json" -X POST -d '{"username": "'"${DOCKER_HUB_USERNAME}"'", "password": "'"${DOCKER_HUB_PASSWORD}"'"}' https://hub.docker.com/v2/users/login/ | jq -r .token)

NAMESPACES=$(curl -sf -H "Authorization: JWT ${TOKEN}" https://hub.docker.com/v2/repositories/namespaces/ | jq -r '.namespaces|.[]')

NF_REPO_LIST=$(curl -sf -H "Authorization: JWT ${TOKEN}" "https://hub.docker.com/v2/repositories/${NAMESPACE}/?page_size=100000" | jq -r '.results|.[]|.name')

echo $NF_REPO_LIST

I grabbed the code directly from the script so these should be an accurate representation of what repositories it is expecting to find.

I'd be interested to see if it returns the repository or not.

Brett55 commented 7 years ago

I tested your script and it returns some repos, just like the docker migrator tool. For some reason it cannot find several of our repos. I checked settings and they are identical to the ones it CAN find.

mbentley commented 7 years ago

Is it possible that your user does not have specific access to those repositories? I just double checked and it looks like the API hasn't changed. Can you see them at https://hub.docker.com/u/tdgp/dashboard/ when you're logged in as the same user account that you're using to query the repo list?

Brett55 commented 7 years ago

From https://hub.docker.com/u/tdgp/dashboard/ I cannot see all of our repos because it says 'showing 100 of 100' when we have 250 repos. So in order to edit the settings of a repo, I need to enter the full path to the repo, like so: https://hub.docker.com/r/tdgp/myRepo/

If I use the search, it returns zero results as well, which I've raised bugs about in the past.

mbentley commented 7 years ago

I see. It almost sounds like there is some sort of hard coded limit in the API that is preventing the listing of all repos since migrator reuses the Docker Hub APIs. Let me see what I can find out.

Brett55 commented 7 years ago

Yea it definitely seems like there is a limit. Thanks

Brett55 commented 7 years ago

I can confirm that the repos the API cannot find are outside of the 100 limit. Anything within the 100 limit is discoverable. It seems like the API sorts descending by number of pulls, so if a repo is hardly used it gets chopped off by the 100 limit.

Brett55 commented 7 years ago

I'm almost positive the issue is that docker migrator does not use paging to get the results.

https://github.com/docker/migrator/blame/master/migrator.sh#L382

mbentley commented 7 years ago

So yes, apparently page_size has a max of 100, which is why ?page_size=100000 doesn't actually work: the API still returns only the first 100 repos.
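
To put numbers on that cap: with at most 100 results per request, the 250 repos mentioned earlier in this thread need three requests rather than one. A minimal sketch of the arithmetic, where pages_needed is a hypothetical helper for illustration and not something in migrator.sh:

```shell
# Hypothetical helper (not part of migrator.sh): how many requests are
# needed to list TOTAL repos when each page holds at most PAGE_SIZE.
pages_needed() {
  total="$1"
  page_size="$2"
  # integer ceiling division
  echo $(( (total + page_size - 1) / page_size ))
}

# 250 repos, 100 per page
pages_needed 250 100
```

For 250 repos at 100 per page this prints 3, so a single request can never return the full list.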

Brett55 commented 7 years ago

Will there be a fix for this soon, or can I fix it and have it rolled in this week?

mbentley commented 7 years ago

Working on it right now. Just needs a while loop to go through the pagination and it should be set. I have a working example for you to test:

DOCKER_HUB_USERNAME=""
DOCKER_HUB_PASSWORD=""
NAMESPACE=""

TOKEN=$(curl -sf -H "Content-Type: application/json" -X POST -d '{"username": "'"${DOCKER_HUB_USERNAME}"'", "password": "'"${DOCKER_HUB_PASSWORD}"'"}' https://hub.docker.com/v2/users/login/ | jq -r .token)

NAMESPACES=$(curl -sf -H "Authorization: JWT ${TOKEN}" https://hub.docker.com/v2/repositories/namespaces/ | jq -r '.namespaces|.[]')

PAGE_URL="https://hub.docker.com/v2/repositories/${NAMESPACE}/?page=1&page_size=10"

# loop through pages; stop on "null" (last page) or an empty URL (failed request)
NF_REPO_LIST=""
while [ -n "${PAGE_URL}" ] && [ "${PAGE_URL}" != "null" ]
do
  # get a list of repos on this page
  PAGE_DATA=$(curl -sf -H "Authorization: JWT ${TOKEN}" "${PAGE_URL}")

  # figure out next page URL
  PAGE_URL="$(echo "${PAGE_DATA}" | jq -r .next)"

  # Add repos to the list
  NF_REPO_LIST="${NF_REPO_LIST} $(echo "${PAGE_DATA}" | jq -r '.results|.[]|.name')"
done

echo $NF_REPO_LIST
echo $NF_REPO_LIST | wc -w
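
The loop above collects every repo name without filtering. A sketch of applying V1_REPO_FILTER to the combined list afterwards, assuming a hypothetical filter_repo_list helper (not part of migrator.sh) and the same grep matching as the single-request version earlier in the thread:

```shell
# Hypothetical helper (not part of migrator.sh).
# $1: whitespace-separated repo names, $2: grep pattern to keep.
# Prints matching names one per line; prints nothing if none match.
filter_repo_list() {
  printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep -- "$2" || true
}

# Sample values for illustration only
NF_REPO_LIST=" api lpsa-web tools lpsa-worker"
V1_REPO_FILTER="lpsa"
filter_repo_list "${NF_REPO_LIST}" "${V1_REPO_FILTER}"
```

The `|| true` keeps a no-match result from being treated as an error, mirroring the original one-liner.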
mbentley commented 7 years ago

@Brett55 - please let me know if the latest build fixes the issue.

Brett55 commented 7 years ago

@mbentley working now, thanks!!

Brett55 commented 7 years ago

@mbentley Is there a way to filter by image tag in addition to repo name?

mbentley commented 7 years ago

It's definitely possible - I'd have to think about how to define the filters since I'd assume that filters would need to be done per repo. There would need to be some sort of syntax to be able to indicate a repo + the tag to filter. It'd probably end up here: https://github.com/docker/migrator/blob/master/migrator.sh#L423-L433

Doing a generic tag filter (like only migrate latest) would be simple - just add another grep like the one here: https://github.com/docker/migrator/blob/master/migrator.sh#L409-L410

Brett55 commented 7 years ago

Okay, so currently it only filters by repo name. Could it be something like V1_REPO_FILTER and V1_REPO_TAG_FILTER, where the tag filter is applied to the results captured by V1_REPO_FILTER?

mbentley commented 7 years ago

Yeah, so if you'd find a tag filter that applies to all repos useful, that'd be easy to implement.
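
A minimal sketch of what such a global tag filter could look like, assuming the V1_REPO_TAG_FILTER name suggested above and a hypothetical filter_tags helper. This is an illustration of the "one more grep" idea, not the actual migrator.sh code:

```shell
# Hypothetical helper (not part of migrator.sh).
# Reads one tag per line on stdin and keeps only tags matching
# V1_REPO_TAG_FILTER; an empty or unset filter keeps every tag.
filter_tags() {
  if [ -z "${V1_REPO_TAG_FILTER:-}" ]; then
    cat
  else
    grep -- "${V1_REPO_TAG_FILTER}" || true
  fi
}

# Sample tags for illustration only: keep just "latest"
V1_REPO_TAG_FILTER='^latest$'
printf '%s\n' latest 1.0 latest-dev | filter_tags
```

Because the filter is a grep pattern, anchoring it with ^ and $ matches a tag exactly, while a bare string such as latest would also match latest-dev.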

Brett55 commented 7 years ago

Yea, we need a tag filter for sure; in fact, it's required for what we're doing.

Brett55 commented 7 years ago

@mbentley I don't want to duplicate efforts, are you adding the tag filter or can I start on a PR?

mbentley commented 7 years ago

I probably won't have time for a day or two so if you're able to, that'd be great.

Brett55 commented 7 years ago

@mbentley can you give me permissions to open a PR please?

mbentley commented 7 years ago

Did you fork this repo to your own account? Anyone should be able to submit a PR.

Brett55 commented 7 years ago

@mbentley yes sorry, opened PR