docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution
Apache License 2.0

500 status code when crawling /v1/search #1094

Open supreme opened 7 years ago

supreme commented 7 years ago

Certain page offsets for the search endpoint in the registry v1 API return a 500 response. I work for a company called Blackduck Software, where part of my job is to catalog metadata on open source software. I noticed this issue while trying to crawl the v1 API for all repositories currently in Docker's hub. I understand that the v1 API has been deprecated, but the v2 API does not allow this kind of crawl through its /_catalog endpoint.

Steps to reproduce the issue:

  1. GET /v1/search?q=+&n=100&page=10

I believe the issue results from an anomaly in the data, because the same 500 error appears when the page size and page offset are adjusted to hit the same region of the result set. For example, the following requests all return the same 500 error, and their result windows overlap (the smaller pages fall inside the larger one):

  1. GET /v1/search?q=+&n=100&page=10
  2. GET /v1/search?q=+&n=50&page=20
  3. GET /v1/search?q=+&n=25&page=40
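
To make the overlap between the three requests above concrete, here is a minimal sketch of the page-window arithmetic. It assumes plain offset pagination with pages numbered from 1; neither is confirmed by the API, and `page_window` is a hypothetical helper, not part of the registry.

```python
def page_window(page_size: int, page: int, first_page: int = 1) -> range:
    """Return the 0-based result indices covered by one page.

    Assumption: the server uses simple offset pagination and numbers
    pages starting at `first_page` (1 here; not confirmed by the API).
    """
    start = (page - first_page) * page_size
    return range(start, start + page_size)


# (n, page) pairs taken from the three failing requests
failing_requests = [(100, 10), (50, 20), (25, 40)]
windows = [page_window(n, page) for n, page in failing_requests]

# Result indices that every failing request has to serialize
common = set(windows[0]).intersection(*windows[1:])
print(min(common), max(common))  # prints "975 999"
```

Under these assumptions the offending record would sit somewhere in the 25 results shared by all three windows, so requesting that span with a smaller `n` may narrow it down further.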

Describe the results you received:

curl -v https://index.docker.io/v1/search\?q\=+\&n\=100\&page\=10
*   Trying 52.87.68.213...
* Connected to index.docker.io (52.87.68.213) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.docker.io
* Server certificate: RapidSSL SHA256 CA - G3
* Server certificate: GeoTrust Global CA
> GET /v1/search?q=+&n=100&page=10 HTTP/1.1
> Host: index.docker.io
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 500 INTERNAL SERVER ERROR
< Server: nginx/1.6.2
< Date: Wed, 25 Jan 2017 18:46:29 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Vary: Cookie
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000

Describe the results you expected:

{
"num_pages": 13795,
"num_results": 1379464,
"results": [...],
"page_size": 100,
"query": " ",
"page": "10"
}
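
As a sanity check on the expected response, the figures are consistent with `num_pages` being the ceiling of `num_results / page_size`. That relationship is an assumption inferred from the numbers above, not documented behavior. A short sketch:

```python
# Figures copied from the expected response above
expected = {"num_results": 1379464, "page_size": 100, "num_pages": 13795}

# Integer ceiling division (assumed relationship between the fields)
computed_pages = -(-expected["num_results"] // expected["page_size"])

print(computed_pages == expected["num_pages"])  # prints "True"

# The failing page is far inside the valid range, so the 500 is
# unlikely to be an out-of-range page error.
requested_page = 10
print(1 <= requested_page <= computed_pages)  # prints "True"
```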