clearlydefined / service

The service side of clearlydefined.io
MIT License

PyPi Search #159

Open Teju-Manchenella opened 6 years ago

Teju-Manchenella commented 6 years ago

Implement PyPI search functionality. Currently, in originPyPi.js, the route '/:name' looks for a package whose name exactly matches the request parameter 'name'. If a package with that name exists, it is returned.

Instead, this route needs to return all packages whose names contain the request parameter 'name', not just the single package that is an exact match. As far as I understand, the PyPI API does not provide this functionality.
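To make the requested behavior concrete, here is a minimal sketch of "contains" matching as opposed to exact matching. It assumes a list of candidate package names is already available from some source, which this issue does not identify. `findMatchingPackages` and the sample names are hypothetical and not part of originPyPi.js.

```javascript
// Hypothetical helper: not part of originPyPi.js.
// Returns every candidate whose name contains the query (case-insensitive),
// in the same [{ id }] shape the existing route sends back.
function findMatchingPackages(candidateNames, query) {
  const needle = String(query).toLowerCase()
  if (!needle) return []
  return candidateNames
    .filter(name => name.toLowerCase().includes(needle))
    .map(name => ({ id: name }))
}

// Sample names are made up for illustration.
const sampleNames = ['nginx', 'python-nginx', 'requests', 'Nginx-Parser']
console.log(findMatchingPackages(sampleNames, 'nginx'))
```

An exact-match lookup would return at most the first entry; the sketch returns every name that contains the query.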

jeffmendoza commented 4 years ago

We should check whether this is still an issue. If it is a limitation of the underlying PyPI service, we should document that limitation in ClearlyDefined's API and close this.

nellshamrell commented 3 years ago

Adding some notes as I investigate this:

Here's the relevant code in OriginPyPi.js

routes/originPyPi.js

router.get(
  '/:name',
  asyncMiddleware(async (request, response) => {
    const { name } = request.params
    const url = `https://pypi.python.org/pypi/${name}/json`
    const answer = await requestPromise({ url, method: 'GET', json: true })
    const result = answer && answer.info ? [{ id: answer.info.name }] : []
    return response.status(200).send(result)
  })
)

Looking at the PyPI search page, there are several packages with nginx in the name (along with one package that has the exact name nginx): https://pypi.org/search/?q=nginx

I tried running curl against the same URL that OriginPyPi.js would use, but received a 301 in response.

nells@campusnell:~$ curl https://pypi.python.org/pypi/nginx/json
<html><head><title>301 Moved Permanently</title></head><body><center><h1>301 Moved Permanently</h1></center></body></html>

nellshamrell commented 3 years ago

I can get the information about the package when I use

wget https://pypi.python.org/pypi/nginx/json

This gives me information about the package whose name is an exact match:

{"info":{"author":"tphp","author_email":"336296@qq.com","bugtrack_url":null,"classifiers":[],"description":"time and path tool","description_content_type":"","docs_url":null,"download_url":"","downloads":{"last_day":-1,"last_month":-1,"last_week":-1},"home_page":"https://github.com/tphp/test","keywords":"pip,pathtool,timetool,magetool,mage","license":"MIT Licence","maintainer":"","maintainer_email":"","name":"nginx","package_url":"https://pypi.org/project/nginx/","platform":"any","project_url":"https://pypi.org/project/nginx/","project_urls":{"Homepage":"https://github.com/tphp/test"},"release_url":"https://pypi.org/project/nginx/0.0.1/","requires_dist":null,"requires_python":"","summary":"time and path tool","version":"0.0.1","yanked":false,"yanked_reason":null},"last_serial":7924201,"releases":{"0.0.1":[{"comment_text":"","digests":{"md5":"bc830a301bf3d07cf2e30bb564f3ff11","sha256":"9a52060402cdb9418c41656a553611f5f352d23811c5a4edfa3c9c9772c157a3"},"downloads":-1,"filename":"nginx-0.0.1.tar.gz","has_sig":false,"md5_digest":"bc830a301bf3d07cf2e30bb564f3ff11","packagetype":"sdist","python_version":"source","requires_python":null,"size":1029,"upload_time":"2020-08-10T10:05:29","upload_time_iso_8601":"2020-08-10T10:05:29.246992Z","url":"https://files.pythonhosted.org/packages/3a/61/3acaa548a21b23d6bc1d1a89853c4a4456d9ff0825b57fe6cbdadd0eaf85/nginx-0.0.1.tar.gz","yanked":false,"yanked_reason":null}]},"urls":[{"comment_text":"","digests":{"md5":"bc830a301bf3d07cf2e30bb564f3ff11","sha256":"9a52060402cdb9418c41656a553611f5f352d23811c5a4edfa3c9c9772c157a3"},"downloads":-1,"filename":"nginx-0.0.1.tar.gz","has_sig":false,"md5_digest":"bc830a301bf3d07cf2e30bb564f3ff11","packagetype":"sdist","python_version":"source","requires_python":null,"size":1029,"upload_time":"2020-08-10T10:05:29","upload_time_iso_8601":"2020-08-10T10:05:29.246992Z","url":"https://files.pythonhosted.org/packages/3a/61/3acaa548a21b23d6bc1d1a89853c4a4456d9ff0825b57fe6cbdadd0eaf85/nginx-0.0.1.tar.gz","yanked":false,"yanked_reason":null}]}

But it does not give me information about any of the other packages with 'nginx' in their name.
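A likely explanation for the difference between the two commands: curl does not follow redirects unless it is given `-L`, while wget follows them by default, and the 301 suggests pypi.python.org now redirects elsewhere (the `package_url` in the response points at pypi.org). The sketch below builds the JSON URL against pypi.org directly and applies the same exact-match mapping as the route. `pypiJsonUrl` and `toExactMatchResult` are hypothetical names, and whether `requestPromise` follows the redirect on its own is not established in this thread.

```javascript
// Hypothetical helpers for illustration; not taken from the repository.

// Build the JSON API URL on pypi.org, escaping the package name.
function pypiJsonUrl(name) {
  return `https://pypi.org/pypi/${encodeURIComponent(name)}/json`
}

// Same mapping as the route: one entry for an exact match, otherwise empty.
function toExactMatchResult(answer) {
  return answer && answer.info ? [{ id: answer.info.name }] : []
}

console.log(pypiJsonUrl('nginx'))
console.log(toExactMatchResult({ info: { name: 'nginx' } }))
console.log(toExactMatchResult(undefined))
```

Either way, the endpoint describes a single project, so the mapping can only ever produce zero or one entry.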

nellshamrell commented 3 years ago

These were the only places I could find semi-recent information about the PyPI API (other than very old Stack Overflow questions)

This does, indeed, appear to be a limitation of the PyPi API.

@jeffmcaffer @jeffmendoza where would you suggest we document this limitation of the API when it comes to Python packages?

nellshamrell commented 3 years ago

In comparison, looking at OriginRubyGems.js

routes/originRubyGems.js

router.get(
  '/:name',
  asyncMiddleware(async (request, response) => {
    const { name } = request.params
    const url = `https://rubygems.org/api/v1/search.json?query=${name}`
    const answer = await requestPromise({ url, method: 'GET', json: true })
    const result = answer.map(entry => {
      return { id: entry.name }
    })
    return response.status(200).send(result)
  })
)
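For comparison, the logic of the RubyGems route can be isolated into two small pure functions. This is a sketch rather than the repository's code: `rubyGemsSearchUrl` and `toSearchResult` are hypothetical names, and the URL-encoding and the array check are additions that the route quoted above does not have.

```javascript
// Hypothetical helpers for illustration; not taken from the repository.

// Build the search URL, escaping the query (the quoted route interpolates it raw).
function rubyGemsSearchUrl(name) {
  return `https://rubygems.org/api/v1/search.json?query=${encodeURIComponent(name)}`
}

// Map each search hit to the { id } shape; tolerate a non-array answer.
function toSearchResult(answer) {
  if (!Array.isArray(answer)) return []
  return answer.map(entry => ({ id: entry.name }))
}

// Trimmed-down stand-in for a search response.
const sampleAnswer = [{ name: 'nginx' }, { name: 'capistrano3-nginx' }]
console.log(rubyGemsSearchUrl('nginx'))
console.log(toSearchResult(sampleAnswer))
```

The key difference from the PyPI route is that the upstream answer is already a list, so the route only has to reshape it.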

There are also quite a few Ruby gems with nginx in their name, along with one gem with the exact name nginx: https://rubygems.org/search?query=nginx

If I run the API call as defined in the above code:

curl https://rubygems.org/api/v1/search.json?query=nginx

Then I get quite a few gems, all with `nginx` somewhere in their name:

```bash
 curl https://rubygems.org/api/v1/search.json?query=nginx
[{"documentation_uri":"https://www.rubydoc.info/gems/nginx/0.0.2","metadata":{},"homepage_uri":"","funding_uri":null,"bug_tracker_uri":null,"project_uri":"https://rubygems.org/gems/nginx","version":"0.0.2","sha":"33b4c47704d802c88891f6f062888a8f90f483f55d51f627bf59aad34ceb1521","platform":"ruby","changelog_uri":null,"source_code_uri":null,"licenses":null,"gem_uri":"https://rubygems.org/gems/nginx-0.0.2.gem","downloads":21293,"mailing_list_uri":null,"name":"nginx","wiki_uri":null,"version_downloads":21285,"info":"Small gem to manage nginx configuration","authors":"Kirill Radzikhovskyy"},{"documentation_uri":"https://www.rubydoc.info/gems/capistrano3-nginx/3.0.4","metadata":{},"homepage_uri":"https://github.com/treenewbee/capistrano3-nginx","funding_uri":null,"bug_tracker_uri":null,"project_uri":"https://rubygems.org/gems/capistrano3-nginx","version":"3.0.4","sha":"e7b1d85494f47f66a9574c44c951e04a61fb956e270bb22d79e89ec10eaed17c","platform":"ruby","changelog_uri":null,"source_code_uri":null,"licenses":["MIT"],"gem_uri":"https://rubygems.org/gems/capistrano3-nginx-3.0.4.gem","downloads":264945,"mailing_list_uri":null,"name":"capistrano3-nginx","wiki_uri":null,"version_downloads":39328,"info":"Adds suuport to nginx for Capistrano 3.x","authors":"Juan Ignacio Donoso, treenewbee"},{"documentation_uri":"https://www.rubydoc.info/gems/nginx_utils/0.1.2","metadata":{},"homepage_uri":"https://github.com/i2bskn/nginx_utils","
(...)
```

jeffmcaffer commented 3 years ago

Hmmm, this is less than optimal. I'm guessing that this shows up for users when they are looking for a component in the UI? We could show them all the components we already know about. Not sure how to "document" this in a way that the user would see other than putting a little bit of descriptive text somewhere near the search box.
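One way to read the suggestion above is to combine the upstream exact match with the names ClearlyDefined already has. A minimal sketch, assuming the known names can be obtained as a plain array (how they would be queried is not covered in this thread). `mergeSearchResults` and `knownNames` are hypothetical.

```javascript
// Hypothetical helper for illustration; not taken from the repository.
// exactResult: the [{ id }] list the PyPI route produces today (zero or one entry).
// knownNames:  assumed array of package names ClearlyDefined already knows about.
function mergeSearchResults(exactResult, knownNames, query) {
  const needle = String(query).toLowerCase()
  const seen = new Set()
  const merged = []
  const add = id => {
    const key = id.toLowerCase()
    if (seen.has(key)) return // skip duplicates, ignoring case
    seen.add(key)
    merged.push({ id })
  }
  // The exact match from upstream goes first.
  exactResult.forEach(entry => add(entry.id))
  // Then any already-known names that contain the query.
  if (needle) {
    knownNames.filter(name => name.toLowerCase().includes(needle)).forEach(name => add(name))
  }
  return merged
}

console.log(mergeSearchResults([{ id: 'nginx' }], ['nginx', 'django-nginx', 'requests'], 'nginx'))
```

This would only surface packages ClearlyDefined has already seen, so it narrows the gap with the RubyGems behavior without closing it.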

brainwane commented 3 years ago

Is https://github.com/pypa/warehouse/issues/5231 relevant to this issue?