fabienvauchelles / scrapoxy

Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading them across multiple scrapers 🕸️. It also smartly handles traffic routing 🔀 to minimize bans and increase success rates 🚀.
http://scrapoxy.io
MIT License
2.05k stars 237 forks

API Returns HTML #215

Closed matt-gorman closed 10 months ago

matt-gorman commented 10 months ago

Current Behavior

I was trying to use the API, but I believe I'm missing something. I took the username and password from the Project Settings and called the API with Python Requests:

import requests

basic = requests.auth.HTTPBasicAuth('<username from Project Settings>','<password from Project Settings>')
res = requests.get('http://<hostname>:8890/api/scrapers/project', auth=basic)

However, I get HTML back:

<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8"/>
        <base href="/"/>
        <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"/>
        <meta name="description" content="Scrapoxy"/>
        <meta name="author" content="Fabien Vauchelles"/>
        <meta name="keyword" content="scrapoxy,crawl,crawling,proxy,phantomjs,scraper,scraping,scrapy,selenium,webscraper,webscraping"/>

        <link rel="shortcut icon" href="assets/imgs/scrapoxy-small.svg"/>

        <title>Scrapoxy</title>
    <link rel="stylesheet" href="styles-VUVQ2C4R.css"><link rel="modulepreload" href="chunk-ESESERKI.js"></head>

    <body>
        <noscript>You need to enable JavaScript to run this app.</noscript>

        <div class="loader" style="text-align: center; padding-top: calc(100vh / 2); height: 100vh">
            <i class="spinner-grow"></i>
            <span>Loading...</span>
        </div>
    <script src="polyfills-Q6FK7RZU.js" type="module"></script><script src="main-YDLUVJ4O.js" type="module"></script></body>
</html>

It felt like the API call was missing a project, or that I wasn't hitting the right service, so I tried at least two other things:

  1. Adding the ID:
http://<hostname>:8890/api/scrapers/project?id=<ID string from GUI URL>
http://<hostname>:8890/api/scrapers/project/<ID string from GUI URL>

Both brought back the same HTML.

  2. Wrong port in base URL, so I tried 8888 (I was pretty sure this wasn't it):
http://<hostname>:8888/api/scrapers/project

{"id":"wrong_url","message":"URL has no hostname","method":"GET","url":"/api/scrapers/project"}

Am I missing something with setting up and using the API?

Expected Behavior

Expected to get JSON back similar to what is in the API docs.
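
One way to make this expectation explicit in a script is to inspect the response's Content-Type header before trying to parse the body. The sketch below is illustrative only: looks_like_json is a hypothetical helper name, not part of Scrapoxy or Requests, and it assumes the API labels its JSON responses with an application/json media type.

```python
def looks_like_json(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes JSON."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


# Example header values; an HTML page like the one above would
# typically be served as text/html.
print(looks_like_json("application/json; charset=utf-8"))  # True
print(looks_like_json("text/html; charset=utf-8"))         # False
```

With Requests, the header value can be read as res.headers.get("Content-Type", "") and passed to the helper before calling res.json().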

Steps to Reproduce

  1. Use the Python Requests module to request using the username/password from the Project Settings. Example:
import requests

basic = requests.auth.HTTPBasicAuth('<username from Project Settings>','<password from Project Settings>')
res = requests.get('http://<hostname>:8890/api/scrapers/project', auth=basic)

Failure Logs

No response

Scrapoxy Version

4.2.3

Custom Version

Deployment

Operating System

Storage

Additional Information

No response

fabienvauchelles commented 10 months ago

Hi @matt-gorman ,

Thanks a lot for your feedback.

There was a typo on the website. The API URL is http://<hostname>:8888/api/scraper instead of http://<hostname>:8888/api/scrapers (scraper is singular, with no trailing s).

Documentation is now corrected.

Thanks for spotting the issue!
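
For reference, the original request with the corrected path might look like the sketch below. It is a minimal example under stated assumptions, not the documented client: project_url and get_project are hypothetical helper names, and the port is left as a parameter because this thread mentions both 8890 and 8888. The only change taken from the answer above is the singular scraper path segment.

```python
def project_url(hostname: str, port: int) -> str:
    """Build the project endpoint URL using the singular 'scraper' segment."""
    return f"http://{hostname}:{port}/api/scraper/project"


def get_project(hostname: str, port: int, username: str, password: str):
    """Fetch the project endpoint and return the decoded JSON body."""
    # Imported here so project_url stays usable without Requests installed.
    import requests
    from requests.auth import HTTPBasicAuth

    res = requests.get(
        project_url(hostname, port),
        auth=HTTPBasicAuth(username, password),  # credentials from Project Settings
        timeout=10,
    )
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return res.json()  # raises a ValueError subclass if the body is not JSON
```

Calling get_project("<hostname>", <port>, "<username>", "<password>") with the values from the Project Settings should then return the parsed response rather than the web UI's HTML, provided the port points at the service that hosts the API.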

matt-gorman commented 10 months ago

Looks good, thanks for the quick response.