API Returns HTML - Githubissues

matt-gorman commented 10 months ago

Current Behavior

I was attempting to use the API, but I believe I'm missing something. I took the username and password from the Project Settings and tried to call the API with Python Requests:

import requests

basic = requests.auth.HTTPBasicAuth('<username from Project Settings>','<password from Project Settings>')
res = requests.get('http://<hostname>:8890/api/scrapers/project', auth=basic)

However, I get HTML back:

<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8"/>
        <base href="/"/>
        <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"/>
        <meta name="description" content="Scrapoxy"/>
        <meta name="author" content="Fabien Vauchelles"/>
        <meta name="keyword" content="scrapoxy,crawl,crawling,proxy,phantomjs,scraper,scraping,scrapy,selenium,webscraper,webscraping"/>

        <link rel="shortcut icon" href="assets/imgs/scrapoxy-small.svg"/>

        <title>Scrapoxy</title>
    <link rel="stylesheet" href="styles-VUVQ2C4R.css"><link rel="modulepreload" href="chunk-ESESERKI.js"></head>

    <body>
        <noscript>You need to enable JavaScript to run this app.</noscript>

        <div class="loader" style="text-align: center; padding-top: calc(100vh / 2); height: 100vh">
            <i class="spinner-grow"></i>
            <span>Loading...</span>
        </div>
    <script src="polyfills-Q6FK7RZU.js" type="module"></script><script src="main-YDLUVJ4O.js" type="module"></script></body>
</html>

It felt like the request was missing a project, or I wasn't hitting the right service, so I tried at least two other things:

Adding the ID:

http://<hostname>:8890/api/scrapers/project?id=<ID string from GUI URL>
http://<hostname>:8890/api/scrapers/project/<ID string from GUI URL>

Both returned the same HTML.

Suspecting the wrong port in the base URL, I tried 8888 (I was pretty sure this wasn't it):

http://<hostname>:8888/api/scrapers/project

{"id":"wrong_url","message":"URL has no hostname","method":"GET","url":"/api/scrapers/project"}

Am I missing something with setting up and using the API?

Expected Behavior

Expected to get JSON back similar to what is in the API docs.

Steps to Reproduce

Send a request with the Python Requests module, using the username/password from the Project Settings. Example:

import requests

basic = requests.auth.HTTPBasicAuth('<username from Project Settings>','<password from Project Settings>')
res = requests.get('http://<hostname>:8890/api/scrapers/project', auth=basic)

Failure Logs

No response

Scrapoxy Version

4.2.3

Custom Version

[X] No
[ ] Yes

Deployment

[X] Docker
[ ] Docker Compose
[ ] Kubernetes
[ ] NPM
[ ] Other (Specify in Additional Information)

Operating System

[X] Linux
[ ] Windows
[ ] macOS
[ ] Other (Specify in Additional Information)

Storage

[X] File (default)
[ ] MongoDB & RabbitMQ
[ ] Other (Specify in Additional Information)

Additional Information

No response

fabienvauchelles commented 10 months ago

Hi @matt-gorman,

Thanks a lot for your feedback.

There was a typo on the website. The API URL is http://<hostname>:8888/api/scraper, not http://<hostname>:8888/api/scrapers ("scraper" is singular, not plural).

The documentation is now corrected.

Thanks for spotting the issue!
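
For anyone landing here with the same symptom, the sketch below applies the correction above to the original request and fails loudly if the server answers with the web UI's HTML instead of JSON. It is a minimal illustration, not the project's official client: the helper names (`build_api_url`, `is_json_response`, `get_project`) are hypothetical, the path `scraper/project` is simply the original path with the plural removed, and the port is left as a parameter because this thread mentions both 8890 and 8888.

```python
def build_api_url(hostname: str, port: int, path: str = "scraper/project") -> str:
    """Build the API URL; note the singular 'scraper' in the default path."""
    return f"http://{hostname}:{port}/api/{path.lstrip('/')}"


def is_json_response(content_type: str) -> bool:
    """Return True when a Content-Type header value announces JSON."""
    return content_type.split(";")[0].strip().lower() == "application/json"


def get_project(hostname: str, port: int, username: str, password: str) -> dict:
    """Fetch the project with basic auth and reject non-JSON answers."""
    # Imported here so the two pure helpers above work without the dependency.
    import requests
    from requests.auth import HTTPBasicAuth

    url = build_api_url(hostname, port)
    res = requests.get(url, auth=HTTPBasicAuth(username, password), timeout=10)
    res.raise_for_status()

    content_type = res.headers.get("Content-Type", "")
    if not is_json_response(content_type):
        # An HTML body here usually means the path did not match an API route.
        raise ValueError(f"Expected JSON from {url}, got {content_type!r}")
    return res.json()
```

Calling `get_project('<hostname>', <port>, '<username from Project Settings>', '<password from Project Settings>')` should then either return the parsed JSON or raise an error naming the unexpected content type, which makes a mistyped path easier to spot than a silent HTML page.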