dijs / wiki

Wikipedia Interface for Node.js
MIT License

Fetch Error #136

Closed. stevewirig closed this issue 3 years ago

stevewirig commented 4 years ago

I am unable to figure out what is causing this error. It seems to happen if I have more than one or two page calls:

Unhandled error { FetchError: invalid json response body at https://en.wikipedia.org/w/api.php?format=json&action=query&redirects=&prop=coordinates&titles=Lone%20Peak%20High%20School&origin=* reason: Unexpected token < in JSON at position 0
>      at /Users/swirig/www/roam-platform/node_modules/node-fetch/lib/index.js:272:32
>      at process._tickCallback (internal/process/next_tick.js:68:7)
>    message:
>     'invalid json response body at https://en.wikipedia.org/w/api.php?format=json&action=query&redirects=&prop=coordinates&titles=Lone%20Peak%20High%20School&origin=* reason: Unexpected token < in JSON at position 0',
>    type: 'invalid-json' }

Here is my function:

const getWikiContent = functions.https.onCall(async (data, context) => {
    const wikPage = await wiki().page('Eiffel Tower');
    const results = await Promise.all([
        wikPage.mainImage(),
        wikPage.content(),
        wikPage.summary(),
        wikPage.coordinates()
    ]);
    ...

Is this a bug? Am I handling the promises wrong? Is there an issue with running more than one request in a row like this? I have tried multiple variations of the above. If I comment out coordinates it will error with the mainImage call... It seems to work sometimes with only two calls, but I can't seem to figure out what is causing the issue in the first place. Thanks in advance for your help.

nixxquality commented 4 years ago

If you hack up the original code you'll see that this is the message returned from Wikipedia:

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
<p>Our servers are currently under maintenance or experiencing a technical problem.

Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>

<p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from ---- via cp3054 frontend, Varnish XID 52776425<br>Upstream caches: cp3054 int<br>Error: 429, Scripted requests from your IP have been blocked, please contact noc@wikimedia.org, and see also https://meta.wikimedia.org/wiki/User-Agent_policy at Thu, 30 Jan 2020 08:43:05 GMT</code></p>
</div>
</html>

To fix it, you'll have to follow their User-Agent policy. The way to implement this in code is to replace the call to wiki() with:

wiki({ headers: { 'User-Agent': 'my-script-name (https://my-script-link; my@email) wiki.js' } })
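
A minimal sketch of building that options object in one place, so every wiki() call sends the same header. The helper name buildWikiOptions and the contact details are hypothetical and not part of the library; only the headers option itself comes from the fix above.

```javascript
// Hypothetical helper (not part of wikijs): builds the options object
// that the fix above passes to wiki().
function buildWikiOptions({ name, url, email }) {
  if (!name || !url || !email) {
    throw new Error('name, url and email are all required for a descriptive User-Agent');
  }
  return {
    headers: {
      // Same shape as the User-Agent string in the fix above.
      'User-Agent': `${name} (${url}; ${email}) wiki.js`
    }
  };
}

// Placeholder values; replace them with your own project details.
const options = buildWikiOptions({
  name: 'my-script-name',
  url: 'https://my-script-link',
  email: 'my@email'
});
console.log(options.headers['User-Agent']);
```

The result would then be passed along as wiki(options).page('Eiffel Tower'), so the contact details live in a single spot.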

Perhaps the library should find 429 errors and return a useful response?
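
As a rough illustration of what such handling could look like, here is a sketch that inspects the status and content type before parsing. The function parseApiResponse is hypothetical and is not the library's actual code; it only shows the idea of failing with a readable message instead of "Unexpected token <".

```javascript
// Hypothetical helper (not the library's actual code): checks a raw HTTP
// response before parsing, so a blocked request fails with a clear message.
function parseApiResponse(status, contentType, body) {
  if (status === 429) {
    throw new Error(
      'Wikipedia returned 429: request blocked or rate-limited. ' +
      'Check that a descriptive User-Agent header is set.'
    );
  }
  if (!String(contentType).includes('application/json')) {
    throw new Error(`Expected JSON but got "${contentType}" (HTTP ${status})`);
  }
  return JSON.parse(body);
}

// An HTML error page like the one above now produces a readable error.
try {
  parseApiResponse(429, 'text/html; charset=utf-8', '<!DOCTYPE html>...');
} catch (err) {
  console.log(err.message);
}
```

Checking the status first means the 429 case is reported even when the body happens to be HTML, which is exactly the situation in this thread.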

Also, an easier way to supply user-agent would be appreciated.

stevewirig commented 4 years ago

That fixed the issue for me. Thank you! I also agree with adding a better handler/response for Wikipedia errors.

liamrathke commented 4 years ago

I can confirm that this fixed the issue for me as well. It seems like Wikipedia only began enforcing the User-Agent policy within the last few months, as my project was previously working with no issues without including the header at all.

In the short term, we might want to update the README.md with this information, so that users won't have to dig up this thread.

LucaFranceschini commented 3 years ago

Is the headers property (needed to set 'User-Agent') missing from the TypeScript Options type? When I try the fix above in TypeScript I get the following:

error TS2559: Type '{ headers: { 'User-Agent': string; }; }' has no properties in common with type 'Options'.

nixxquality commented 3 years ago

Yes, see #143.

dijs commented 3 years ago

Fixed with #140