ckatzorke / howlongtobeat

A simple api for https://howlongtobeat.com/
Do What The F*ck You Want To Public License
338 stars, 45 forks

htmlscraper needs additional headers #17

Closed: sogehige closed this issue 3 years ago

sogehige commented 3 years ago

Additional headers need to be added to the request in htmlscraper.ts. Without them I get a 403 response.

  headers: {
    Referer: 'https://howlongtobeat.com/',
    'Content-type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
  },

I would open a PR, but I'm not sure how dist and src are handled in your project.

ChronicLynx commented 3 years ago

This fix does indeed remedy the issue. I would propose:

  request.post(url, {
    qs: { page: 1 },
    form: {
      queryString: query,
      t: 'games',
      sorthead: 'popular',
      sortd: 'Normal Order',
      plat: '',
      length_type: 'main',
      length_min: '',
      length_max: '',
      detail: '0',
    },
    // No Referer: testing shows the request succeeds without it.
    headers: {
      'Content-type': 'application/x-www-form-urlencoded',
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
    },
  });

After testing, I found that the Referer header is not needed. Referer is supposed to contain the address of the page making the request, so setting it to the howlongtobeat domain from a scraper would be somewhat misleading.
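
For readers who want to see the proposal in isolation, here is a minimal sketch that builds the options object as a plain value, so the header set can be inspected without touching the network. The names `SearchRequestOptions`, `buildSearchOptions`, and `DEFAULT_USER_AGENT` are hypothetical and do not come from the howlongtobeat package. The form fields and headers are copied from the snippet in this thread.

```typescript
// Hypothetical helper for illustration; not part of the howlongtobeat package.
interface SearchRequestOptions {
  qs: { page: number };
  form: Record<string, string>;
  headers: Record<string, string>;
}

// User-Agent string quoted in this thread.
const DEFAULT_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36';

// Builds the options for the search POST. Referer is deliberately left out,
// since the thread reports that the request succeeds without it.
function buildSearchOptions(
  query: string,
  userAgent: string = DEFAULT_USER_AGENT,
): SearchRequestOptions {
  return {
    qs: { page: 1 },
    form: {
      queryString: query,
      t: 'games',
      sorthead: 'popular',
      sortd: 'Normal Order',
      plat: '',
      length_type: 'main',
      length_min: '',
      length_max: '',
      detail: '0',
    },
    headers: {
      'Content-type': 'application/x-www-form-urlencoded',
      'User-Agent': userAgent,
    },
  };
}
```

The resulting object has the same shape as the second argument of the `request.post(url, ...)` call above, so in principle it could be passed there directly. That wiring is an assumption and is not something the thread confirms.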

ckatzorke commented 3 years ago

Hey, thank you. Yes, the Referer header is not necessary, and you're right, it would be awkward to set it to howlongtobeat.com.

I added random user agents; the package is available as v1.3.1. All integration tests passed, so it should work now.
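
The thread does not show how the random user agent is chosen in v1.3.1, so the following is only a sketch of one way it could be done: picking uniformly from a hard-coded list. The names `USER_AGENTS` and `pickUserAgent` are hypothetical, and the release itself may well use a different approach, such as a dedicated library.

```typescript
// Hypothetical example list; only the first entry appears in this thread.
const USER_AGENTS: readonly string[] = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0',
];

// Returns one entry chosen uniformly at random. The random source is
// injectable so the choice can be made deterministic in tests.
function pickUserAgent(
  agents: readonly string[] = USER_AGENTS,
  random: () => number = Math.random,
): string {
  if (agents.length === 0) {
    throw new Error('agents must not be empty');
  }
  // Math.random() returns a value in [0, 1), so the index stays in range.
  const index = Math.floor(random() * agents.length);
  return agents[index];
}
```

A caller would then set `'User-Agent': pickUserAgent()` in the request headers, so that each request carries a different browser-like value.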