ScrappyCocco / HowLongToBeat-PythonAPI

A simple Python API to read data from howlongtobeat
https://pypi.org/project/howlongtobeatpy/
MIT License
84 stars, 5 forks

All searches returning None (all requests returning 404) #14

Closed: spurll closed this issue 2 years ago

spurll commented 2 years ago

I was hoping to integrate this into Goodplays; unfortunately, I haven't been able to get it working. Every search returns None, apparently because of a 404.

For example:

>>> from howlongtobeatpy import HowLongToBeat
>>> HowLongToBeat(0.0).search("Outer Wilds")

returns None. When I modify HTMLRequests.py to print the response text on non-200 responses, I get a 404 page:

<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><title>HowLongToBeat.com | Game Lengths, Backlogs and more!</title>…<link rel="canonical" href="https://howlongtobeat.comundefined"/>…</head><body><div id="__next">…<h1 class="global_padding"><span class="mobile_hide">Error</span> 404 - Not Found</h1>…<p class="in">Sorry! The page you are looking for does not exist. Try going back or visiting a different link.</p>…<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"pageMetadata":{"noTopAd":true}},"__N_SSG":true},"page":"/404","query":{},"buildId":"JCeqWkOAftOy_JRL-i2FJ","isFallback":false,"gsp":true,"scriptLoader":[]}</script></body></html>

(I reproduced this by manually sending a POST request to the URL.)
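A minimal sketch of the kind of debugging change described above, assuming a requests-style response object that exposes `status_code` and `text`. The function name `check_response` is hypothetical; this is not the actual code in HTMLRequests.py.

```python
from types import SimpleNamespace


def check_response(resp, max_chars=500):
    """Return the response body on HTTP 200; otherwise print a truncated
    copy of the body for debugging and return None.

    Hypothetical helper: `resp` is any object with `status_code` and `text`
    attributes, such as a requests.Response.
    """
    if resp.status_code == 200:
        return resp.text
    # Print the start of the error page so a 404/401 HTML body is visible
    print(f"HTTP {resp.status_code}: {resp.text[:max_chars]}")
    return None


if __name__ == "__main__":
    # Simulated 404 response, so the example runs without network access
    fake = SimpleNamespace(status_code=404, text="<h1>Error 404 - Not Found</h1>")
    print(check_response(fake))
```

Returning None on failure mirrors the symptom reported here: the caller only sees None unless the body is printed.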

From what I can tell, they've changed the search endpoint to https://www.howlongtobeat.com/api/search, changed the request format slightly, and now actually return JSON, which should make parsing it a hell of a lot easier. I haven't been following this closely, but perhaps this change coincides with the GamePass integration they've just implemented.

Unfortunately, I can't get my requests authorized: everything I post to the endpoint returns a 401. For example:

>>> import requests
>>> requests.post('https://www.howlongtobeat.com/api/search', json={'searchType': 'games', 'searchTerms': ['OUTER', 'WILDS'], 'searchPage': 1, 'size': 20}, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 OPR/90.0.4480.54', 'content-type': 'application/json', 'origin': 'https://howlongtobeat.com', 'referrer': 'https://howlongtobeat.com/', 'accept': '*/*'})
<Response [401]>
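The request in that call can be sketched as two small builders. This assumes, based only on the example above, that `searchTerms` is the query split on whitespace; upper-casing is kept only to mirror `['OUTER', 'WILDS']` and is not known to be required. Both helper names are hypothetical. Note also that the standard HTTP header is spelled `Referer`, not `referrer` as in the call above; whether that matters for the 401 is not established in this thread.

```python
import json


def build_search_payload(query, page=1, size=20):
    """Build a JSON body shaped like the one posted to /api/search above.

    Assumption: searchTerms is the query split on whitespace; upper-casing
    mirrors the example ['OUTER', 'WILDS'] and may not be necessary.
    """
    return {
        "searchType": "games",
        "searchTerms": query.upper().split(),
        "searchPage": page,
        "size": size,
    }


def build_headers(user_agent):
    """Headers modelled on the example call, but using the standard
    'Referer' spelling (the HTTP header name is 'Referer', not 'referrer')."""
    return {
        "User-Agent": user_agent,
        "Content-Type": "application/json",
        "Origin": "https://howlongtobeat.com",
        "Referer": "https://howlongtobeat.com/",
        "Accept": "*/*",
    }


if __name__ == "__main__":
    payload = build_search_payload("Outer Wilds")
    print(json.dumps(payload))
    # With requests installed, the call would look like (not run here):
    # requests.post("https://www.howlongtobeat.com/api/search",
    #               json=payload, headers=build_headers("Mozilla/5.0 ..."))
```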

Python version:

Python 3.6.10 (default, Dec 19 2019, 23:04:32)
[GCC 5.4.0 20160609] on linux
ScrappyCocco commented 2 years ago

The test seems to confirm it: https://github.com/ScrappyCocco/HowLongToBeat-PythonAPI/actions/runs/3061152147. I am busy this weekend, but I will try to look into it next week.

ScrappyCocco commented 2 years ago

Related to https://github.com/ckatzorke/howlongtobeat/issues/40

ScrappyCocco commented 2 years ago

@Kononkov1998 @kparal @spurll So would it be OK to completely redo the parsing and then have all of these values in the result struct?

This would leave more work to the end user in choosing which values to use, but the API would simply expose all the values HLTB gives us, so it would be up to the API user which values to pick and how to convert them (times are in seconds, and it would be up to the user how to convert and round them).

This would also make the API code much simpler: I would just parse the JSON and put the values in the struct, so adding, removing or editing a field would be easy, whereas keeping track of labels, conversions and so on could become a problem because different users have different needs.

{
  "count": 9,
  "game_id": 10270,
  "game_name": "The Witcher 3: Wild Hunt",
  "game_name_date": 0,
  "game_alias": "",
  "game_type": "game",
  "game_image": "10270_The_Witcher_3_Wild_Hunt.jpg",
  "comp_lvl_combine": 0,
  "comp_lvl_sp": 1,
  "comp_lvl_co": 0,
  "comp_lvl_mp": 0,
  "comp_lvl_spd": 1,
  "comp_main": 184379,
  "comp_plus": 371160,
  "comp_100": 622976,
  "comp_all": 367171,
  "comp_main_count": 2133,
  "comp_plus_count": 5187,
  "comp_100_count": 1780,
  "comp_all_count": 9100,
  "invested_co": 0,
  "invested_mp": 0,
  "invested_co_count": 0,
  "invested_mp_count": 0,
  "count_comp": 15547,
  "count_speedrun": 16,
  "count_backlog": 15296,
  "count_review": 3983,
  "review_score": 94,
  "count_playing": 320,
  "count_retired": 797,
  "profile_dev": "CD Projekt RED",
  "profile_popular": 1284,
  "profile_steam": 292030,
  "profile_platform": "Nintendo Switch, PC, PlayStation 4, Xbox One",
  "release_world": 2015
}
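A minimal sketch of the approach proposed above: copy the JSON fields into a plain struct without converting them, so times stay in seconds. The class `RawEntry` and the subset of fields it declares are illustrative assumptions, not the library's actual HowLongToBeatEntry. Unknown keys are ignored, so fields HLTB adds later do not break parsing.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RawEntry:
    """Hypothetical struct holding a subset of HLTB fields, unconverted."""
    game_id: int = -1
    game_name: str = ""
    game_alias: str = ""
    game_type: str = ""
    comp_main: int = 0   # seconds
    comp_plus: int = 0   # seconds
    comp_100: int = 0    # seconds
    comp_all: int = 0    # seconds
    release_world: int = 0

    @classmethod
    def from_dict(cls, data):
        # Keep only keys that match a declared field; ignore the rest
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    sample = json.loads('{"game_id": 10270, "game_name": "The Witcher 3: Wild Hunt", '
                        '"comp_main": 184379, "count_review": 3983}')
    entry = RawEntry.from_dict(sample)
    print(entry.game_name, entry.comp_main)
```

Adding or removing a field then only means adding or removing one line in the class, which is the maintenance benefit described above.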
kparal commented 2 years ago

It's great that HLTB now returns a json. Exposing just the returned json is fine, of course. It would be nice to provide at least some short documentation, though, since HLTB provides none. For a newcomer, it's hard to figure out which fields contain what data.

I also think this library could add extra value by pre-computing the most commonly used values and exposing them in addition to the full JSON, and by continuing to compute the similarity (now also considering game_alias, since we have access to it). That would make life easier for consumers. So HowLongToBeatEntry could stay and look something like this:

class HowLongToBeatEntry:
    def __init__(self):
        # Base game details
        self.game_id = -1
        self.game_name = None
        self.game_type = None
        self.release_world = None
        # Gameplay times in hours
        self.gameplay_main = -1
        self.gameplay_main_extra = -1
        self.gameplay_completionist = -1
        self.gameplay_all = -1
        self.gameplay_coop = -1
        self.gameplay_competitive = -1
        # Extra
        self.similarity = -1
        # Full original HLTB response, as a dict (e.g. from json.loads)
        self.full = {}

Or, the other way around, you could extend the JSON with some values specific to this API:

{
  "count": 9,
  "game_id": 10270,
  "game_name": "The Witcher 3: Wild Hunt",
  ...
  "_hltb-python": {
    "similarity": 0.7,
    ...
  }
}
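The second option could look roughly like this sketch. It assumes similarity is measured with the ratio from Python's `difflib.SequenceMatcher`, case-insensitively, taking the maximum over game_name and game_alias as discussed in this thread; the metric the library actually uses may differ, and the function names `similarity` and `annotate` are hypothetical.

```python
import copy
from difflib import SequenceMatcher


def similarity(query, entry):
    """Max similarity between the searched string and game_name/game_alias.

    Assumption: a case-insensitive SequenceMatcher ratio in [0.0, 1.0];
    empty names and aliases are skipped.
    """
    names = [n for n in (entry.get("game_name"), entry.get("game_alias")) if n]
    if not names:
        return 0.0
    q = query.lower()
    return max(SequenceMatcher(None, q, n.lower()).ratio() for n in names)


def annotate(query, entry):
    """Return a copy of an HLTB entry dict with library-specific extras
    stored under the "_hltb-python" key; the input dict is not modified."""
    result = copy.deepcopy(entry)
    result["_hltb-python"] = {"similarity": round(similarity(query, entry), 2)}
    return result


if __name__ == "__main__":
    entry = {"game_id": 10270, "game_name": "The Witcher 3: Wild Hunt", "game_alias": ""}
    print(annotate("witcher 3", entry))
```

Keeping the extras under a single prefixed key avoids any clash with field names HLTB might add later.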

But if you don't want to deal with that, exposing just the original JSON is fine as well.

ScrappyCocco commented 2 years ago

It's great that HLTB now returns a json. Exposing just the returned json is fine, of course.

@kparal I did not mean exposing just the plain JSON as it is; I was thinking more about parsing it and putting all of its values in HowLongToBeatEntry, documenting the most obvious ones with comments but leaving the values as they are, for example leaving times in seconds, as they are in the JSON.

The end user would still use HowLongToBeatEntry, but it would be up to them to check whether they need to read, for example, "invested_co" or "comp_main", and to decide how to convert and display the time.

kparal commented 2 years ago

Whatever you come up with, I agree it would be nice to have access to the full original response (converted to a dict: full = json.loads(json_str)). I'd also personally welcome memorable field names (invested_co is not very memorable) with times converted into hours, which I believe is what most people would use as well. But I can certainly do that in my own code if needed, so it's just a suggestion 🙂
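A sketch of what is described here: keep the full response as a dict via `json.loads` and expose a few memorable fields in hours. The mapping from HLTB keys (comp_main, comp_plus, comp_100, comp_all, invested_co, invested_mp) to friendly names is an assumption inferred from the field names and from the class proposed earlier in this thread, not something HLTB documents; the class name `FriendlyEntry` is hypothetical.

```python
import json

# Assumed mapping from HLTB JSON keys to friendlier attribute names
FIELD_MAP = {
    "comp_main": "gameplay_main",
    "comp_plus": "gameplay_main_extra",
    "comp_100": "gameplay_completionist",
    "comp_all": "gameplay_all",
    "invested_co": "gameplay_coop",
    "invested_mp": "gameplay_competitive",
}


def seconds_to_hours(seconds, digits=2):
    """Convert a duration in seconds to hours, rounded to `digits` places."""
    return round(seconds / 3600, digits)


class FriendlyEntry:
    """Hypothetical entry with times in hours plus the full original dict."""

    def __init__(self, json_str):
        self.full = json.loads(json_str)  # full original HLTB entry
        self.game_id = self.full.get("game_id", -1)
        self.game_name = self.full.get("game_name")
        for key, attr in FIELD_MAP.items():
            # Missing keys become -1, mirroring the defaults in the proposed class
            seconds = self.full.get(key)
            setattr(self, attr, seconds_to_hours(seconds) if seconds is not None else -1)


if __name__ == "__main__":
    entry = FriendlyEntry('{"game_id": 10270, "comp_main": 184379}')
    print(entry.gameplay_main, entry.gameplay_coop)
```

With the sample entry shown earlier, comp_main = 184379 seconds becomes about 51.22 hours.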

ScrappyCocco commented 2 years ago

Got it. I'll start working on it today and will update this issue as soon as I have a working example.

spurll commented 2 years ago

Sounds wonderful to me. Thanks folks.

ScrappyCocco commented 2 years ago

Hi guys, sorry for being a little slow; I am a bit busy.

You can see the new version I am working on here: https://github.com/ScrappyCocco/HowLongToBeat-PythonAPI/tree/developjson

And the new entry here: https://github.com/ScrappyCocco/HowLongToBeat-PythonAPI/blob/developjson/howlongtobeatpy/howlongtobeatpy/HowLongToBeatEntry.py

Unless there is something you would like to discuss, something you don't like, or something you would like to change, I am going to finish rewriting the tests and also rewrite the GitHub Actions so they work better and test both the local code and the released version.

It's going to take a while to finish everything, as I will be pretty busy near the weekend, and I also want to check all the comments and correct the README where needed. I hope you're not in a hurry; I promise to finish this ASAP.

ScrappyCocco commented 2 years ago

Also, if a field (for example "game_alias") is empty, should HowLongToBeatEntry copy it as an empty string, or should it hold None so it's easier to recognize?

kparal commented 2 years ago

And so the new entry here https://github.com/ScrappyCocco/HowLongToBeat-PythonAPI/blob/developjson/howlongtobeatpy/howlongtobeatpy/HowLongToBeatEntry.py

Looks good. Some remarks:

   # The type of entry, usually "game" or "dlc"

It would be useful to list all possible types, if you know them.

    # Similarity with original name, is the max similarity with game_name and game_alias

This is a bit confusing; I'd change it to "Similarity with the searched string ...".

Also if a field (for example "game_alias") is empty, should HowLongToBeatEntry copy that and still have an empty string or should it have None so it's easier to recognize?

In Python, it's customary to write if entry.field:, so I don't think it really matters. Go with whatever feels better to you.

Thanks for your work!

ScrappyCocco commented 2 years ago

@kparal

It would be useful to list out all possible types, if you know it.

I wish, but without proper API documentation it's hard to know all the possibilities.

This is a bit confusing, I'd change it

Will do

In Python, it's customary to use if entry.field

So does that return false both for None and for len==0?

kparal commented 2 years ago

So does that return false both for None and for len==0?

Of course :-)

$ python
>>> print(bool(None), bool(""), bool("hello"))
False False True

>>> x = None or "" or "hello"
>>> print(x)
hello
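Building on the truthiness rules above, a minimal sketch of how an empty string field could be normalized to None when copying it into the entry. The helper name `field_or_none` is hypothetical. It deliberately does not use `value or None`, because that idiom would also turn a numeric 0 (common in fields such as invested_co) into None.

```python
def field_or_none(data, key):
    """Return data[key], mapping a missing key or an empty string to None.

    Numeric zeros are kept as-is, since `value or None` would wrongly
    turn 0 into None.
    """
    value = data.get(key)
    return None if value == "" else value


if __name__ == "__main__":
    raw = {"game_alias": "", "invested_co": 0, "game_name": "Outer Wilds"}
    for k in ("game_alias", "invested_co", "game_name", "missing"):
        print(k, repr(field_or_none(raw, k)))
```

Either way, a check written as if entry.game_alias: behaves the same for None and for an empty string.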
ScrappyCocco commented 2 years ago

Will update this issue when the new release is live

ScrappyCocco commented 2 years ago

Version 1.0.1 is now available!

https://github.com/ScrappyCocco/HowLongToBeat-PythonAPI/releases/tag/1.0.1
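For readers upgrading, a minimal sketch of picking the closest match from a list of search results by their similarity value. It assumes search() returns a list of entries carrying a `similarity` attribute, or None or an empty list when nothing matches, as discussed in this thread; check the README of the release above for the exact API. The helper name `best_match` is hypothetical, and stand-in objects are used so the example runs without network access.

```python
from types import SimpleNamespace


def best_match(results):
    """Return the entry with the highest similarity, or None if there are none."""
    if not results:  # handles both None and an empty list
        return None
    return max(results, key=lambda entry: entry.similarity)


if __name__ == "__main__":
    # With the library installed, results would come from something like:
    #   from howlongtobeatpy import HowLongToBeat
    #   results = HowLongToBeat(0.0).search("Outer Wilds")
    # Stand-in results with made-up similarity values:
    results = [
        SimpleNamespace(game_name="Outer Wilds", similarity=1.0),
        SimpleNamespace(game_name="Outer Wilds: Echoes of the Eye", similarity=0.55),
    ]
    print(best_match(results).game_name)
```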

ScrappyCocco commented 2 years ago

If you run into any problem, feel free to open an issue, a discussion or even a pull request! The code is easier to edit now that it's all JSON.

Thank you for using this API

spurll commented 2 years ago

Thanks!

kparal commented 2 years ago

I just updated my project and it works fine. Thanks for the fast new release.