DrFriendless / ExtendedStats

Boardgamegeek Extended Stats. Web site written in Python using MySQL and Django.

timeout on huge collection #9

Closed: sandorkazi closed this issue 6 years ago

sandorkazi commented 6 years ago

This is probably only an issue for my profile, but the stats page times out. I don't know where the exact demarcation is (and would really like to check the stats), hence I submit this as an issue. Everything I don't own or haven't preordered is on my wishlist, and this makes for a huge collection.

https://stats.drfriendless.com/dynamic/tabbed/Masu

Changing the generation of the stats page to daily (or weekly) instead of on-demand would solve this issue, but it would require storing data on your side. Or is there an API I could use instead of the page?

AFAIK you don't even use the wishlist, so maybe filtering it out would solve the problem on my part.
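
For reference, the public BGG XMLAPI2 collection endpoint accepts status filters, so a request restricted to owned items would leave the wishlist out of the download entirely. A minimal sketch, assuming plain `requests` and the username from the link above (how multiple status filters combine is not shown here):

```python
import requests

# BGG XMLAPI2 collection endpoint; own=1 restricts the result to owned items,
# so a huge wishlist never enters the download at all.
url = "https://boardgamegeek.com/xmlapi2/collection"
params = {"username": "Masu", "own": 1, "stats": 1}

resp = requests.get(url, params=params, timeout=60)
# A 202 response means BGG is still generating the export; retry later.
if resp.status_code == 200:
    print(resp.text[:500])  # XML containing only the owned items
```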

Thanks.

DrFriendless commented 6 years ago

Your wishlist is ridiculous. I'm not going to change my code just because you're being silly.

sandorkazi commented 6 years ago

I have my reasons... :) But I understand your response.

sandorkazi commented 6 years ago

Another go, and I'll try not to be similarly arrogant... so... if you read carefully: "I don't know where the exact demarcation is (...) hence I submit this as an issue."

If the timeout is hit somewhere near 5 thousand items, others will have this issue as well... if it is over 10 thousand, then, whatever... Although I'd love to have the functionality, that's not why I submitted this... Don't change the code because I'm being silly... I didn't even want to ask you that. I could fork the repo if I really wanted the stats...

DrFriendless commented 6 years ago

You're more than welcome to fork the code. I'm not actively developing in this repo any more. I'm rewriting the site; the new version is in ExtendedStatsServerless. Your name came up in that version too. I'm using AWS Lambda to run the downloader code, and the Lambda I can afford to allocate to the collection download cannot fetch your collection before it times out.

As you obviously have ideas, even if they don't work with mine, you're welcome to add me on Facebook: https://www.facebook.com/Friendless.Farrell and I'll invite you to the Extended Stats Advisory Council, where we can chat about your ideas.

sandorkazi commented 6 years ago

I don't know Angular or nodeJS, but I always have ideas. :) I do Python most of the time...


I have two ideas to resolve this:

First idea:

Over a certain collection size (5000?), do the following:

  1. instead of collecting the collection info on demand, change to a scheduled download (see the sketch below)
    • you could optionally include a "donate and request" button or something if you're willing to provide on-demand generation for some money... I don't know if there's a business model here; didn't check, didn't care...
  2. instead of getting the whole collection at once, download only parts of it within a Lambda (the same limit will obviously suffice)
    • with the XML API it is quite obvious if someone has changed their (large) collection while you are downloading it (you'll get a specific response stating that the export is being generated)
    • if you don't use an API like this, sorting and pagination can work, but there has to be an ordering of the collection items, and you have to watch for changes to the collection while downloading...

Can you do this with the lambdas?
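
For what it's worth, here is a minimal sketch of the scheduled-download part of idea 1, assuming the public XMLAPI2 collection endpoint and plain `requests`; the deadline value is only an illustration of staying inside a Lambda's time budget, not anything taken from the actual downloader:

```python
import time
from typing import Optional

import requests

COLLECTION_URL = "https://boardgamegeek.com/xmlapi2/collection"

def download_collection(username: str, deadline_seconds: float = 240.0) -> Optional[str]:
    """Poll BGG until the collection export is ready, or give up before the deadline."""
    start = time.monotonic()
    params = {"username": username, "stats": 1}
    while time.monotonic() - start < deadline_seconds:
        resp = requests.get(COLLECTION_URL, params=params, timeout=60)
        if resp.status_code == 202:
            # BGG is still generating the export (large or freshly
            # modified collection); wait and ask again.
            time.sleep(15)
            continue
        resp.raise_for_status()
        return resp.text      # XML document with the whole collection
    return None               # let the scheduler retry on the next run

# Run from a scheduled job (cron, Celery beat, or a timer-triggered Lambda),
# store the XML, and let the stats page read the stored copy instead of
# fetching the collection on demand.
```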


Second idea:

You can also try to download the CSV version of the collection (for larger ones):
https://boardgamegeek.com/geekcollection.php?action=exportcsv&subtype=boardgame&username={}&all=1&exporttype=csv

Sometimes this will return a CSV containing the whole collection; other times you will get the answer

Your request for this collection has been accepted and will be processed.
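
A minimal sketch of that approach, assuming the export URL above works without authentication quirks; the retry interval and attempt count are arbitrary:

```python
import time
from typing import Optional

import requests

CSV_EXPORT_URL = (
    "https://boardgamegeek.com/geekcollection.php"
    "?action=exportcsv&subtype=boardgame&username={}&all=1&exporttype=csv"
)

def download_collection_csv(username: str, attempts: int = 10) -> Optional[str]:
    """Request the CSV export, retrying while BGG is still generating it."""
    url = CSV_EXPORT_URL.format(username)
    for _ in range(attempts):
        resp = requests.get(url, timeout=120)
        resp.raise_for_status()
        if "has been accepted and will be processed" in resp.text:
            # Export not ready yet; wait and ask again.
            time.sleep(60)
            continue
        return resp.text  # the CSV body
    return None
```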


Sidenote: you should still expect timeouts...

My collection export takes about 10 minutes or so to generate... problems occur when I make large modifications to my collection while I (or someone else) am trying to download it at the same time... Something is obviously (?) BADly written in the export mechanism on the BGG side (or in the code of whoever else was trying to download my collection)... or it just WAS, I don't dare to check again...

See the following scenario:


Some reasoning behind the ridiculous wishlist size:

I'm a data scientist... yes, I know this is not an explanation... so:

  1. I ended up checking out the same game for the third time, more than once...
  2. When I realized this for the second time, I felt lost in the database... I don't like feeling that way about data...
  3. I started to add games to my collection without a status...
  4. But I could not distinguish between the following just by putting them in my collection or not:
    • games I haven't checked
    • games I checked and am not interested in
    • games I checked and could not decide on
  5. I started to use the wishlist category "don't buy this"... which was much more meaningful as well...
    • games I haven't checked --> not in my collection
    • games I checked and am not interested in --> wishlist(5) - Don't buy this
    • games I checked and could not decide on --> wishlist(4) - Thinking about it
  6. I started to put games in my collection again, for two reasons:
    • it was easier to manage the uncategorized items on the collection page, as there are filters and views
    • I got interested in when a new game gets uploaded to the database...
  7. so I ended up adding everything...
    • which is also good for other purposes: when I download a CSV export, every game from the database is in there, so it's easier to find the game I'm looking for... if the query is complex...

If you have any ideas on how to get the same functionality without a large wishlist, I'm all ears. :)