bdefore / protondb-data

Data exports from ProtonDB.com released under ODbL

Question about your ratings system #2

ekianjo closed 5 years ago

ekianjo commented 5 years ago

Not sure how to reach you directly, so this comment is going here.

Look at this: https://www.protondb.com/app/50620

Everyone but one person reports this as "borked", and one platinum report brings the rating to "bronze"? It does not make any sense. I'd suggest you think about devising more appropriate classification algorithms.

bdefore commented 5 years ago

sorry for the delay in responding @ekianjo ... cases like the one you link here show that the current weighting system is subpar. in this case hewing to the median would be much more accurate.

there is work underway to renovate the rating system in the branch for the new reporting flow.

lorenzos commented 5 years ago

So, how does the current rating work, @bdefore? Median? I'm interested in using the open data for filtering games in my Steam Tools, and I'd like the rating to be the same as the overall rating on your website. Thanks.

bdefore commented 5 years ago

greetings @lorenzos ... i've used your steam tools for years now to decide what to play. its filtering system is still the best i know of for exploring a library. thank you for making it!

ProtonDB does provide an API that might be of use for you, specifically to provide protondb data per appId. I explain it somewhat here but would be happy to answer further: https://github.com/tryton-vanmeer/ProtonDB-for-Steam/issues/5#issuecomment-431371059

using this for your tools would allow you to stay in sync with ProtonDB's ratings. currently this is derived from an algorithm that mostly corresponds to a mean. upcoming changes will bring this weighting more in line with a median or mode, depending on quantity of reports.
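To make the mean versus median/mode difference concrete, here is a minimal sketch. It is not ProtonDB's actual algorithm: the 0 to 4 numeric scale for the tiers and the sample report set (three "borked", one "platinum") are assumptions chosen only to mirror the kind of case raised at the top of this thread.

```python
from statistics import mean, median_low, mode

# Tier names as shown on ProtonDB; mapping them to 0..4 is an assumption of this sketch.
TIERS = ["borked", "bronze", "silver", "gold", "platinum"]


def summarize(reports):
    """Return the tier picked by a rounded mean, the low median, and the mode."""
    scores = [TIERS.index(r) for r in reports]
    return {
        "mean": TIERS[round(mean(scores))],
        "median": TIERS[median_low(scores)],
        "mode": TIERS[mode(scores)],
    }


# Hypothetical report set: three "borked" reports and one "platinum" report.
print(summarize(["borked", "borked", "borked", "platinum"]))
# {'mean': 'bronze', 'median': 'borked', 'mode': 'borked'}
```

A single outlier pulls the mean up a full tier, while the median and mode stay with the majority.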

lorenzos commented 5 years ago

Hi @bdefore. Great to see there's an API that already includes ProtonDB website's tiers. Two things:

1) Which of the tiers is the one you show on the website? tier, provisionalTier, trendingTier? Or maybe you have a rule to use one if another is missing or something like that?

2) A per-app web request is not great in my case, because when a user loads a big library (and it's more common than one might think) I'd need to fire hundreds and often thousands of web requests to your server. Now I see two possibilities:

a) Add a multi-app request to that API, for example passing IDs with POST data, or with a long comma-separated GET string. This could also have a limit, like 100 apps per request, and I'll manage multiple chunks myself client-side.

b) Use the single-app API as it is in my database update process, which occurs twice a week. I can use the current API, but expect a burst of requests twice a week, on the order of ~75k requests in a ~1.5h time span.

c) A combination of a) and b), in which I use chunked multi-app requests only twice a week. Probably that's the best solution, compromising a bit on data freshness for greatly reduced server load on your end (less than a thousand requests in ~1.5h).

Obviously if none of these solutions are ok, I'll find a way to use the open database dumps here on this repo.
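The client-side chunking in a) and c) could look roughly like the sketch below. It only splits IDs into batches and builds the comma-separated string; no multi-app endpoint exists at this point in the thread, so no request is made. The helper names `chunked` and `to_query_value` are made up for illustration.

```python
from itertools import islice


def chunked(app_ids, size=100):
    """Yield successive lists of at most `size` app IDs."""
    it = iter(app_ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_query_value(batch):
    """Join one batch into a comma-separated string (for a hypothetical multi-app request)."""
    return ",".join(str(app_id) for app_id in batch)


# ~75k placeholder IDs split into chunks of 100.
batches = list(chunked(range(1, 75001), size=100))
print(len(batches))                    # 750
print(to_query_value(batches[0][:3]))  # 1,2,3
```

With 100 IDs per chunk, ~75k IDs come out to 750 batches, which matches the "less than a thousand requests" estimate in c).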

bdefore commented 5 years ago

@lorenzos first, i'll define what these properties map to:

tier: what is displayed on protondb for the rating

provisionalTier: for games that have not yet met a threshold quantity of valid ratings, what the tier would be with the data currently available. this is presented on protondb as a clock icon and a slightly faded color. can be ignored when tier is present.

trendingTier: what the rating would be given protondb's algorithm if only considering recent versions of proton. if this differs from tier it is indicated by an arrow icon pointing up or down.
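A consumer could apply these definitions roughly as in the sketch below. The property names come from this thread, but the rest is assumed: that the per-app response is a JSON object parsed into a dict, that an absent tier shows up as a missing or null field, and that tiers are ordered borked < bronze < silver < gold < platinum. `display_rating` is a hypothetical helper, not part of any ProtonDB client.

```python
# Assumed ordering, worst to best.
TIERS = ["borked", "bronze", "silver", "gold", "platinum"]


def display_rating(summary):
    """Pick the tier to show, whether it is provisional, and the trend arrow (if any)."""
    tier = summary.get("tier")
    provisional = tier is None  # assumption: "not present" means missing or null
    shown = summary.get("provisionalTier") if provisional else tier

    trend = None
    trending = summary.get("trendingTier")
    # Only compare against a settled tier, per the definition of trendingTier.
    if not provisional and tier in TIERS and trending in TIERS and trending != tier:
        trend = "up" if TIERS.index(trending) > TIERS.index(tier) else "down"

    return {"shown": shown, "provisional": provisional, "trend": trend}


print(display_rating({"tier": "gold", "provisionalTier": "silver", "trendingTier": "platinum"}))
# {'shown': 'gold', 'provisional': False, 'trend': 'up'}
print(display_rating({"provisionalTier": "bronze"}))
# {'shown': 'bronze', 'provisional': True, 'trend': None}
```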

thanks for laying out your strategies. i agree that this makes integrations like these inefficient/cumbersome. currently multiple appids are not supported, but it is something i'm considering. it would likely improve responsiveness for your tool, but not necessarily reduce total bandwidth demand, and could even exacerbate demand as consumers would be likely to want to overfetch.

i'm currently on a free tier with netlify for hosting, which occasionally exceeds their caps at which point i make optimizations. at this point i'm reaching the limit of obvious improvements in that realm. fortunately, they do a remarkable job of caching common requests, but i've received unclear answers on whether warm cache hits increment their metering.

for the moment, it sounds like some variation of 2b is what i would advise, perhaps with a check to skip if this database dump does not indicate any new reports for that id. if your check is limiting properly, i'd estimate we'd be looking at on the order of 1k lookups (the number of unique ids that receive reports biweekly).
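The skip check could be sketched as below. The record shape is an assumption: each report in the dump is taken to be a dict with an `appId` and a Unix `timestamp`, which may not match the real export. `ids_needing_refresh` and the sample data are hypothetical. How much this saves depends on how fresh the dump is compared with the last sync.

```python
def ids_needing_refresh(dump_reports, last_sync):
    """Return the set of app IDs with at least one report newer than `last_sync`."""
    return {
        str(report["appId"])
        for report in dump_reports
        if report["timestamp"] > last_sync
    }


# Made-up sample records; field names are assumptions about the dump format.
reports = [
    {"appId": "50620", "timestamp": 1546300800},
    {"appId": "50620", "timestamp": 1550000000},
    {"appId": "220", "timestamp": 1540000000},
]
print(sorted(ids_needing_refresh(reports, last_sync=1548000000)))  # ['50620']
```

Only the IDs returned here would then go through the per-app lookup.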

let me know when it's in use and i'll monitor traffic.

lorenzos commented 5 years ago

@bdefore Thanks for the properties definition. I didn't understand how the database dump in this repo can be used to know if an API fetch is required; isn't it updated monthly?

PS: we've gone quite OT here. In case you prefer to continue the discussion privately, you'll find my email at the bottom of Steam Tools; if not, I'm good to continue here.

bdefore commented 5 years ago

It is OT. I'll reach out to you by email.