alexsherman / leeg


Reqs: Implement AllMatchesReqService and HybridReqService #24

Closed dmcfalls closed 5 years ago

dmcfalls commented 5 years ago

Reqs currently only supports reqs based on a single summoner's match history. Implement two new structs that implement the ReqService trait:

- AllMatchesReqService
- HybridReqService

Once these are in place, we can play around with extending our models, weightings, etc. for all the variants
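A minimal sketch of what those two structs might look like, assuming a hypothetical ReqService signature (the real trait and Req type in the repo likely differ):

```rust
// Hypothetical shapes -- the actual ReqService trait in leeg may differ.
pub struct Req {
    pub champ_name: String,
    pub score: f64,
}

pub trait ReqService {
    fn reqs(&self, team_picks: &[String], opp_picks: &[String]) -> Vec<Req>;
}

/// Recommends based on champ-vs-champ winrates across all recorded matches.
pub struct AllMatchesReqService;

/// Blends single-summoner history with the all-matches model.
pub struct HybridReqService;

impl ReqService for AllMatchesReqService {
    fn reqs(&self, _team_picks: &[String], _opp_picks: &[String]) -> Vec<Req> {
        Vec::new() // TODO: score candidates from global winrate data
    }
}

impl ReqService for HybridReqService {
    fn reqs(&self, _team_picks: &[String], _opp_picks: &[String]) -> Vec<Req> {
        Vec::new() // TODO: blend summoner-specific and global scores
    }
}
```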

alexsherman commented 5 years ago

What do you envision for the all-matches one? A full matrix of champs vs. champs, or just a vector? While we have materialized views for global champ winrates overall, we don't have one for champ-vs-champ winrates. If you want that, we'd have to make a new view, since even at our 40k-game database it would be unviable to compute in reqs (not because of Rust, just because of getting that data out of postgres).

dmcfalls commented 5 years ago

Haven't thought through the infrastructure, but you raise a good point. I envisioned the all-matches one providing more nuanced winrates (maybe using 3d or 4d matrices). In that case, it would in fact require getting a dump of the database and doing some intense number crunching.

Don't have much of a handle on our postgres limitations. But a couple potential ways of dealing with that:

1. Pull matches in match-batches, and incrementally process.
2. Have a separate app to pull data from postgres and convert it into (a) json file(s). Once we grab a match and convert it once, we don't have to touch the DB again.

alexsherman commented 5 years ago

Let's keep thinking on it. The best approach is probably to have some process incrementally pull matches in batches and then update a separate database table. Keeping track of what has already been taken into account could be challenging depending on how we implement it.

In terms of limitations: think of it as downloading ~40MB, except it's not static files, it has to come out of the db. Our very rough math right now is 1000 matches ≈ 1MB. We could do something like getting a cursor over all matches and processing one at a time, recording the latest timestamp that's been processed. But that's going to take a lot of thinking.
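A rough sketch of that cursor-plus-timestamp idea, with a stand-in Match type in place of real postgres rows (the real version would page through the database):

```rust
// Stand-in for a row from the matches table.
struct Match {
    play_date: i64, // epoch seconds; real rows would also carry champs, outcome, ...
}

// Tracks a high-water mark so re-delivered rows aren't double-counted.
struct Processor {
    last_processed: i64,
}

impl Processor {
    /// Folds in only matches newer than the watermark; returns how many were new.
    fn process_batch(&mut self, batch: &[Match]) -> usize {
        let new: Vec<&Match> = batch
            .iter()
            .filter(|m| m.play_date > self.last_processed)
            .collect();
        if let Some(max) = new.iter().map(|m| m.play_date).max() {
            self.last_processed = max; // advance the watermark
        }
        new.len()
    }
}
```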

It would be really cool to do some slick front-end visualization of a >2d matrix, though, and it would be useful for the model. We're still a ways out from that, though.


alexsherman commented 5 years ago

Not sure if this is the right issue, but I do want to explore, in the 1-dimensional realm, factoring in the champ-champ winrates of unpicked but potential champs on each side. We could weight potential picks by their pick rate, which we already have in a materialized view.
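One way that weighting could look, as a sketch (the input pairs of (pick_rate, winrate_vs_our_candidate) are hypothetical, not from any existing view):

```rust
/// Pick-rate-weighted expected winrate over potential enemy champs.
/// Each entry is (pick_rate, winrate of our candidate vs. that champ).
fn expected_winrate(potential: &[(f64, f64)]) -> f64 {
    let total: f64 = potential.iter().map(|(p, _)| p).sum();
    if total == 0.0 {
        return 0.5; // no data: assume even odds
    }
    potential.iter().map(|(p, w)| p * w).sum::<f64>() / total
}
```

Normalizing by the sum of pick rates means we don't need the rates to cover the whole champ pool.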

alexsherman commented 5 years ago

Okay, I partially rescind my statement about the difficulty of getting data out of postgres. It only took a few seconds for the following to run:

\copy (select * from all_matches ORDER BY play_date) to 'test.csv' with csv;

I don't know how this scales past 50k rows, but that's not too bad. It's not something we could query from the db on every API request, obviously - it'll have to be some service that runs this periodically and ingests the result with a "where play_date >= last_play_date_processed" clause. Could be a once-a-day task.
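For instance, the periodic task could build its dump query from a stored watermark (a sketch; table and column names follow the \copy command above, and date formatting/escaping are glossed over):

```rust
/// Build the incremental dump query from the last processed play_date.
/// Simplified for illustration: no escaping or parameter binding.
fn incremental_dump_query(last_play_date_processed: &str) -> String {
    format!(
        "select * from all_matches where play_date >= '{}' order by play_date",
        last_play_date_processed
    )
}
```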

I actually think the bigger conceptual challenge is how we store and query the N>2-dimensional matrix, given that it can't be computed on the fly every time. A quick google search of "efficiently store and query multidimensional matrix" shows that lots of people are thinking about this. It all depends what you have in mind, though - if you want to prototype something, just use that \copy command to dump everything. There are other options for csv format defaults you can look up, too.
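As a strawman for the storage question, a sparse map keyed on canonicalized champ tuples sidesteps ever materializing the full matrix (u16 champ ids and the struct names here are assumptions, not anything in the repo):

```rust
use std::collections::HashMap;

// Canonical key: sort each side's champ ids so (a,b) vs (c,d) and
// (b,a) vs (d,c) land in the same bucket. Ally/enemy order is kept,
// so winrates are from the ally duo's perspective.
type DuoKey = ((u16, u16), (u16, u16));

fn duo_key(mut ally: (u16, u16), mut enemy: (u16, u16)) -> DuoKey {
    if ally.0 > ally.1 {
        ally = (ally.1, ally.0);
    }
    if enemy.0 > enemy.1 {
        enemy = (enemy.1, enemy.0);
    }
    (ally, enemy)
}

struct DuoStats {
    wins: u32,
    games: u32,
}

/// Fold one observed 2v2 result into the sparse matrix.
fn record(
    matrix: &mut HashMap<DuoKey, DuoStats>,
    ally: (u16, u16),
    enemy: (u16, u16),
    won: bool,
) {
    let e = matrix
        .entry(duo_key(ally, enemy))
        .or_insert(DuoStats { wins: 0, games: 0 });
    e.games += 1;
    if won {
        e.wins += 1;
    }
}
```

Only matchups that actually occur take space, and lookup stays O(1) on average.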

alexsherman commented 5 years ago

Been thinking a lot about this - the number of potential matchups is staggering. Just considering 2v2 combinations: could we do a partially-populated matrix of the winrate of any duo vs. any duo? There are 144!/(142! · 2!) = 10296 unique duos. I reckon the worst case is the realistic case just for duos - we will eventually see a match in which every champ is played alongside or against every other champ. Then, given any duo, there are 142!/(140! · 2!) = 10011 other duos that could play against them. 10011 * 10296 / 2 ≈ 51.5 million unique 2v2 matchups. It's likely that not all of these will fill in, but as you collect games it grows really fast. Forget the time required to even calculate it - I'm more concerned with how we store this for reasonable-time lookup. Is my math wrong? Do you see it differently?
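The arithmetic above is just binomial coefficients, C(n, 2) = n(n-1)/2, and it checks out:

```rust
/// C(n, 2) = n * (n - 1) / 2: unordered pairs from n champs.
fn choose2(n: u64) -> u64 {
    n * (n - 1) / 2
}

/// (unique duos, opposing duos per duo, unordered duo-vs-duo matchups).
/// Opposing duos draw from the 142 champs not already in the given duo;
/// the final division by 2 collapses (A vs B) and (B vs A) into one matchup.
fn duo_matchups() -> (u64, u64, u64) {
    let duos = choose2(144);
    let opposing = choose2(142);
    (duos, opposing, duos * opposing / 2)
}
```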

alexsherman commented 5 years ago

On one hand, there's no reason we couldn't get ~1ms lookup from a table with 50 million rows (or fewer, due to sparsity). Given the limited info we need to store per row, it may be doable.

Then again, maybe I'm overthinking this and underestimating postgres. With the right indices on the table we already have, maybe just doing a SELECT on champs in the blue team or champs in the red team against the main db is fast enough. Something to test.

alexsherman commented 5 years ago

See new pr