elgonio / TK8-thing

Feedback on latest analysis #2

Open MengLeZhang opened 7 months ago

MengLeZhang commented 7 months ago

Great project and much needed. Here are my thoughts on the analysis writeup (if helpful):

  1. Rank distribution over time: The rank distribution moved between Feb and 1st March. Did the previous analysis count rank from matches instead of from the highest rank each player attained? Higher-ranked players play more matches and can therefore inflate the rank distribution.
  2. Outliers: Probably should avoid reporting individual outlier cases (e.g. a person playing XX matches a day), just to stay on the good side of Bamco and the community. People might feel this work is an invasion of privacy, especially since their Steam IDs are also in the data, and Bamco will respond if they detect any hint of risk. This is just from personal experience of how things might play out.
  3. Confidence intervals: Can you explain a bit more about what went wrong and what the goal is? Given the extremely large sample size, I reckon the CIs are going to be tiny even for subgroup analyses, so my instinct is that it's overkill.
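
To illustrate point 1, here is a minimal sketch comparing a rank distribution counted per match with one counted per unique player. The player IDs, rank names and match counts are made up for illustration and are not taken from the project's dataset.

```python
from collections import Counter

# Hypothetical match log: one (player_id, rank) entry per match played.
# The high-ranked player "p3" plays far more matches than the others.
matches = (
    [("p1", "Warrior")] * 2
    + [("p2", "Warrior")] * 3
    + [("p3", "Tekken King")] * 15
)


def share(counter):
    """Convert raw counts into fractions of the total."""
    total = sum(counter.values())
    return {rank: count / total for rank, count in counter.items()}


# Counting per match: every match adds one to its player's rank.
per_match = share(Counter(rank for _, rank in matches))

# Counting per unique player: each player contributes exactly once.
# (Each player has a single rank here; with real data you would first
# reduce each player to the highest rank they attained.)
player_rank = dict(matches)
per_player = share(Counter(player_rank.values()))

print("per match: ", per_match)
print("per player:", per_player)
```

In this toy data the single high-ranked player makes up one third of the players but three quarters of the matches, which is the kind of inflation described above.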

mgiles717 commented 7 months ago

Great project, happy to lend a helping hand if possible. Just thought I'd tag onto this issue.

I think it will be very hard to account for the variance in match counts between specific players, and for high-ranked players playing more games, relative to the population distribution. A likely example of this is Rangchu playing Kuma/Panda significantly more than most, resulting in a boost to the ranks of bear players. I believe this shows the importance of the two graphs, "Character play rates across all skill levels" and "Character play rates at purple ranks and above". Whilst I understand your point about outliers, I believe they represent a relevant part of the sample. That's not to say they can't be standardized to see if it alters the outcome.

MengLeZhang commented 7 months ago

@mgiles717 nice to see someone else also interested in where this goes.

RE: outliers. Just to clarify my thoughts, I don't suggest omitting them as part of data processing. It's to do with commenting on individual cases in things like Reddit posts. Aggregated data reporting is alright, but people may feel like it's an invasion of privacy if you go into too much detail on individual cases in public. Or to put it in a Tekken way: the risk of offending someone is low, but I wouldn't advise taking the unnecessary mixup :)

materi commented 6 months ago

I've been experimenting with my own data collection. Mostly passive, but I did some longer active tests this past weekend. I'm probably not going to do much more with this anytime soon, so I'll just share a dataset that should contain close to all of the ranked matches in a 24h period, covering Saturday March 23 (UTC), in case you guys want to run some analytics.

gzipped CSV, UTF-8, 1854706 rows

https://cdn.t8ranked.win/battles-export-24h-20240323.csv.gz
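
For anyone wanting a first look at the file, here is a minimal sketch that reads a gzipped UTF-8 CSV with the Python standard library and reports its header and row count. It assumes the first line is a header row and makes no assumption about the column names, which are not listed here; the function name `summarize_export` is just an illustrative choice.

```python
import csv
import gzip


def summarize_export(path):
    """Return (header, data_row_count) for a gzipped UTF-8 CSV file.

    Assumes the first line of the file is a header row.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # empty list if the file is empty
        row_count = sum(1 for _ in reader)   # streams rows, never loads all
    return header, row_count


# Example usage, assuming the file was downloaded to the working directory:
# header, rows = summarize_export("battles-export-24h-20240323.csv.gz")
# print(header)
# print(rows)
```

Streaming the rows keeps memory use low, which helps with a file of this size.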

elgonio commented 6 months ago

It's really great to see people being so interested in this project. The feedback is quite welcome. I haven't checked my GitHub in a while so this is a pleasant surprise.

RE: outliers. My hope is that with enough data they become somewhat irrelevant and thus no action is needed.

RE: Rank distribution over time. You bring up a valid point. The first post did count rank from matches, but I hope to have everything standardized to unique player counts in future. What I'm still thinking about is how we count secondary characters, if at all. Currently only a player's highest-ranked character is counted, but many players have secondaries at lower ranks. Should they count towards the total at all? Perhaps it's worth investigating as a separate issue. Maybe a chart of the most popular secondary characters.
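
One possible way to separate mains from secondaries, as a rough sketch: treat each player's highest-ranked character as their main and tally every other character they play as a secondary. The record layout `(player_id, character, rank_tier)` and all the values below are hypothetical and do not reflect the project's actual schema.

```python
from collections import Counter

# Hypothetical records: (player_id, character, rank_tier), where a larger
# rank_tier means a higher rank. None of these values come from real data.
records = [
    ("p1", "Kazuya", 20), ("p1", "Jin", 12),
    ("p2", "Kuma", 25), ("p2", "Jin", 10),
    ("p3", "Jin", 18), ("p3", "Kazuya", 9), ("p3", "Kuma", 5),
]


def split_main_and_secondaries(rows):
    """Count each player's highest-tier character as a main, the rest as secondaries."""
    by_player = {}
    for player, character, tier in rows:
        by_player.setdefault(player, []).append((tier, character))

    mains, secondaries = Counter(), Counter()
    for entries in by_player.values():
        # Highest tier first; ties fall back to reverse alphabetical order.
        entries.sort(reverse=True)
        mains[entries[0][1]] += 1
        for _, character in entries[1:]:
            secondaries[character] += 1
    return mains, secondaries


mains, secondaries = split_main_and_secondaries(records)
print("mains:      ", mains.most_common())
print("secondaries:", secondaries.most_common())
```

Keeping the two tallies separate means the rank distribution can stay on unique players while the secondary counts feed their own chart.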

RE: Confidence intervals. I agree that they might be overkill, but the goal here is forward-looking. In future I want to look at very high ranks like Tekken God + and I imagine gathering enough data might be a challenge. So the confidence intervals would be a form of insurance, to check that enough data has been gathered for our analysis to be valid.
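
As a rough illustration of why the intervals matter more at the top ranks, here is a sketch using the Wilson score interval for a proportion. This is one common choice of interval, not necessarily the one the analysis uses, and the counts below are invented.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts: the same 5% share observed in a very large sample
# and in a small high-rank subgroup.
for label, k, n in [("all ranks", 50_000, 1_000_000), ("top ranks", 10, 200)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {low:.4f} to {high:.4f} (width {high - low:.4f})")
```

With a million observations the interval is under a tenth of a percentage point wide, while with 200 it spans several percentage points, so the check only really bites for small subgroups.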

elgonio commented 6 months ago

@materi Thanks for sharing the data. I really appreciate it.

edgybrowser3 commented 2 months ago

First, I apologize as I don't know the correct way to request a feature. I've never used Git but definitely will from now on : ) . You are doing great work. My request: could you please check whether it's possible to change the number of seconds for which one can control a player inside a replay, using Cheat Engine? I hope some Cheat Engine users can explain how to find the value that needs to be changed in order to change the time limit inside a T8 replay, which is 10 seconds by default.