RiotGames / developer-relations

Riot Games Developer Ecosystem Bug Reporting
http://developer.riotgames.com

[Feature Request] [Valorant] Enabling Machine Learning by saving game event history as binary data. #312

Open HourGlss opened 4 years ago

HourGlss commented 4 years ago

Problem:

For the eSports community to truly thrive, it needs every opportunity to use data to its advantage. One of the best ways to do this is to use ML/AI to "gain a competitive edge" in strategy or metagaming. However, performing data science with replay data alone is notoriously difficult. Replays take time to ingest, and the format usually needs to be reverse engineered. If the replay format is not available, the data has to be entered manually, which is costly and, because it involves a human, error-prone.

Solution:

Save the entire game's events in a file format that is published. Make it verifiable by Riot, but otherwise limit the use of an API. (I understand that the API for Valorant may be close to done, as it is being published soon. I don't know what it will make available, though, so I am choosing to act as if it won't exist.) The goal here would be to send players the binary file only if they want it. Only the players who were in a match could be sent the file for that match.

It should be explicitly noted that all the user API stats goodness is still available. Win rate per map, accuracy with a gun, accuracy with skills. The client could use all of this as well. It would be amazing if the game had targeted “help sections.” For example: “We noticed your aim is significantly lower when defending versus attacking, you may benefit from this type of training.”

If you allowed players to “Watch” that, you could then show improvement. People love feeling like they are getting better.

Details of solutions:

Ideally, the binary file format would require minimal upkeep in the future on Riot's behalf and give insight into the game that was recently played without causing an unnecessary strain on Riot's servers.

To keep the file size small, I propose recording the following every 0.5 or 1 second:

That's 111 bits. Let's make it nice and round and go with 128 bits per player per second, which leaves room for future expansion. 10 players = 1,280 bits of status per second. Assuming a 40-minute game: 3,072,000 bits, or 384 kB, of status data for the match.
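The estimate can be checked with a few lines of Python. The record size, player count, sampling rate, and match length are the figures from this post. The function name is made up for illustration.

```python
def status_size_bits(bits_per_record=128, players=10,
                     samples_per_second=1, match_minutes=40):
    """Total bits of periodic status data for one match."""
    seconds = match_minutes * 60
    return bits_per_record * players * samples_per_second * seconds


bits = status_size_bits()
print(bits)               # 3072000
print(bits // 8 / 1000)   # 384.0 (kB)

# Sampling every 0.5 seconds instead of every second doubles the size.
print(status_size_bits(samples_per_second=2) // 8 / 1000)  # 768.0 (kB)
```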

Further work:

  1. Where are the smokes / walls / Cypher trip lines etc.? I believe those can also be saved in a small format as events. To accomplish this, we tag each non-bullet projectile with "what is it", "where is it", "is it active", and "who owns it". Going through all the cases will take some time, but it only needs to be done once. Then you define that format.
  2. We also need to track the spike: its location, whether it is held, dropped, or activated, its countdown, and whether it has exploded.
  3. Who hit whom? Where were they hit? I think this can be saved relatively easily as an event as well. I would imagine that 7 shots per death would be near the upper bound. Even if you tracked hit location with 4 bits, this should not inflate the file size too greatly.
  4. At the top of the binary file I would include some form of data so that the replay could be verified, possibly the SHA-256 of the game's unique identifier. This would allow traceback on Riot's side. It would prevent data tampering by players as well.
  5. At the top of the file I would also include a header listing who was in the game.
  6. At the end of the file I would include a results section with some basic information.
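A minimal sketch of items 4 and 5, assuming a header made of a file signature, the SHA-256 of the match identifier, and a length-prefixed player list. The `VEVT` signature, the field order, and both function names are hypothetical. Nothing here reflects an actual Riot format.

```python
import hashlib
import struct

MAGIC = b"VEVT"  # hypothetical 4-byte file signature


def build_header(match_id: str, player_ids) -> bytes:
    """Signature + SHA-256 of the match id + length-prefixed player ids."""
    digest = hashlib.sha256(match_id.encode("utf-8")).digest()  # 32 bytes
    out = MAGIC + digest + struct.pack("<B", len(player_ids))
    for pid in player_ids:
        raw = pid.encode("utf-8")
        out += struct.pack("<B", len(raw)) + raw  # 1-byte length, then the id
    return out


def header_matches(header: bytes, match_id: str) -> bool:
    """Check that the header's digest belongs to the given match id."""
    expected = hashlib.sha256(match_id.encode("utf-8")).digest()
    return header[:4] == MAGIC and header[4:36] == expected


header = build_header("match-123", ["player-a", "player-b"])
print(header_matches(header, "match-123"))  # True
print(header_matches(header, "match-124"))  # False
```

A hash of the match identifier ties the file to a match. Protecting the event data itself against edits would additionally need a signature or keyed hash computed by Riot over the whole file.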

Value added to players:

The players and organizations that wish to explore and develop this data will be rewarded by learning about the conditions that lead to success, either in the round or overall throughout the game. It enables organizations to do large scale data analysis, which progresses eSports and the game's balance. It encourages third party tool development and community involvement. Data aggregation is now a community driven issue. While I think that Riot should have the ultimate decision regarding game balance, you won't know what the community can come up with until you give them the proper resources to accomplish great things.

When should this be available to players?

If all of this information is given to the players at the end of each round, that's not good. People will parse the information by round and then gain insight into the other team with only a slight delay. Therefore the server will need to save this data per match and send it only to the players that request it. Because most players won't care about this, you can put the option at the bottom of the settings menu. I believe that this level of detail allows players to use quantitative data to improve while not giving other people enough information to DIRECTLY mimic a pro player. Therefore some strategies will remain secret. By storing it in binary format, Riot saves server space and bandwidth. It encourages the creation of third party tools and websites while remaining easy for Riot to maintain. All they would need to do is publish "this is the bit format we use to save the data."

Economic Incentives:

Not having a Valorant API means that Riot doesn't need to maintain the infrastructure for an API. Riot doesn't need to approve or revoke authorizations to an API. Riot doesn't need to store the data on their servers other than the match id and who was in the match, which I'm sure they already do. Riot can also use the same information for its own machine learning. They can also see which third parties develop useful applications and hire new data scientists who have good ideas and present them well.

Thank you very much for reading.

Querijn commented 4 years ago

That's a very interesting post! Something that I'd like to see out of this is the amount of time you'd expect the developers to work on this. This would allow Riot to plan their development roadmap and make sure it's in your hands as soon as possible.

Also, small sidenote: State doesn't need to be saved per frame, just state changes.

HourGlss commented 4 years ago

State changes wouldn't work very well, unless you considered movement through space as a change in state. And if you did that by frame, the file size would be much larger. I think there may be certain state changes that need to be recorded as they happen, but everything else can be done on the 0.5-second or 1-second interval.

For the developers, it would actually mean quite a bit of work in the beginning. I could do some of it by defining the exact scheme and figuring out how to bit-pack the data, making reasonable assumptions about the capability and fidelity of the Unreal engine.

The scheme is the most important piece, though: what events to record, and how often. Then you would need to create an opcode for each type of event. You don't actually need to bit-pack very hard, but it makes things faster if it's a single bit rather than {"variable":True}, if you see what I mean.
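To illustrate the difference, here is a sketch that packs a "who hit whom" event into 16 bits and compares it with the same data as JSON. The opcode value and the 4-bit field layout are assumptions for the example (the 4-bit hit location follows the suggestion in the original post). They are not a defined format.

```python
import json
import struct

OP_HIT = 0x1  # hypothetical opcode for "player hit player"


def pack_hit(actor, target, location):
    """Pack opcode | actor | target | location, 4 bits each, into 2 bytes."""
    for value in (actor, target, location):
        if not 0 <= value < 16:
            raise ValueError("each field must fit in 4 bits")
    word = (OP_HIT << 12) | (actor << 8) | (target << 4) | location
    return struct.pack(">H", word)  # big-endian unsigned 16-bit


def unpack_hit(data):
    """Inverse of pack_hit: recover the four 4-bit fields."""
    (word,) = struct.unpack(">H", data)
    return {
        "opcode": word >> 12,
        "actor": (word >> 8) & 0xF,
        "target": (word >> 4) & 0xF,
        "location": word & 0xF,
    }


packed = pack_hit(actor=3, target=7, location=2)
as_json = json.dumps(unpack_hit(packed)).encode("utf-8")
print(len(packed), len(as_json))  # the packed form is 2 bytes; JSON is far larger
```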

I think the biggest challenge would be iterating through every type of entity class, analyzing its various states, and then figuring out how best to save that state in as few bits as possible.

However, this is easy to maintain iteratively over the long term. Developers would only need to change it when the possible states of the entities change. Some hypotheticals: The damage of a gun changes: no need to update anything. The number of charges on a skill changes: you use one of the reserved bits for a class and specify the change, that's it. Probably one or two lines of code total.

jjmaldonis commented 4 years ago

One important thing to realize is that for other games, and I doubt Valorant will be different, Riot doesn't want players to be able to go to a website that characterizes the playstyle of their opponents at such a detailed level that the behavior of individual players can be predicted. That would give a competitive advantage to players using the website, over players who do not, beyond what any player could know if they had an incredibly deep understanding of the game itself. There is a very big difference between insights into the game vs. insights into a specific player - the former is encouraged while the latter is not. And because of this competitive integrity idea, the data about specific players is likely to be very limited.

OAuth technology can help players allow/deny such insight into their playstyle, and OAuth is on its way, but it isn't quite ready yet for everyone to use.

HourGlss commented 4 years ago

I don’t disagree that privacy could be a concern. Would anonymizing the other players work? Then only one person would share their data at a time.

jjmaldonis commented 4 years ago

Yeah OAuth is the technology that will allow for that to happen.

adrianlee commented 4 years ago

I suspect that the data you are looking for would be available in the form of a demo replay file that's recorded at some fraction of the game's 128 tick rate. Whether demos are publicly available, and whether event data can be parsed without a signed client, is the question.

HourGlss commented 4 years ago

Once again, parsing demo replay files is terrible. Most replay formats store the control inputs for each player and set the random seed to the same value it had during the match. This way, you get an exact copy of the match. Any data analysis done at this point is then at the whim of how much information can be gleaned from, and then saved off by, a replay.

RiotTuxedo commented 4 years ago

Hey all, just want to jump in to add some context.

  1. We're aware of the use-case that many developers would like supported related to detailed information about events that happen in-game. We've had a number of requests that the data be as granular as each server tick. At this time, I wouldn't expect that level of granularity, but we are working towards having data about games available through the API. More info on that soon(tm).

  2. As for the conversation in this particular issue, it's strayed a little bit from the goal of this particular channel. We're really focused on hearing what kind of features developers would like to see supported and specifically the use-cases that require those features. Your original request includes a problem space (lack of detailed match info) and an opinionated solution about how you believe the best way to approach this problem would be. I'd caution against prescribing any particular solution. The engineers closest to the problem space will likely make decisions on any solutions they'd want to build/own. Instead, it'd be better to focus the discussion on what you believe should be requirements and why (the use-cases).

Querijn commented 4 years ago

Some discussion regarding your second point was made in off-topic, I think we got some clarity down for that.

If this is taken into consideration, a post-mortem discussion with the team behind the Rectangle endpoint for Legends of Runeterra is encouraged. It's a good endpoint, but it lacks the usefulness to make anything real out of it.

HourGlss commented 4 years ago

> We've had a number of requests that the data be as granular as each server tick

Actually this is counterproductive. Too much data is often just as bad as not enough data, especially when the goal is ML. I haven't graphed a PCAP yet, but I'm assuming you run either 30 or 60 tick. 60 Hz means a set of data roughly every 0.0167 seconds, which is 30 times the amount of data I'm asking for at a 0.5-second interval.
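As a rough check of that ratio, the sketch below counts how many server ticks fall inside one snapshot interval. The 30 and 60 tick rates are the guesses from this comment, and 128 is the figure mentioned earlier in the thread. The function name is illustrative.

```python
def ticks_per_snapshot(tick_rate_hz, snapshot_interval_s):
    """Number of server ticks covered by one periodic snapshot."""
    return tick_rate_hz * snapshot_interval_s


for tick_rate in (30, 60, 128):
    print(tick_rate,
          ticks_per_snapshot(tick_rate, 0.5),
          ticks_per_snapshot(tick_rate, 1.0))
# 30 15.0 30.0
# 60 30.0 60.0
# 128 64.0 128.0
```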

> We're really focused on hearing what kind of features developers would like to see supported and specifically the use-cases that require those features.

As an eSports organization, I'd like to be able to do my own analysis on data generated from matches in order to learn from the meta and generate quantitatively supported strategies. For me, anonymized data is fine as long as I can control the rank it came from. As a user, I'd like to be able to compare my results against others and control where that data goes. As a user, I'd like to be able to learn "fun statistics" about my gameplay. How many times have I hit with grenades? On what map and position do I spend most of my time?

> Your original request includes a problem space (lack of detailed match info) and an opinionated solution about how you believe the best way to approach this problem would be. I'd caution against prescribing any particular solution.

Lack of detailed match info is one piece of it. A lack of an easy way to digest the data by ML is another. Giving me a replay file of the game would allow me to glean the information (mostly) but would be completely useless. I'm not trying to be prescriptive, I'm trying to perform low-level engineering for free. It's very easy to develop solutions that give too much information (a "snap" of data every tick) or too little information (some statistics at the end of the round), or data that needs to be parsed or otherwise handled (a replay file).

This is a paradigm shift from the other games you currently support with APIs. I understand why you'd be defensive. As much as I'd like to think that Riot makes decisions based on a lack of user stories, I think the truth might be closer to "Riot makes decisions based on user choices that are economically viable and can be expected to generate more interest in Riot's ecosystem of games." Pieces of this post were meant to directly address the economic concerns of maintaining an API versus not maintaining an API. Some pieces were meant to show a robust ecosystem of content can be possible without an API.

Thank you for the reply.