Recording fields discussion

oleg20111511 commented 1 year ago

So the idea behind recording player data is that this data will be fed to a ML algorithm, and we have to record both the input (environment info) and output (resulting action) values. It is important to figure out exactly what data we need beforehand, so I'm creating this issue as a hub for discussing it.

We currently save:

Values for input:
Is Imposter
Kill Cooldown
Direction to nearest task
Whether an emergency task is active
Direction to nearest vent
Direction to nearest body
Whether a body can be reported
Direction and position of nearby players

Values for output:
Movement direction (last saved, meaning Neuro won't be able to stay in one place)
Whether should report
Whether should vent
Whether should kill
Whether should sabotage
Whether should close doors
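To make the list above concrete, here is a minimal sketch of what one recorded frame could look like. All class and field names (`FrameInput`, `FrameOutput`, `Frame`) are hypothetical and only mirror the values listed above; the plugin's actual types and serialization format may differ.

```python
from dataclasses import dataclass, field, asdict
import json

Vec2 = tuple[float, float]  # (x, y) pair; hypothetical representation


@dataclass
class FrameInput:
    is_impostor: bool
    kill_cooldown: float
    dir_to_nearest_task: Vec2
    emergency_task_active: bool
    dir_to_nearest_vent: Vec2
    dir_to_nearest_body: Vec2
    can_report: bool
    # One (direction, position) pair per nearby player.
    nearby_players: list[tuple[Vec2, Vec2]] = field(default_factory=list)


@dataclass
class FrameOutput:
    move_direction: Vec2
    report: bool = False
    vent: bool = False
    kill: bool = False
    sabotage: bool = False
    close_doors: bool = False


@dataclass
class Frame:
    inputs: FrameInput
    outputs: FrameOutput

    def to_json(self) -> str:
        # Tuples are written as JSON arrays.
        return json.dumps(asdict(self))
```

Keeping input and output in separate structures makes it easy to see which side of the model a new field belongs to when the list changes.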

The things I think need change now are:

Input should also contain the direction to the emergency task and whether a sabotage can currently be triggered.
If we use a bool for sabotage and only let ML decide when to do it, then we need to code something to choose which sabotage to do. If we want ML to decide which one to do, we need to add Map to the input and replace the bool with an int representing the id of the sabotage.
For doors, it would make sense to have relative position and whether it can be closed for each door (I think there's no need to store the exact cooldown for each door).
Venting is kinda complicated if we want it to be handled purely by the ML model:
We need InVent info in the input.
We need movement inside the vent in the output.
Each vent is in a different location, and only a limited number of other vents are accessible from each vent. This means we need to know the exact vent we're in, so a unique id of this vent needs to be in the input.
It also means that, while outside of a vent, we need to know not only the direction to the nearest vent but also its id, since this also affects the decision.
(I don't know how neural networks work, so correct me if I'm wrong on this one.) Since everything happens based on information from a single frame and the AI doesn't really have memory, could the AI enter a vent just to exit it at the same place without moving, making the action meaningless?
There could also be situations where it exposes itself by venting in a closed room, thinking there's no one in sight, when in fact someone who was seen some frames ago was following closely behind.

CONCLUSION: Venting might not be worth the effort to implement. Though it might work if, after the ML model decides to vent, we explicitly program it to jump to a random other vent and then give control back to the ML model.
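A rough sketch of that fallback, where the model only decides to vent and code picks the exit. The vent ids and the `VENT_LINKS` table are made up for illustration; the real connections would come from the game.

```python
import random

# Hypothetical vent graph: vent id -> ids of vents reachable from it.
VENT_LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["E"],
    "E": ["D"],
}


def pick_exit_vent(entered, rng=random):
    """Pick a random connected vent other than the one we entered.

    Falls back to the entry vent when nothing else is reachable, in which
    case entering was pointless (the case described above).
    """
    options = [v for v in VENT_LINKS.get(entered, []) if v != entered]
    if not options:
        return entered
    return rng.choice(options)
```

After `pick_exit_vent` returns, control would go back to the model. Passing `rng` explicitly keeps the choice reproducible when replaying recordings.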

Those are just the things I noticed, there could be more stuff that needs to be addressed

EBro912 commented 1 year ago

In my PR, I have your conclusion to venting implemented, which is quite literally just randomly going to a vent on the map and checking if there's anyone nearby before leaving the vent. There is a NearbyVents variable in the game attached to each vent, but this seems to break randomly so randomly moving to a vent is the current, and easiest, solution. I think that your idea for just temporarily taking control from ML after it decides to vent is a good one. (Currently mine just beelines for a vent after a kill but that can be easily changed)

As for sabotage, most impostors I've played with/as just click a random sabotage as soon as they can. Obviously there is some strategy to sabotages but again the best/quickest solution might just be to allow either ML or the code itself just randomly pick a sabotage when available (maybe add a delay so it isn't obvious). Doors could be implemented similarly, but there is possible logic for considering the number of players in a room and their location on the map to further decide a kill and/or vent opportunity.

In short, I think that all of your points are valid, and that we should try to leave as much as we can up to ML when it comes to impostor. However, venting and sabotage might need some assistance from pure logic in order to make it feasible.

Alexejhero commented 1 year ago

> There is a NearbyVents variable in the game attached to each vent, but this seems to break randomly so randomly moving to a vent is the current, and easiest, solution.

Nothing is random. If you let me know what the problem is I can probably debug it for you.

> However, venting and sabotage might need some assistance from pure logic in order to make it feasible.

I don't really agree with this.

EBro912 commented 1 year ago

> Nothing is random. If you let me know what the problem is I can probably debug it for you.

In my testing I've used NearbyVents and after a while the game seems to just forget what the nearby vents are. Could very well be my implementation of it but you are correct in that it is not randomness.

> I don't really agree with this.

Could you elaborate? Again, I know next to nothing about the intricacies of neural networks; however, if the problems oleg described are real, then the easiest solution seems to be hardcoded assistance, unless the model could learn through its own decision making which vents to take and when to hop out of one.

Alexejhero commented 1 year ago

NearbyVents is implemented like this:

    private Vent[] NearbyVents
    {
        get
        {
            return new Vent[] { this.Right, this.Left, this.Center };
        }
    }

It is a runtime getter that returns the 3 connected vents, which are Unity serialized fields. There can be no randomness unless this getter was patched inadvertently (for example, if it was deduplicated).

EBro912 commented 1 year ago

> NearbyVents is implemented like this ...

I believe I now know why my implementation was not working; however, I don't want to derail this issue further. Thanks for pointing me in the right direction.

oleg20111511 commented 1 year ago

> > However, venting and sabotage might need some assistance from pure logic in order to make it feasible.
>
> I don't really agree with this.

I too disagree that sabotage & doors need assistance from code; just adding the fields I mentioned will make it work. Handling navigation within vents through code could save us a lot of time, though. As for how exactly to code it, running for a vent after every kill is not a good thing to do, that's for sure.

What I think would be a good idea: if the neural network goes into a vent, we make it cycle through all the vents in the set and let the neural network decide when to get out. It will be fed information about the surroundings of each vent while it is still inside, so it should work perfectly.
Pros: this strat is what a lot of players use anyway to gather info, so it'll look natural.
Cons: it can't really compare the situations around each vent, so it will exit the first moment it sees that it's ok to do so.
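Roughly what I mean, as a sketch. `should_exit` stands in for the neural network's decision and `max_hops` is a safety limit I made up; neither exists in the plugin.

```python
from itertools import cycle


def cycle_until_exit(vents, should_exit, max_hops=20):
    """Visit the connected vents in order and ask the model at each one.

    Returns the id of the vent where the model chose to exit, or None if
    it never did within `max_hops` hops (or there are no vents at all).
    """
    for hops, vent in enumerate(cycle(vents)):
        if hops >= max_hops:
            return None
        if should_exit(vent):
            return vent
    return None
```

The con mentioned above shows up directly: the loop returns at the first vent the model accepts, without ever looking at the remaining ones.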

EBro912 commented 1 year ago

> What I think good idea would be: if neural network goes into vent, we make it circulate through all the vents in the set, and let the neural network decide when to get out.

After your explanation I do agree with the both of you that code assistance would not be the best way to go. As for your point on it leaving right when it is able to, I don't think it would be that big of a deal if it did. It would indeed be hard to compare between vents since the information at previous vents can change at any time, but I think that leaving the vents as quickly as possible to secure an alibi is just as strong of a strategy as chilling in the vents and ensuring the coast is clear.

Morgul commented 1 year ago

> What I think would be a good idea: if the neural network goes into a vent, we make it cycle through all the vents in the set and let the neural network decide when to get out. It will be fed information about the surroundings of each vent while it is still inside, so it should work perfectly.
> Pros: this strat is what a lot of players use anyway to gather info, so it'll look natural.
> Cons: it can't really compare the situations around each vent, so it will exit the first moment it sees that it's ok to do so.

So, one of the beautiful things about reinforcement learning is we can actually let the ML try to work out the optimal strategy here. I think it's important to focus on how to feed the ML all the information it needs to come up with its own strategy for using vents. I would think knowing the location of the nearest vent, whether the player is in a vent, and when they use the vent should, in general, be sufficient.

Now, I'm not saying the ML will get it right; it might be really bad at using vents. But as long as we have the reinforcement right, it'll learn quickly. That's kinda the whole point.

> As for sabotage, most impostors I've played with/as just click a random sabotage as soon as they can. Obviously there is some strategy to sabotages but again the best/quickest solution might just be to allow either ML or the code itself just randomly pick a sabotage when available (maybe add a delay so it isn't obvious). Doors could be implemented similarly, but there is possible logic for considering the number of players in a room and their location on the map to further decide a kill and/or vent opportunity.

As for sabotage and doors, I think we need to (on the input side) record when a sabotage is done, and which one.

Same for the doors. Since we already are recording what players we can see, all we need to add is something to indicate the player closed/opened a door, (and maybe if we're near the button to open/close a door) and that should be it.

TL;DR I think all we need is:

NearbyVents: Location of all Vents in X radius
InVent: If the player is in a vent
NearbyDoors: Location of all Doors in X radius
DoorUsed: If the player activated the door
SabotageUsed: Number representing a given sabotage. (Needs to be an index into a master list of all possible sabotages so the index we use for, say, 'Deplete Oxygen' is always the same.)

I'm reasonably sure that should cover everything.
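For the two "in X radius" fields, a minimal sketch of the filtering, assuming 2D positions and storing positions relative to the player. The function name `nearby` and the dict layout are placeholders, and the value of X is still open.

```python
import math


def nearby(origin, objects, radius):
    """Return (id, relative position) for every object within `radius`.

    `objects` maps an id to an absolute (x, y) position. Results are
    sorted nearest first so the ordering is stable between frames.
    """
    ox, oy = origin
    found = []
    for obj_id, (x, y) in objects.items():
        dx, dy = x - ox, y - oy
        dist = math.hypot(dx, dy)
        if dist <= radius:
            found.append((dist, obj_id, (dx, dy)))
    found.sort()
    return [(obj_id, rel) for _, obj_id, rel in found]
```

Note that this gives a variable-length list, so whatever consumes it would probably need to pad or truncate to a fixed size before feeding a network.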

EBro912 commented 1 year ago

> TL;DR I think all we need is:
>
> NearbyVents: Location of all Vents in X radius
> InVent: If the player is in a vent
> NearbyDoors: Location of all Doors in X radius
> DoorUsed: If the player activated the door
> SabotageUsed: Number representing a given sabotage. (Needs to be an index into a master list of all possible sabotages so the index we use for, say, 'Deplete Oxygen' is always the same.)
>
> I'm reasonably sure that should cover everything.

If it makes things easier I can shift my PR over to recording that data, since a lot of the functionality I use to test things already collects that exact data. If we collectively decide proper values for X and the sabotages I can add it to the PR and mark it for review. Should get us well on our way to your suggestion. Luckily for us, there already exists a SystemTypes enum which has entries for both rooms and sabotages, but splitting up the data into our own types might be good for readability purposes.
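As an illustration of what "our own types" could look like, here is a sketch of a dedicated sabotage list with fixed indices. The member names and numbers are placeholders, not the values of the game's SystemTypes enum.

```python
from enum import IntEnum


class Sabotage(IntEnum):
    """Hypothetical master list. Once data has been recorded, values
    must never be renumbered, only appended to."""
    NONE = 0
    REACTOR = 1
    OXYGEN = 2
    LIGHTS = 3
    COMMS = 4


def one_hot(sabotage: Sabotage) -> list[int]:
    """Encode a sabotage as a one-hot vector in definition order."""
    return [1 if member == sabotage else 0 for member in Sabotage]
```

One-hot encoding is an assumption on my part; it is a common choice when the ids have no meaningful order, but a plain int would also match what was proposed above.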

EBro912 commented 1 year ago

> NearbyDoors: Location of all Doors in X radius
> DoorUsed: If the player activated the door

About these two, since we sabotage multiple doors at once, do we want to record a list of doors shut (i.e. DoorsUsed) instead? We could also record the room that we closed the doors in and keep track of nearby rooms instead of doors (i.e. RoomUsed and NearbyRooms).
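To show how the two options relate, here is a small sketch that derives the room-level view from a list of door ids. The `DOOR_ROOMS` table is invented for the example; real door ids and room assignments would come from the game.

```python
# Hypothetical door table: door id -> room the door belongs to.
DOOR_ROOMS = {
    0: "Cafeteria",
    1: "Cafeteria",
    2: "Cafeteria",
    3: "MedBay",
    4: "Electrical",
}


def rooms_used(doors_used):
    """Collapse a DoorsUsed list into the sorted set of rooms affected."""
    return sorted({DOOR_ROOMS[d] for d in doors_used})


def doors_in_room(room):
    """Expand a RoomUsed value back into the door ids it shuts."""
    return sorted(d for d, r in DOOR_ROOMS.items() if r == room)
```

Since one sabotage shuts every door of a room, recording either form should let us reconstruct the other, as long as the door table is stable.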

Also, on maps outside of Skeld, there are special doors such as doors that must be interacted with manually, are automatic, etc. I assume we are only concerned with doors that we can sabotage but I just wanted to make sure that these don't get overlooked.

Vedal987 commented 1 year ago

We can record data like which sabotage, which vents and which doors were used specifically but for now I just intend to simplify the problem as much as possible for the AI. This means using some simple logic when the AI says it wants to vent/sabotage/door to just estimate a good course of action. Then if we want to give the AI more control later, as long as we have the data recorded we can retrain it with different inputs/outputs.
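As a sketch of how that could work: record the detailed values, then project them onto the simplified bool targets at training time. The record keys (`vent_id`, `sabotage_id`, `doors_used`) are hypothetical names, not the plugin's actual fields.

```python
def simplify(record: dict) -> dict:
    """Project a detailed recorded output onto simplified bool targets.

    The detailed values stay in the recording, so a later model can be
    retrained on them without collecting new data.
    """
    return {
        "vent": record.get("vent_id") is not None,
        "sabotage": record.get("sabotage_id", 0) != 0,
        "close_doors": bool(record.get("doors_used")),
    }
```

Going from detailed to simple is lossless in this direction only, which is why the extra detail has to be captured at recording time.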

Vedal987 commented 1 year ago

> So, one of the beautiful things about reinforcement learning is we can actually let the ML try to work out the optimal strategy here.

Should be noted that we're currently not planning to use reinforcement learning (or at least not initially, maybe for fine tuning). The current plan is to distribute this plugin that records data and then just train the network on that.

VedalAI / neuro-amongus

Recording fields discussion #23