VedalAI / neuro-amongus

Among Us Plugin for Neuro-sama
GNU General Public License v3.0

Recording fields discussion #23

Closed: oleg20111511 closed this 1 year ago

oleg20111511 commented 1 year ago

So the idea behind recording player data is that this data will be fed to an ML algorithm, so we have to record both the input (environment info) and the output (resulting action). It is important to figure out exactly what data we need beforehand, so I'm creating this issue as a hub for discussing it.
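A minimal sketch of what "record both the input and the output" could look like, assuming one sample is captured per tick. The type names (GameState, PlayerAction, Sample, Recorder) and their fields are hypothetical placeholders, not the plugin's actual types:

```csharp
using System.Collections.Generic;

// Hypothetical environment snapshot (the "input" side). Real fields would be
// whatever this discussion settles on.
public sealed record GameState(float PosX, float PosY, bool InVent);

// Hypothetical action the player took on the same tick (the "output" side).
public sealed record PlayerAction(float MoveX, float MoveY, bool Kill, bool Vent);

// One training example: what the player saw and what they did in response.
public sealed record Sample(GameState Input, PlayerAction Output);

public sealed class Recorder
{
    private readonly List<Sample> _samples = new();

    public IReadOnlyList<Sample> Samples => _samples;

    // Called once per recorded tick. Input and output are stored together
    // so they can never drift out of sync.
    public void Record(GameState state, PlayerAction action) =>
        _samples.Add(new Sample(state, action));
}
```

Capturing both halves in a single call is the main design point: a sample whose input and output come from different ticks would teach the network the wrong mapping.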

We currently save:

The things I think need to change now are:

Those are just the things I noticed; there could be more that needs to be addressed.

EBro912 commented 1 year ago

In my PR, I have your conclusion about venting implemented, which is quite literally just going to a random vent on the map and checking whether anyone is nearby before leaving it. There is a NearbyVents variable in the game attached to each vent, but it seems to break randomly, so moving to a random vent is the current, and easiest, solution. I think your idea of temporarily taking control away from the ML after it decides to vent is a good one. (Currently mine just beelines for a vent after a kill, but that can easily be changed.)
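The random-vent approach described above could be sketched like this, assuming vents and players are reduced to 2D positions (System.Numerics.Vector2 stands in for Unity's vector type) and that "nearby" means "within a chosen radius". VentHelper and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class VentHelper
{
    // Picks one vent uniformly at random from the map's vent positions.
    public static Vector2 PickRandomVent(IReadOnlyList<Vector2> vents, Random rng) =>
        vents[rng.Next(vents.Count)];

    // True if every other player is farther than `radius` from the vent,
    // i.e. nobody would see the impostor climb out.
    public static bool IsSafeToExit(Vector2 vent, IEnumerable<Vector2> otherPlayers, float radius) =>
        otherPlayers.All(p => Vector2.Distance(p, vent) > radius);
}
```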

As for sabotage, most impostors I've played with or as just click a random sabotage as soon as they can. Obviously there is some strategy to sabotages, but again the best and quickest solution might be to let either the ML or the code itself pick a random sabotage when one is available (maybe with a delay so it isn't obvious). Doors could be implemented similarly, though there is possible logic for considering the number of players in a room and their location on the map to further decide a kill and/or vent opportunity.
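A sketch of "pick a random sabotage when one is available, with a delay", assuming the caller ticks it with the current time and the list of sabotages that are off cooldown. The Sabotage enum and RandomSaboteur class are hypothetical and are not the game's own types:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sabotage ids for illustration; not the game's SystemTypes enum.
public enum Sabotage { Reactor, Oxygen, Lights, Comms }

public sealed class RandomSaboteur
{
    private readonly Random _rng;
    private readonly double _minDelay;
    private readonly double _maxDelay;
    private double? _fireAt; // time at which the pending sabotage should trigger

    public RandomSaboteur(Random rng, double minDelaySeconds, double maxDelaySeconds)
    {
        _rng = rng;
        _minDelay = minDelaySeconds;
        _maxDelay = maxDelaySeconds;
    }

    // Call once per tick with the current time and the sabotages currently off cooldown.
    // Returns the sabotage to trigger now, or null if nothing should fire this tick.
    public Sabotage? Tick(double now, IReadOnlyList<Sabotage> available)
    {
        if (available.Count == 0)
        {
            _fireAt = null; // nothing available: drop any pending plan
            return null;
        }

        // First tick with something available: schedule it after a random delay
        // so the sabotage doesn't fire the instant the cooldown ends.
        _fireAt ??= now + _minDelay + _rng.NextDouble() * (_maxDelay - _minDelay);

        if (now < _fireAt) return null;

        _fireAt = null;
        return available[_rng.Next(available.Count)];
    }
}
```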

In short, I think that all of your points are valid, and that we should try to leave as much as we can up to ML when it comes to impostor. However, venting and sabotage might need some assistance from pure logic in order to make it feasible.

Alexejhero commented 1 year ago

There is a NearbyVents variable in the game attached to each vent, but this seems to break randomly so randomly moving to a vent is the current, and easiest, solution.

Nothing is random. If you let me know what the problem is I can probably debug it for you.

However, venting and sabotage might need some assistance from pure logic in order to make it feasible.

I don't really agree with this.

EBro912 commented 1 year ago

Nothing is random. If you let me know what the problem is I can probably debug it for you.

In my testing I've used NearbyVents and after a while the game seems to just forget what the nearby vents are. Could very well be my implementation of it but you are correct in that it is not randomness.

I don't really agree with this.

Could you elaborate? Again, I know next to nothing about the intricacies of neural networks, but if the problems oleg described are real, then the easiest solution seems to be hardcoded assistance, unless the network could work out through its own decision making which vents to take and when to hop out of one.

Alexejhero commented 1 year ago

NearbyVents is implemented like this:

    private Vent[] NearbyVents
    {
        get
        {
            return new Vent[] { this.Right, this.Left, this.Center };
        }
    }

It is a runtime getter that returns the 3 connected vents, which are Unity-serialized fields. There can be no randomness unless this getter was patched inadvertently (for example, if it was deduplicated).
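Because the getter always builds a three-element array, a vent with fewer than three connections presumably yields null entries in the unused slots (an assumption worth confirming against the decompiled Vent class). Code that iterates NearbyVents without skipping those nulls could fail in ways that look random. A sketch with a stand-in Vent class; ConnectedVents is a hypothetical helper, not part of the game:

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the game's Vent class, with the same three connection slots.
public sealed class Vent
{
    public string Name = "";
    public Vent Left, Right, Center; // unconnected slots are assumed to stay null

    // Same shape as the game's getter: always three entries, some possibly null.
    public Vent[] NearbyVents => new Vent[] { this.Right, this.Left, this.Center };

    // Hypothetical helper: only the vents that are actually connected.
    // OfType<Vent>() skips null entries.
    public IEnumerable<Vent> ConnectedVents => NearbyVents.OfType<Vent>();
}
```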

EBro912 commented 1 year ago

NearbyVents is implemented like this ...

I believe I now know why my implementation was not working, but I don't want to derail this issue further. Thanks for pointing me in the right direction.

oleg20111511 commented 1 year ago

However, venting and sabotage might need some assistance from pure logic in order to make it feasible.

I don't really agree with this.

I too disagree that sabotage and doors need assistance from code; just adding the fields I mentioned will make it work. Handling navigation within vents through code could save us a lot of time, though. As for how exactly to code it, running for a vent after every kill is not a good approach, that's for sure.

What I think would be a good idea: if the neural network goes into a vent, we make it circulate through all the vents in that set and let the neural network decide when to get out. It will be fed information about the vent's surroundings while it is still inside, so it should work perfectly.

Pros: this strategy is what a lot of players use anyway to gather info, so it'll look natural.
Cons: it can't really compare the situations around each vent, so it will exit the first moment it sees that it's OK to do so.
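A sketch of the circulation idea, assuming the vents of one connected set are available as an ordered list and that the network is exposed as a hypothetical shouldExit callback evaluated at each vent. VentCirculator and maxSteps are illustrative names:

```csharp
using System;
using System.Collections.Generic;

public static class VentCirculator
{
    // Cycles through the vents of one set in order, starting at startIndex, and asks
    // the model at each vent whether to exit. Returns the vent to exit at, or null
    // if the step budget runs out first.
    public static T Circulate<T>(IReadOnlyList<T> ventSet, int startIndex,
                                 Func<T, bool> shouldExit, int maxSteps) where T : class
    {
        for (int step = 0; step < maxSteps; step++)
        {
            T vent = ventSet[(startIndex + step) % ventSet.Count];
            if (shouldExit(vent)) return vent;
        }
        return null;
    }
}
```

This returns at the first vent where shouldExit is true, which is exactly the limitation listed under Cons: nothing here compares vents against each other.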

EBro912 commented 1 year ago

What I think would be a good idea: if the neural network goes into a vent, we make it circulate through all the vents in that set and let the neural network decide when to get out.

After your explanation, I agree with both of you that code assistance would not be the best way to go. As for your point about it leaving as soon as it is able to, I don't think that would be a big deal. It would indeed be hard to compare vents, since the situation at previously visited vents can change at any time, but I think leaving the vents as quickly as possible to secure an alibi is just as strong a strategy as chilling in the vents and making sure the coast is clear.

Morgul commented 1 year ago

What I think would be a good idea: if the neural network goes into a vent, we make it circulate through all the vents in that set and let the neural network decide when to get out. It will be fed information about the vent's surroundings while it is still inside, so it should work perfectly.

Pros: this strategy is what a lot of players use anyway to gather info, so it'll look natural.
Cons: it can't really compare the situations around each vent, so it will exit the first moment it sees that it's OK to do so.

So, one of the beautiful things about reinforcement learning is we can actually let the ML try to work out the optimal strategy here. I think it's important to focus on how to feed the ML all the information it needs to come up with its own strategy for using vents. I would think knowing the location of the nearest vent, if the player is in a vent, and when they use the vent should be, in general, sufficient.

Now, I'm not saying the ML will get it right; it might be really bad at using vents. But as long as we have the reinforcement right, it'll learn quickly. That's kinda the whole point.

As for sabotage, most impostors I've played with or as just click a random sabotage as soon as they can. Obviously there is some strategy to sabotages, but again the best and quickest solution might be to let either the ML or the code itself pick a random sabotage when one is available (maybe with a delay so it isn't obvious). Doors could be implemented similarly, though there is possible logic for considering the number of players in a room and their location on the map to further decide a kill and/or vent opportunity.

As for sabotage and doors, I think we need to (on the input side) record when a sabotage is done, and which one.

Same for the doors. Since we already are recording what players we can see, all we need to add is something to indicate the player closed/opened a door, (and maybe if we're near the button to open/close a door) and that should be it.

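TL;DR I think all we need is:

  • NearbyVents: Location of all Vents in X radius
  • InVent: If the player is in a vent
  • NearbyDoors: Location of all Doors in X radius
  • DoorUsed: If the player activated the door
  • SabotageUsed: Number representing a given sabotage. (Needs to be an index into a master list of all possible sabotages so the index we use for, say, 'Deplete Oxygen' is always the same.)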

I'm reasonably sure that should cover everything.

EBro912 commented 1 year ago

TL;DR I think all we need is:

  • NearbyVents: Location of all Vents in X radius
  • InVent: If the player is in a vent
  • NearbyDoors: Location of all Doors in X radius
  • DoorUsed: If the player activated the door
  • SabotageUsed: Number representing a given sabotage. (Needs to be an index into a master list of all possible sabotages so the index we use for, say, 'Deplete Oxygen' is always the same.)

I'm reasonably sure that should cover everything.

If it makes things easier, I can shift my PR over to recording that data, since a lot of the functionality I use to test things already collects exactly that. If we collectively decide on proper values for X and the list of sabotages, I can add it to the PR and mark it for review. That should get us well on our way to your suggestion. Luckily for us, there is already a SystemTypes enum with entries for both rooms and sabotages, but splitting the data into our own types might be good for readability.
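A sketch of the proposed fields as one record, with a separate sabotage enum whose explicit values keep each index stable, as the SabotageUsed entry requires. SabotageId, VentDoorSabotageFields and Nearby.WithinRadius are hypothetical names, the enum members are illustrative rather than a complete list, and X is left as a radius parameter since no value has been agreed on:

```csharp
using System.Collections.Generic;
using System.Numerics;

// Our own sabotage list. Explicit values keep each index stable across builds,
// so e.g. Oxygen always maps to the same number in recorded data.
public enum SabotageId
{
    None = 0,
    Reactor = 1,
    Oxygen = 2,
    Lights = 3,
    Comms = 4,
}

// One recorded frame of the fields proposed above.
public sealed record VentDoorSabotageFields(
    IReadOnlyList<Vector2> NearbyVents,
    bool InVent,
    IReadOnlyList<Vector2> NearbyDoors,
    bool DoorUsed,
    SabotageId SabotageUsed);

public static class Nearby
{
    // Positions within `radius` (the "X radius" above) of the player, inclusive.
    public static List<Vector2> WithinRadius(Vector2 origin, IEnumerable<Vector2> points, float radius)
    {
        var result = new List<Vector2>();
        foreach (var p in points)
            if (Vector2.Distance(origin, p) <= radius) result.Add(p);
        return result;
    }
}
```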

EBro912 commented 1 year ago
  • NearbyDoors: Location of all Doors in X radius
  • DoorUsed: If the player activated the door

About these two, since we sabotage multiple doors at once, do we want to record a list of doors shut (i.e. DoorsUsed) instead? We could also record the room that we closed the doors in and keep track of nearby rooms instead of doors (i.e. RoomUsed and NearbyRooms).
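A sketch of the RoomUsed/DoorsUsed alternative, assuming each door is known by an integer id and the room it belongs to. RoomId, DoorEvent and DoorRecording are hypothetical names chosen for illustration, and the room list is not the game's SystemTypes:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical room ids for illustration only.
public enum RoomId { Cafeteria, Storage, Electrical, MedBay }

// One door sabotage: the room targeted plus every door id it closed.
public sealed record DoorEvent(RoomId RoomUsed, IReadOnlyList<int> DoorsUsed);

public static class DoorRecording
{
    // A door sabotage closes all doors of a room at once, so record the room
    // once and list its doors, instead of one DoorUsed flag per door.
    public static DoorEvent FromDoors(RoomId room, IEnumerable<(int DoorId, RoomId Room)> allDoors) =>
        new DoorEvent(room, allDoors.Where(d => d.Room == room).Select(d => d.DoorId).ToList());
}
```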

Also, on maps other than Skeld there are special doors, such as ones that must be opened manually or that open automatically. I assume we only care about doors we can sabotage, but I wanted to make sure these don't get overlooked.

Vedal987 commented 1 year ago

We can record data like which sabotage, which vents and which doors were used specifically, but for now I just intend to simplify the problem as much as possible for the AI. This means using some simple logic to estimate a good course of action when the AI says it wants to vent, sabotage, or use a door. Then, if we want to give the AI more control later, we can retrain it with different inputs/outputs, as long as we have the data recorded.
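One way to keep "the AI decides what, simple logic decides how" separate is a small dispatcher from high-level intents to hand-written handlers. This is a sketch; Intent and IntentDispatcher are hypothetical names, and each handler would hold whatever heuristic is chosen for venting, sabotage or doors:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical high-level intents the network might output.
public enum Intent { None, Vent, Sabotage, Door }

public sealed class IntentDispatcher
{
    private readonly Dictionary<Intent, Action> _handlers = new();

    // Registers the hand-written logic that carries out an intent.
    public void On(Intent intent, Action handler) => _handlers[intent] = handler;

    // The network only decides *that* it wants to vent/sabotage/use a door;
    // the registered handler decides *how*. Returns false if no handler exists.
    public bool Dispatch(Intent intent)
    {
        if (!_handlers.TryGetValue(intent, out var handler)) return false;
        handler();
        return true;
    }
}
```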

Vedal987 commented 1 year ago

So, one of the beautiful things about reinforcement learning is we can actually let the ML try to work out the optimal strategy here.

It should be noted that we're currently not planning to use reinforcement learning (or at least not initially; maybe for fine-tuning). The current plan is to distribute this plugin, have it record data, and then train the network on that data.