Add multi-arena episodes

benaslater commented 3 months ago

Adds the feature of multi-arena episodes, by looking for the mergeNextArena parameter in an arena's yaml. When a good (not timeout or death) episode end is reached TrainingAgent checks TrainingArena to see if it should end the episode or merge the next arena. If the latter it calls the new TrainingArena function loadNextArena.

Discarded Alternative: Just call ResetArena() since it does basically the same thing. This would be more complicated to implement in the situation where arenas are being randomly cycled. Also I think the semantics of ResetArena in AAI are already a bit complicated and I didn't want to further the problem.
Discarded alternative: Add a completeArena function to TrainingArena with a boolean for whether the episode was successful or not. Benefit: The TrainingAgent doesn't have to care about whether to call EndEpisode or rotate the arenas Drawback: Would have to pass control of showing notifications to TrainingArena, as TrainingAgent doesn't know whether the completion of an arena means a notification should be shown (episode end) or not (arena has been merged)

Proposed change(s)

Adds multi arena episode support

Additional changes:

Merge the functions TryLoadArenaConfiguration, ApplyNewArenaConfiguration, CleanupRewards, NotifyArenaChange into one function (having three separate functions when they're always called in sequence increases the chance one is missed out)
If the arena is randomised, use the ChooseRandomArenaID function on the first call (I think we can probably get rid of that code branch somehow but I wasn't brave enough to do it this time)
Rename getMaxArenaID to getArenaCount (This function returns the count. We expect arenas to be indexed from 0, so would expect the max arena ID to be the count-1)
Rename DetermineNextArenaID to SetNextArenaID to emphasise the function's side effect

Useful links (Github issues, ML-Agents forum threads etc.)

Ticket: https://www.notion.so/Multi-Arena-Episodes-7edc1ee064034193967f46bc12c75913?pm=c
Documentation change: https://github.com/Kinds-of-Intelligence-CFI/animal-ai/pull/51

Types of change(s)

[ ] Bug fix
[X] New feature
[ ] Code refactor
[ ] Breaking change
[ ] Documentation update
[ ] Other (please describe)

Checklist

[To be added in new package] Added tests that prove my fix is effective or that my feature works
[Will do in update to https://github.com/Kinds-of-Intelligence-CFI/animal-ai/pull/51] Updated the changelog (if applicable)
[X] Updated the documentation (if applicable)
[ ] Updated the migration guide (if applicable)

Testing

Tests of a merged arena + failure cases in E2E tests (I'll push these to a new repo)
Manually test error case of putting mergeNextArena in the final arena (The error does get swallowed though - https://www.notion.so/Don-t-swallow-Unity-errors-9c1c84c1c51c4165b0d5d46e3d764c7c?pvs=4)
Manually test exclusion of merged arenas when randomiseArenas is set to true (Add a debug log in TrainingArena.cs that prints out validArenaIDs and confirm one is excluded)
Train stablebaselines PPO on the arena from the E2E test and observe it learns without hitting an error

alhasacademy96 commented 3 months ago

all in all a solid approach. But i think a few changes are to be made and some code to be reverted for effectiveness.

benaslater commented 3 months ago

Sorry I'm a bit of a novice on github PRs - I've pushed a new commit to the feature branch with the changes discussed in the comments. Do I close this PR and open a new one on that?

ref: https://github.com/Kinds-of-Intelligence-CFI/animal-ai-unity/commit/5050eb30f2f60b66126c1c1ffe494d80540e5662

alhasacademy96 commented 3 months ago

Sorry I'm a bit of a novice on github PRs - I've pushed a new commit to the feature branch with the changes discussed in the comments. Do I close this PR and open a new one on that?

ref: 5050eb3

No need. simply see the commits you made after the comments are reflected on this PR. You can also see the "OUTDATED" marker for the scripts you made changes to which are now overwritten by the newer commits you made recently.

Edit: WHEN you think you're ready for a re-PR, simply click the recycling button next to my name at the top right so I know you're ready :). This button simply asks for a review again.

alhasacademy96 commented 3 months ago

however, I do think that the important change is still not made, specifically keeping the modularity of TrainingArena.cs, where because its such an important script, I think the best way to go at it is to keep the modular structure I set for my previous PR. Your methods and modifications will still be valid.

It's important to keep the mainResetArena()method as readable and organized as possible, whilst incorporating your additions.

@benaslater

notion-workspace[bot] commented 3 months ago

Multi-Arena Episodes

notion-workspace[bot] commented 3 months ago

Multi-Arena Episodes

benaslater commented 3 months ago

I think the best way to go at it is to keep the modular structure I set for my previous PR.

I don't think I agree. Moving a big block of code into a function with a clear interface can be useful for code clarity, but I view breaking that code into several functions that must be called in sequence as a bit of an antipattern where we need to do it multiple times.

i.e. if I wanted to achieve A, and that involved doing A1, then A2, then A3. I could do it by defining A (where A1, A2 and A3 are clearly commented in the definition of A), or I could do it by defining A1, A2 and A3 separately. The issue is if I do it in the latter way now every time I want to do A it takes me 3 function calls instead of 1, and I don't seem to get much benefit over clearly marking the distinct stages of A in the body of my A function with comments. The case where it might be attractive to do it anyway is if A1, A2, A3 are useful elsewhere, but even then I would be tempted to define A() = A1(); A2(); A3();

In this case A is UpdateActiveArenaToCurrentArenaID: given that the current arena ID has been updated, we want to do all the steps needed to achieve that. We need to do that in ResetArena() and LoadNextArena().

alhasacademy96 commented 3 months ago

Hey Ben, here are some points I'd like you to consider in general and not in necessarily specific for your PR only.

-Maintainability + readability: Breaking down a large function like UpdateActiveArenaToCurrentArenaID (renamed) into smaller, logically separated functions (A1, A2, A3) can significantly enhance readability and maintainability. Each function can be focused on a single responsibility, making the code easier to understand at a glance. When functions are smaller and well-named, new developers or others unfamiliar with the code can more easily understand the workflow and purpose of the code. We need to think about future developers and collaborators in this regard for AAI.

-Reusability: Although in your scenario Ben A1, A2, and A3 might primarily support the process A, encapsulating these steps into separate functions can unexpectedly benefit future developments. For instance, if a new feature or requirement emerges that only needs the functionality of A2, it can be reused without duplicating code or introducing unnecessary dependencies.

-Tests: Smaller, well-defined functions are easier to test (recall what i mentioned earlier). When functions are broken down into smaller parts, it's easier to write tests for each part, ensuring that they work correctly in isolation. This can lead to a more robust and reliable codebase, as each part can be individually verifiied..

-Error Handling and debugging: It can be easier to handle errors and debug smaller functions. When an error occurs, if the code is split into smaller parts, it's often quicker to isolate the problem. Also, you can handle errors specific to the part of the process each function represents, which might be more difficult in a larger, monolithic functio

-Flexibility in refactoring/extending: Future changes to the process can be more manageable when the code is modular. For instance, if you later need to add a new step A4 between A2 and A3, doing so is straightforward without disturbing the other parts of the code. i think this can be more complex and error-prone in a single large function where changes affect the entire function's flow (which was the case before my PR).

-Design principles (important for any SD): breaking down functions follows well-established software design principles like SRP SoC. These principles advocate for structuring software so that each element of the code has a single responsibility and is independent of the other parts, which enhances modularity and scalability, which you already know I assume :) but a reminder.

So the point im trtying to make is that it's imperative that we understand these points and implement them where we can. Again, these points do not only apply to your PR, but in general for the codebase. I would have to kindly insist on this matter Ben.

benaslater commented 3 months ago

Hey Ibrahim, thank you for these general points. I don't agree that factoring into functions in this way is an improvement from a software engineering perspective, but it's not likely to have a big impact either way so I've pushed a change returning the functions.

alhasacademy96 commented 3 months ago

Thanks you can merge when you're ready

Kinds-of-Intelligence-CFI / animal-ai-unity