asieradzk / RL_Matrix

Deep Reinforcement Learning in C#

Documentation question #12

Closed by Kermalis 2 weeks ago

Kermalis commented 2 weeks ago

Hello, I'm currently porting a (failed) TorchSharp agent over to RL Matrix. Honestly, thanks so much for this library. I'm just wondering whether you're planning on using the TorchSharp-cuda-windows or TorchSharp-cuda-linux packages eventually? I was using CUDA in my previous environment and was hoping it was in the plans.

EDIT: I stepped through the debugger and saw that the tensors were already using CUDA. I just assumed they weren't since I didn't see it mentioned in the README.

While I'm here writing this, I was also wondering if there is an example of an IContinuousEnvironment<>? This is the environment I need since I don't need discrete actions, but the examples were all regular IEnvironment<> from what I saw. Maybe I missed it as well lol

I'm asking since I'm running into this error: [image]

My env is extremely basic, I am trying to set up 3 inputs and 3 (continuous) outputs with 0 discrete outputs.

private sealed class Env : IContinuousEnvironment<float[]>
{
    public int stepCounter { get; set; }
    public int maxSteps { get; set; }
    public bool isDone { get; set; }
    public OneOf<int, (int, int)> stateSize { get; set; }
    public int[] actionSize { get; set; }
    public (float min, float max)[] continuousActionBounds { get; set; }

    private readonly float[] _inputs;

    public Env()
    {
        _inputs = new float[3];
        Initialise();
    }

    public void Initialise()
    {
        // new physics
        stepCounter = 0;
        maxSteps = 100_000;
        stateSize = 3;
        actionSize = [3];
        continuousActionBounds = [(-1f, 1f), (-1f, 1f), (-1f, 1f)];
        // physics reset
        isDone = false;
    }

    public float[] GetCurrentState()
    {
        _inputs[0] = 1f;
        _inputs[1] = 0.5f;
        _inputs[2] = -1f;
        return _inputs;
    }

    public void Reset()
    {
        // physics reset
        isDone = false;
        stepCounter = 0;
    }

    public float Step(int[] actionsIds)
    {
        throw new NotImplementedException();
    }
    public float Step(int[] discreteActions, float[] continuousActions)
    {
        return continuousActions[0];
    }
}

I also noticed that stepCounter never increases from 0.

asieradzk commented 2 weeks ago

Hi. Thanks for trying RLMatrix.

  1. Yes, CUDA is used by default if available.

  2. IContinuousEnvironment will not work and has been completely removed. I have not included a replacement yet. It will be something like this: https://github.com/asieradzk/RL_Matrix/blob/master/src/RLMatrix.Common/IEnvironmentAsync.cs. I've decided to no longer reset the env for users based on step counters, since some use cases may not require that.

  3. Looks like you might be using an older version. You can clone the one from the repo for the best experience. There have been a lot of changes between now and the NuGet...

  4. I'll update you when ContinuousEnv becomes available (maybe today)!
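Since point 2 says the library no longer resets the env based on step counters, the environment would presumably have to count steps and flag the end of an episode itself. Here is a minimal sketch of that idea. It uses a plain class rather than any RL Matrix interface; the class name `SelfCountingEnv`, its members, and the `(reward, done)` return shape are illustrative assumptions, not the library's API.

```csharp
using System;

// Hypothetical standalone environment; it does not implement any RL Matrix interface.
public sealed class SelfCountingEnv
{
    private readonly int _maxSteps;
    private int _stepCounter;

    public SelfCountingEnv(int maxSteps)
    {
        if (maxSteps <= 0) throw new ArgumentOutOfRangeException(nameof(maxSteps));
        _maxSteps = maxSteps;
    }

    public int StepCounter => _stepCounter;
    public bool IsDone { get; private set; }

    // Starts a new episode: clears the step count and the done flag.
    public void Reset()
    {
        _stepCounter = 0;
        IsDone = false;
    }

    // Applies one continuous action vector and returns (reward, done).
    // The environment increments its own counter and ends the episode
    // once the step budget is used up.
    public (float reward, bool done) Step(float[] continuousActions)
    {
        if (IsDone) throw new InvalidOperationException("Call Reset() before stepping a finished episode.");
        _stepCounter++;
        if (_stepCounter >= _maxSteps) IsDone = true;
        float reward = continuousActions[0]; // placeholder reward, as in the env posted above
        return (reward, IsDone);
    }
}
```

With this shape, whoever drives the environment checks the returned `done` flag and calls `Reset()` when it is set, instead of relying on the agent to watch a step counter.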

asieradzk commented 2 weeks ago

I got you homie. Try now. I've added IContinuousEnvironmentAsync; you can use it with

LocalContinuousRolloutAgent

Clone the repo; it's not on NuGet.

There are 2 caveats:

  1. Only works with training: true

  2. Only works with 1 environment.

Going to take the weekend off now, but thanks for motivating me to put this in.

It's a bit hard to get right because I insist on using shared parameters for multiple discrete and continuous heads, so I've made a mistake somewhere when slicing output tensors.

Example: https://github.com/asieradzk/CartPoleForTesting/blob/master/TrivialContinuousEnvironmentAsync%20.cs

Kermalis commented 2 weeks ago

Awesome, it looks great. I won't be able to try it out for a while though. I'll close this issue for now.