Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

"No episode was completed since last summary." but Done() is surely called #2109

Closed trinhthanhtrung closed 4 years ago

trinhthanhtrung commented 5 years ago

Hi all, I'm new here and I'm currently having a problem. The model I designed needs to call Done() and reset the environment on every AgentAction(). My code for AgentAction() can be as simple as this

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        this.transform.position = new Vector3(vectorAction[0], 0.5f, vectorAction[1]);
        AddReward(Random.Range(-1f, 1f));
        Done();
    }

But ml-agents doesn't seem to accept this. When I start training, Done() is definitely called and AgentReset() executes properly, but the console only shows the Mean Reward the first time; after that it is

Step: 1000. Mean Reward: 0.723. Std of Reward: 0.000. Training.
Step: 2000. No episode was completed since last summary. Training.
Step: 3000. No episode was completed since last summary. Training.
...

This can be fixed if Done() is not called on every AgentAction(), for example if I only call it after every 1000 AgentAction() calls, or if I add a condition like this

        if (this.transform.position.x < 0)
            Done();

I wonder what might cause the problem and how to fix it, apart from only calling Done() every 1000 AgentAction() calls like I am doing right now.
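
For reference, the stopgap I'm using right now looks roughly like this (the counter field is just something I added for it, not anything from ML-Agents):

    private int actionCount;

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        this.transform.position = new Vector3(vectorAction[0], 0.5f, vectorAction[1]);
        AddReward(Random.Range(-1f, 1f));

        // Only end the episode once every 1000 actions
        actionCount++;
        if (actionCount >= 1000)
        {
            actionCount = 0;
            Done();
        }
    }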

Note: I have read the discussions in https://github.com/Unity-Technologies/ml-agents/issues/988 and https://github.com/Unity-Technologies/ml-agents/issues/1169, but they don't solve my problem.

ScriptBono commented 5 years ago

I would try this, which solved a lot of issues I had so far:

if( !IsDone() ){
        this.transform.position = new Vector3(vectorAction[0], 0.5f, vectorAction[1]);
        AddReward(Random.Range(-1f, 1f));
        Done();
}

My assumption is that the system takes a while to register that an agent is done and this could lead to some problems when you call Done() again. ( just an idea)

trinhthanhtrung commented 5 years ago

> I would try this, which solved a lot of issues I had so far:
>
> if( !IsDone() ){
>         this.transform.position = new Vector3(vectorAction[0], 0.5f, vectorAction[1]);
>         AddReward(Random.Range(-1f, 1f));
>         Done();
> }
>
> My assumption is that the system takes a while to register that an agent is done and this could lead to some problems when you call Done() again. ( just an idea)

Thanks for the suggestion, but that doesn't solve the problem. I have tried replacing the code in AgentAction() with yours, but the result is still the same :(

harperj commented 5 years ago

Hi @trinhthanhtrung -- I believe this is because of how the trainer collects experiences into the training buffer. It adds the experience only if the previous experience wasn't "done" so that we split up trajectories. However, this means we currently require at least one step to occur before the "done". I was able to reproduce by adding a simple flag to track every other step:

        if (_justCalledDone)
        {
            _justCalledDone = false;
        }
        else
        {
            AddReward(Random.Range(-1f, 1f));
            Done();
            _justCalledDone = true;
        }

With this flag, the step output will be what you expect. Can you explain your use-case for calling Done every step? It should be possible to change how we handle this, but I haven't heard this request before.
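
Putting it together with your AgentAction(), a minimal version of the workaround could look like the following sketch (the bool field is the only piece not shown in the snippet above):

    private bool _justCalledDone;

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        this.transform.position = new Vector3(vectorAction[0], 0.5f, vectorAction[1]);

        // Skip the "done" on every other step so each episode contains at
        // least one step before the step that calls Done().
        if (_justCalledDone)
        {
            _justCalledDone = false;
        }
        else
        {
            AddReward(Random.Range(-1f, 1f));
            Done();
            _justCalledDone = true;
        }
    }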

trinhthanhtrung commented 5 years ago

Thanks @harperj for your reply. I have tried the code and I can confirm this works.

My particular use case is when the agent's decision requires some planning, for example analysing the environment before taking any action. That's why I have set up the agent to send various observations every AgentAction() to the network to come up with the best "plan". It doesn't sound very "reinforcement learning-y", but that's the first step. The RL part will be utilised later.

devynbennet commented 5 years ago

Hello, I am having the same issue with "No episode was completed". I've gone back and looked at #988 and #1169 as well. I'm using on-demand decision making, and my basic loop goes like this: in the AgentReset() method I call RequestDecision(). The agent then makes its observations and performs its action, at which point I reward it accordingly and call Done(). I have the agent set to reset on done. I've tried changing the max step values for both the academy and the agent. As far as I can tell I'm still somehow not ending the episode, even though everything suggests it should be ending properly. Unfortunately @harperj's suggestion of using a _justCalledDone flag didn't work, but I also don't think that solution makes much sense for an on-demand agent. What can I do to properly end the episode?

Here is my agent:

public override void AgentReset()
    {
        //Reset target position
        Vector3 newPos;
        do
        {
            newPos = Random.insideUnitSphere * maxDistance;

        } while (Vector3.Distance(transform.position, newPos) < minDistance);

        targetTransform.position = newPos;
        firingVelocity = Random.Range(minFiringVelocity, maxFiringVelocity);

        //Request agent decision
        RequestDecision();
    }

    public override void CollectObservations()
    {
        AddVectorObs(transform.position - targetTransform.position);
        AddVectorObs(firingVelocity);
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        //Only care about x and y rotation
        float x = vectorAction[0] * 360;
        float y = vectorAction[1] * 360;

        transform.rotation = Quaternion.Euler(x, y, 0);

        //Fire the projectile out, and when it hits something, it will reward the agent
        FireProjectile();
    }

    public void FireProjectile()
    {
        TrainingProjectile proj = Instantiate(projectile, firePoint.position, firePoint.rotation);
        proj.myRigidBody.velocity = firePoint.forward * firingVelocity;
        proj.trainedAgent = this;
        proj.targetTransform = targetTransform;
        proj.rewardValue = hitReward;
        proj.failueValue = missValue;
        proj.maxDistance = maxProjectileDistance;
    }

And here is the projectile, which rewards the agent when hitting a target, and sets it to done:

//check if projectile is out of range
    protected override void FixedUpdate()
    {
        base.FixedUpdate();

        //if out of range, the agent is punished
        if (Vector3.Distance(trainedAgent.transform.position, transform.position) > maxDistance)
        {
            //Add the punishment and end the episode
            trainedAgent.AddReward(failueValue);
            trainedAgent.Done();

            //Destroys this gameobject
            explode(Quaternion.LookRotation(myRigidBody.velocity), myRigidBody.position);
        }

    }

    //when the projectile hits something
    protected override void OnHit(Collider collider)
    {
        //if the hit was valid, reward the agent and end episode
        if (collider.transform.root.Equals(targetTransform.root))
        {
            Debug.Log("Hit the target");
            trainedAgent.AddReward(rewardValue);
            trainedAgent.Done();
        }

        //else, punish and end
        else
        {
            trainedAgent.AddReward(failueValue);
            trainedAgent.Done();
        }
    }

gzrjzcx commented 5 years ago

> It adds the experience only if the previous experience wasn't "done" so that we split up trajectories. However, this means we currently require at least one step to occur before the "done".

Hi @harperj, what exactly does this explanation mean? Does it mean that at least one CollectObservations -> Python API decision -> AgentAction cycle has to finish before the agent can be marked done and the episode can end, so that the experience can be learned?

My problem is that with your _justCalledDone trick I can start the training process, but after some steps (e.g. 90000 steps) the model again shows 'no episode was completed...'. Does that mean that from that point on no action is chosen that would accumulate reward (e.g. the agent keeps taking an unexpected action to avoid a penalty) and reset the _justCalledDone flag to true, so the whole training process gets stuck in 'no episode was completed'?

But there is also a contradiction: if no action can be chosen that finishes the episode and adds the reward, why does it report 'no episode was completed' at all? If that action is never chosen, the episode can never end, because only once the action is chosen is it possible to call Done() and end the episode.

harperj commented 5 years ago

Responding to a couple of additional questions on this issue:

@devynbennet I think that if you can't reproduce the fix with this change you're likely experiencing a different issue. Unfortunately we don't have resources to help debug custom environments in general. As high level advice, what I can recommend is to call Done() from the AgentAction method since this will make sure the done flag is set at the correct time. If you believe this issue is a bug with ML-Agents, please try to reproduce with an example environment (perhaps with a minimal patch) and create another issue.
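
As a rough sketch of that advice (this is not your project's code, and the field names are hypothetical), the projectile could simply record the outcome and let the agent apply the reward and call Done() inside its next AgentAction():

    // Hypothetical fields, set by the projectile when it hits or misses
    public bool shotResolved;
    public float pendingReward;

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Resolve the previous shot first, so Done() is called from AgentAction
        if (shotResolved)
        {
            AddReward(pendingReward);
            Done();
            shotResolved = false;
            return;
        }

        // Aim and fire as before
        float x = vectorAction[0] * 360;
        float y = vectorAction[1] * 360;
        transform.rotation = Quaternion.Euler(x, y, 0);
        FireProjectile();
    }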

@gzrjzcx the issue is that at this time we need to have two steps -- pairs of (observation, action, reward) -- at the trainer side before we consider this an episode and add it to the training buffer. In general this is not a problem because having trajectories of only a single step is not the usual / expected use case of the types of algorithms we're using. That said, it should be possible for us to change it in the future.
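
To make the trainer-side behaviour concrete, here is a toy illustration (not the actual ml-agents trainer code) of how splitting trajectories this way drops single-step episodes:

    using System.Collections.Generic;

    // Toy sketch: an episode is cut at a "done" step, and only trajectories with
    // at least two steps are handed to the training buffer, so an agent that
    // calls Done() on every single step never completes a countable episode.
    public class TrajectorySplitterSketch
    {
        private readonly List<float> currentEpisodeRewards = new List<float>();
        public readonly List<List<float>> trainingBuffer = new List<List<float>>();

        public void OnStep(float reward, bool done)
        {
            currentEpisodeRewards.Add(reward);

            if (done)
            {
                if (currentEpisodeRewards.Count >= 2)
                {
                    trainingBuffer.Add(new List<float>(currentEpisodeRewards));
                }
                // One-step "episodes" are dropped here.
                currentEpisodeRewards.Clear();
            }
        }
    }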

YL618 commented 5 years ago

@harperj I'm having the same problem, and it starts showing "No episode was completed since last summary." after 10000 steps. I only changed a small part of the RollerAgent code from the tutorial on this page: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md#final-editor-setup. When I ran the tutorial script there was no problem, but after my modifications it just doesn't work... The problem still occurs after I tried the flag method mentioned above...

My script for the agent (it's actually on a cube object, but I didn't change the script's name):

public class RollerAgent : Agent
{
// Start is called before the first frame update
//Rigidbody rBody;
public GameObject Obstacle;
public GameObject Plane;
public LayerMask unwalkableMask;
bool TocallDone = true;

//transform.rotation.x = target.rotation.x;
//transform.rotation.x = target.rotation.y;
//transform.rotation.x = target.rotation.z;
//transform.localEulerAngles = new Vector3(target.rotation.x, target.rotation.y, target.rotation.z);

void Start()
{
    //rBody = GetComponent<Rigidbody>();
    Obstacle = GameObject.Find("Obstacle");
}

public Transform Target;

public override void AgentReset()
{
    if (this.transform.position.y < 0)
    {
        // If the Agent fell, zero its momentum
        //this.rBody.angularVelocity = Vector3.zero;
        //this.rBody.velocity = Vector3.zero;
        this.transform.position = new Vector3(0, 0.05f, 0);
        transform.localEulerAngles = new Vector3(0, 0, 0);
    }

    // Move the target to a new spot
    Target.position = new Vector3(Random.value * 8 - 4,0.05f,Random.value * 8 - 4);
    //Move the obstacle to a new spot
    Obstacle.transform.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
}

public override void CollectObservations()
{
    // Target and Agent Obstacle positions
    AddVectorObs(Target.position);

    AddVectorObs(this.transform.position);
    AddVectorObs(this.transform.localScale);

    AddVectorObs(Obstacle.transform.position);
    AddVectorObs(Obstacle.transform.localScale);
    // Agent velocity
    //AddVectorObs(rBody.velocity.x);
    //AddVectorObs(rBody.velocity.z);
}

public float speed = 2;
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Actions, size = 2
    Vector3 controlSignal = Vector3.zero;

    controlSignal.x = vectorAction[0];
    controlSignal.z = vectorAction[1];

    Vector3 targetPos = Vector3.zero;
    targetPos.x = transform.position.x + controlSignal.x;
    targetPos.z = transform.position.z + controlSignal.z;
    //rBody.AddForce(controlSignal * speed);
    transform.position = Vector3.MoveTowards(transform.position, targetPos, speed * Time.deltaTime);
    //HalfExtend
    Vector3 halfExtend;
    halfExtend.x = transform.localScale.x / 2;
    halfExtend.y = transform.localScale.y / 2;
    halfExtend.z = transform.localScale.z / 2;

    // Rewards
    float distanceToTarget = Vector3.Distance(this.transform.position,Target.position);

    // Reached target
    if (distanceToTarget ==0)
    {
        if (TocallDone)
        {
            TocallDone = false;
        }
        else
        {
            SetReward(1.0f);
            Done();
            TocallDone = true;
        }  
    }

    //Hit the obstacle
    if(Physics.CheckBox(transform.position, halfExtend,transform.rotation, unwalkableMask))
    {
        if(TocallDone)
        {
            TocallDone = false;
        }
        else
        {
            //SetReward(-1.0f);
            Done();
            TocallDone = true;
        }        
    }

    // Fell off platform
    if (this.transform.position.y < 0)
    {
        if (TocallDone)
        {
            TocallDone = false;
        }
        else
        {
            //SetReward(-1.0f);
            Done();
            TocallDone = true;
        }
    }

    if (this.transform.position.x >(Plane.transform.lossyScale.x*5+(this.transform.lossyScale.x/2))|| this.transform.position.x <- (Plane.transform.lossyScale.x * 5 + (this.transform.lossyScale.x / 2)))
    {
        if (TocallDone)
        {
            TocallDone = false;
        }
        else
        {
            //SetReward(-1.0f);
            Done();
            TocallDone = true;
        }
    }

    if (this.transform.position.z > (Plane.transform.lossyScale.z*5 + (this.transform.lossyScale.z / 2))|| this.transform.position.z <- (Plane.transform.lossyScale.z * 5 + (this.transform.lossyScale.z / 2)))
    {
        if (TocallDone)
        {
            TocallDone = false;
        }
        else
        {
            //SetReward(-1.0f);
            Done();
            TocallDone = true;
        }
    }
}

}

YL618 commented 5 years ago

> @harperj I'm having the same problem, and it starts showing "No episode was completed since last summary." after 10000 steps. I only changed a small part of the RollerAgent code from the tutorial on this page: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md#final-editor-setup. When I ran the tutorial script there was no problem, but after my modifications it just doesn't work... The problem still occurs after I tried the flag method mentioned above...

I finally solved this by setting the maximum step...
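
For anyone else hitting this: "setting the maximum step" just means giving the Agent a non-zero Max Step so episodes are forced to end. It can be set on the Agent component in the Inspector; if I remember the old 0.x API correctly (where the agent exposes agentParameters.maxStep), it can also be done from code, roughly like this, with the value just an example:

    public override void InitializeAgent()
    {
        // Force episodes to end after at most 1000 steps
        agentParameters.maxStep = 1000;
    }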

EnthusedDragon commented 5 years ago

I am also experiencing the same problem, even while only calling Done() in AgentAction().

The reset is clearly triggered, but it keeps logging "No episode was completed since last summary".

brifl commented 5 years ago

I am experiencing this as well. It happens more frequently when my agent hits its maximum reward (I call Done() when that happens) and the "Std of Reward" is 0.000. I would like to understand how this is determined, so that it doesn't interfere with my training. When the agent is hitting max reward I increase the difficulty, so I want it to continue learning.

chriselion commented 4 years ago

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.