Height map around the agent

vivanov879 commented 4 years ago

Hi. This is an amazing project. I would like to check out your repo. Do you think it is possible to add a height map around the agent, as the original DeepMimic paper did? I wonder whether Unity allows extracting the height map somehow. If it does, we could train a model with various obstacles around the agent and have it run among them.

Sohojoe commented 4 years ago

Hi @vivanov879 - I have migrated this project into MarathonEnvs - Note: it does use a heightmap - you can see some of the other environments use randomized terrains.
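For readers wondering what a "height map around the agent" observation might look like, here is a minimal, engine-agnostic sketch. It is not taken from MarathonEnvs or DeepMimic: `sample_height_map`, `height_fn`, the grid size, and the spacing are all hypothetical names and values chosen for illustration. In Unity, the lookup behind `height_fn` could plausibly be something like `Terrain.SampleHeight` or a downward raycast, but that is an assumption, not a description of how this repo does it.

```python
import math


def sample_height_map(height_fn, agent_x, agent_z, agent_y, heading_rad,
                      grid_size=5, spacing=0.5):
    """Sample terrain heights on a square grid centred on the agent.

    height_fn(x, z) is a hypothetical callback returning the terrain height
    at a world position. Heights are returned relative to agent_y, and the
    grid is rotated with the agent's heading so the observation does not
    depend on which way the agent faces in world space.
    """
    half = (grid_size - 1) / 2.0
    cos_h, sin_h = math.cos(heading_rad), math.sin(heading_rad)
    grid = []
    for row in range(grid_size):
        forward = (row - half) * spacing       # offset along the heading
        line = []
        for col in range(grid_size):
            lateral = (col - half) * spacing   # offset perpendicular to the heading
            world_x = agent_x + forward * cos_h - lateral * sin_h
            world_z = agent_z + forward * sin_h + lateral * cos_h
            line.append(height_fn(world_x, world_z) - agent_y)
        grid.append(line)
    return grid
```

The flattened grid would then be appended to the agent's observation vector, so the policy can see obstacles before it reaches them.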

In my local branch, I have added a few more animations, however, I broke the backflip so I have not pushed that code yet.

I'm working towards a general player controller - similar to this - but it would be good to hear/understand your interest

vivanov879 commented 4 years ago

@Sohojoe Joe, thanks, I installed the repo and ran the environments you implemented. I have implemented RL algorithms in MuJoCo, but the results I got looked very unnatural. So I am trying to work out how to make good-looking agents as in the DeepMimic paper.

I read through your scripts StyleTransfer002Agent.cs and StyleTransfer002Animator.cs. Can you please clarify where you got the specific values for the quaternions and translations of the body parts, for example:

        MimicBone("left_shin",        "mixamorig:LeftLeg",    "mixamorig:LeftFoot",   new Vector3(.0f, .02f, .0f),          Quaternion.Euler(0, 0, 180));

As for the general pipeline, my understanding is that we set the rigid bodies' positions to match the animation's positions at the current time in the animation, using interpolation. If there is a collision, we let Unity's rigid body colliders resolve it, and we do not match the rigid body positions to the animation during the collision.

Sohojoe commented 4 years ago

@vivanov879 for the MimicBone: initially I had some code that read the MuJoCo script which created the Humanoid. But this was hard to work with, so I exported one of the models that the script made and then hand-tuned some changes to reduce overlaps. The MimicBone code I tweaked by hand to make my model reference the mocap bones.

re general pipeline - yes, there is a phase parameter in the observation space that goes from 0 to 1 (0=start of animation, 1=end of animation) - this is the same approach as DeepMimic.
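As a rough illustration of the phase parameter described above, the sketch below maps elapsed animation time to a phase in [0, 1] and uses it to interpolate a reference value between keyframes. The function names, the evenly spaced keyframes, and the looping behaviour are assumptions made for the example; they are not taken from StyleTransfer002Agent.cs.

```python
def animation_phase(elapsed_seconds, clip_length_seconds, loop=True):
    """Return the animation phase: 0 = start of the clip, 1 = end of the clip."""
    if clip_length_seconds <= 0:
        raise ValueError("clip_length_seconds must be positive")
    t = elapsed_seconds / clip_length_seconds
    if loop:
        return t % 1.0                # wraps back to 0 when the clip restarts
    return min(max(t, 0.0), 1.0)      # clamps for a one-shot clip


def sample_reference(keyframes, phase):
    """Linearly interpolate evenly spaced keyframe values at the given phase."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    position = phase * (len(keyframes) - 1)
    index = min(int(position), len(keyframes) - 2)
    fraction = position - index
    return keyframes[index] + fraction * (keyframes[index + 1] - keyframes[index])
```

For example, `animation_phase(0.5, 2.0)` gives `0.25`, and that phase can be passed both to the observation vector and to `sample_reference` to look up the pose the agent is rewarded for matching.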

The backflip is hard to train so it may be easier to start with the walk animation (you can swap this in the Unity animation controller)

If you are looking for a very smooth simulation, the Ubisoft researcher guy implemented a smooth function which he said mitigates the jitter (I want to implement this at some point). Both DeepMimic and the Ubisoft guy implemented a PD controller which they say improves training time

vivanov879 commented 4 years ago

Thanks for the explanation. Can you share the paper for the PD controller? I might implement that as part of an online RL course I am taking. I created a ticket on the marathon-envs project since I couldn't start training on my Ubuntu machine. Will you look into it?

(Sent by email in reply to Joe Booth's comment of Apr 30, 2020, 11:36 PM.)

vivanov879 commented 4 years ago

https://github.com/Unity-Technologies/marathon-envs/issues/26

Sohojoe commented 4 years ago

this is the paper that has the smooth controller (they also implemented a PD controller) - these are my notes / what i copied from the paper:

High-frequency oscillation can occur if the policy input varies quickly. High-frequency control changes lead to unnatural character motion even when they improve the overall policy performance. In order to remedy this we use two strategies.
- First, we lower the policy evaluation rate, limiting the policy to take one action at every $k$ simulation steps, and consider the action to remain fixed across those $k$ steps.
- Second, we apply a filtering strategy to smooth the actions before they are applied to the character. A recursive exponentially weighted moving average filter is used [Ostertagova and Ostertag 2012]:

$$y_t = \beta a_t + (1 - \beta) y_{t-1},$$

where $y_t$ and $a_t$ are the output and input of the filter at time $t$ respectively. Here, the action stiffness $\beta$ is a hyperparameter controlling the filter strength. Lowering policy evaluation rate allows for lowered runtime costs and faster training, but also gives individual actions more weight since they are applied over multiple simulation steps. In our case we found setting $k = 2$ and $\beta = 0.2$ gave the best visual results in our experiments. This reduces behaviors that make the character look twitchy, although it can have an impact on maximum performance (see Fig 7). To ensure the filtering is visible to the policy we also provide $y_{t-1}$ in the state $s_t$. The quantitative effect of action stiffness on rewards can be visualized in Fig 6. An ablation study considering both the state and action representation can be found in Section 7.3.
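The two strategies in the notes above can be sketched roughly as follows. This is one possible reading, not the paper's code: the class name and the `policy(observation, previous_output)` callback are hypothetical, and the filter is assumed to run on every simulation step (the excerpt does not say whether it runs per simulation step or per policy step).

```python
class SmoothedActionController:
    """Action repeat every k steps plus an exponentially weighted moving average."""

    def __init__(self, policy, action_size, k=2, beta=0.2):
        self.policy = policy              # hypothetical: policy(obs, y_prev) -> actions
        self.k = k                        # policy is evaluated once every k steps
        self.beta = beta                  # action stiffness
        self.raw = [0.0] * action_size       # a_t, held fixed between evaluations
        self.filtered = [0.0] * action_size  # y_{t-1}, the last filter output
        self.step_count = 0

    def step(self, observation):
        if self.step_count % self.k == 0:
            # The previous filter output is passed in so the policy can see
            # the filtering, mirroring "we also provide y_{t-1} in the state".
            self.raw = list(self.policy(observation, list(self.filtered)))
        # y_t = beta * a_t + (1 - beta) * y_{t-1}
        self.filtered = [
            self.beta * a + (1.0 - self.beta) * y
            for a, y in zip(self.raw, self.filtered)
        ]
        self.step_count += 1
        return list(self.filtered)
```

With a constant raw action of 1.0 and the quoted settings k = 2 and β = 0.2, the applied action ramps up as 0.2, 0.36, 0.488, and so on, rather than jumping straight to 1.0, which is the smoothing effect the notes describe.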

this is the paper about the PD controller
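For anyone unfamiliar with the term, a PD (proportional-derivative) controller turns a target joint angle into a torque. The sketch below is the textbook form for a single joint, not code from either paper or from this repo; the gains, the unit-inertia joint, and the semi-implicit Euler integration are assumptions picked to keep the example short.

```python
def pd_torque(target_angle, angle, angular_velocity, kp, kd, max_torque=None):
    """Textbook PD law: pull toward the target angle, damp the joint velocity."""
    torque = kp * (target_angle - angle) - kd * angular_velocity
    if max_torque is not None:
        torque = max(-max_torque, min(max_torque, torque))
    return torque


def simulate_joint(target_angle, kp=100.0, kd=20.0, dt=0.01, steps=500, inertia=1.0):
    """Drive a single joint toward target_angle and return its final angle."""
    angle, velocity = 0.0, 0.0
    for _ in range(steps):
        torque = pd_torque(target_angle, angle, velocity, kp, kd)
        velocity += (torque / inertia) * dt   # semi-implicit Euler: velocity first
        angle += velocity * dt
    return angle
```

In this setup the policy would output `target_angle` per joint instead of a raw torque, and the PD law handles the low-level tracking. With kd = 2·sqrt(kp) the unit-inertia joint is critically damped, so `simulate_joint(1.0)` settles on the target without overshoot-driven oscillation.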

Zju-George commented 4 years ago

@Sohojoe Have you implemented the PD controller in any of your repos?

Sohojoe commented 4 years ago

@Zju-George - sorry for the slow response, no I've not implemented a PD Controller - I've wanted to try but never gotten around to it

Sohojoe / ActiveRagdollStyleTransfer

Height map around the agent #14