allenai / ai2thor

An open-source platform for Visual AI.
http://ai2thor.allenai.org
Apache License 2.0

Held objects causing physics to misbehave? #449

Open guydav opened 4 years ago

guydav commented 4 years ago

Hello,

I've been running into some issues where the agent holding an object causes the physics system to misbehave. I can try to capture a video, but hopefully the textual description will suffice. When the agent has an object in hand, it appears to have effectively infinite strength. For example, when holding even the lightest item, the arm seems to have enough strength to flip over a bed or a desk. One room where this should be easy to demonstrate is FloorPlan326_physics.

Does this make sense? Should it be happening? Is there any reasonable workaround?

Thank you!

roozbehm commented 4 years ago

This is a bug I think. It would be great if you could send us the sequence of actions that led to this behavior so we can reproduce the issue.

guydav commented 4 years ago

I'm doing this in interactive mode in the browser. I'll try to grab a recording tomorrow; hopefully that proves helpful. I can also try to log all of the actions I take.


winthos commented 4 years ago

Hello!

While the browser demo uses the same objects and environments as the actual framework, the interactive demo uses a different control scheme and set of behaviors. What you are observing is due to how the browser version of the agent is controlled with mouse and keyboard. Knocking objects around and clipping held objects through things with the "infinite strength" you have been seeing does not happen in the actual framework.

guydav commented 4 years ago

Hi @winthos! Thank you for the clarification. However, I'm actually hoping to use the browser version right now, to run an experiment on MTurk and collect some data from people. Could you describe in a little more detail how the browser version differs from the interface agents use to interact with the system? And what would I have to do to get more 'reasonable' behavior from the browser version?

Thank you!

guydav commented 4 years ago

Hi @winthos! Following up, could you provide any additional information on how to get more reasonable behavior in the browser version?

Thank you!

winthos commented 4 years ago

Hello! Sorry for the delay.

Because the behavior in the web demo is so fundamentally different from the actual framework, it is not really feasible to get behavior in the web demo that is closer to the actual framework. Some behaviors you can cause the agent to do in the web demo do not have one-to-one parity with the actual framework, and vice versa. It's better to think of them as two entirely separate entities that happen to share the same set of scenes and objects.

When an agent moves or rotates in the actual framework, there are checks that prevent the action if an object the agent is holding would do something like clip into the environment. This is done by casting a projection of the agent + held object's collision along the path of movement, be it a rotate, move, or look action. If something is hit by this projection, it will cause the action to fail to prevent the agent from moving into a state that would cause it to get stuck.
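In very rough terms, the idea looks like the sketch below. This is hypothetical code illustrating the concept, not the actual AI2-THOR implementation (which also sweeps the held object's collision volume, not just the agent's capsule):

// Hypothetical sketch of the pre-cast movement check described above.
// Sweep the agent's capsule along the intended move; if anything would
// be hit, the action fails before the agent ever moves.
bool MoveWouldCollide(CharacterController controller, Vector3 direction, float distance)
{
    Vector3 center = controller.transform.TransformPoint(controller.center);
    float halfHeight = Mathf.Max(controller.height / 2f - controller.radius, 0f);
    Vector3 top = center + Vector3.up * halfHeight;
    Vector3 bottom = center - Vector3.up * halfHeight;

    // The real framework effectively does this for the agent plus any
    // held object's collider.
    return Physics.CapsuleCast(bottom, top, controller.radius,
        direction.normalized, distance,
        Physics.AllLayers, QueryTriggerInteraction.Ignore);
}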

The web demo does not use this logic, as it uses a controller closer to the FPS controller you might find in a first-person game. Because movement and rotation are controlled via mouse deltas and keyboard input, there is no pre-casting to check for potential collisions prior to the movement. This is also why the vast majority of the actions available to agents in the actual framework are not usable in the web demo. The web demo was made purely as an interactive way to explore some of our scenes without having to set up the full environment.

If you need to MTurk some data, it would be best to build an interface that allows turkers to use the actual framework.

guydav commented 4 years ago

Hey @winthos! No worries, it's far from urgent.

Trying to further my understanding a little here -- are the differences in behavior coming from the Python layer, or from different code handling it on the Unity side? Is it a question of which AgentController handles the commands (coming either from the Python side or from the browser), or are these checks implemented in the Python API?

Perhaps a different way to ask the same question: when you say 'the actual framework', what specifically do you mean?

I'm imagining two cases that might exist: in one, the entirety of the behavior I need is somewhere in the Unity C# code, and what I need to do is make sure the correct controller is used by the web interface, and perhaps that all of the actions are exposed to the user. This should, hopefully, mean fairly little additional work to get from the prototype WebGL-based experiment I implemented to something that behaves as I'd hope.

Alternatively, it could be that much of the logic is in the Python layer, or implemented in Unity in a way that would be prohibitively difficult to expose to the web, such that the WebGL-based setup is no longer a reasonable way to go about it.

If you have any additional guidance on how to figure this out, or on where I should look in the code for the additional checks and better logic, I would really appreciate it. I'm also happy to share the prototype I have now if it helps. Just to be clear, I'm not asking anyone to do the work for me -- but some guidance in the right direction would be highly appreciated.

winthos commented 4 years ago

Hello!

The differences between the core framework and the web demo are due to different code on the Unity side. The base framework uses various AgentControllers, initialized and controlled via Python, to control different agents within the framework. The WebGL demo does not use this controller logic at all; instead it uses a controller unique to the web demo that does not require a Python interface, and that allows movement via the WASD keys and mouse input to change the agent's look orientation.

I'm a very visual person, so this may help: here is a quick-and-dirty representation of how the framework differs between the Python controller and the WebGL demo.

[Image: frameworkcomparison -- diagram contrasting the Python-controller pipeline (left) with the WebGL demo (right)]

The "actual framework" is represented on the left, where the python controller sends "actions" to the Unity simulation, and in return the unity simulation will return metadata events once the actions are completed. Many of the checks for physics that you are looking for are handled by Unity after receiving an instruction from python. This can be thought of as a sort of stepwise way to interact with the environment, where the agent gets sent a command, something will happen, and a result will be sent back. There are many aspects of this side of the framework that are exclusive to this python controller paradigm.

On the right side, you have the WebGL demo. This does not use a Python controller at all, but rather a unique "webgl" build of the framework that allows control of the agent via mouse and keyboard. A few things are shared between the WebGL and Python sides, such as some of the pickup/open actions as well as the same environments, but the rest of the logic regarding metadata, movement, manipulation, etc. is exclusive to the Python controller.

So ultimately, it is a nontrivial process to try to "expose" the WebGL behavior to the Python controller, because they are entirely separate entities. Trying to enforce the functionality of the Python actions in the WebGL demo, especially checks like collision while holding an object, would be just as difficult, because the web demo does not use the stepwise action->response scheme the Python controller does.

I think a good place to start would be to check out how actions passed via Python actually execute on the Unity side. A good example is a basic navigation action, MoveAhead (https://ai2thor.allenai.org/ithor/documentation/actions/agent-navigation/#moveahead).

This action corresponds to a function that takes a ServerAction class as a parameter. This can be found on the PhysicsRemoteFPSAgentController.cs script/Unity component here: https://github.com/allenai/ai2thor/blob/master/unity/Assets/Scripts/PhysicsRemoteFPSAgentController.cs#L1625

Note that the MoveAhead() action takes the same parameters defined in the Python command; moveMagnitude is set by action.moveMagnitude. All actions the agent can take are defined in this way, and the ServerAction class itself is defined here: https://github.com/allenai/ai2thor/blob/master/unity/Assets/Scripts/AgentManager.cs#L1220
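Schematically, the handler pattern looks roughly like this (a simplified sketch; see the linked source for the real implementation):

// Simplified sketch of the action-handler pattern, not the actual code.
public void MoveAhead(ServerAction action)
{
    // moveMagnitude arrives straight from the Python-side parameter.
    float magnitude = action.moveMagnitude;

    // The real handler runs its pre-movement physics checks here, only
    // translates the agent forward by `magnitude` if they pass, and
    // reports success or failure back in the returned metadata.
}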

Moving on to some of the physics checks used to prevent the agent from clipping: you will see that the PhysicsRemoteFPSAgentController inherits from a base class. In this base class, some of the logic for the physics checks is defined here: https://github.com/allenai/ai2thor/blob/master/unity/Assets/Scripts/BaseFPSAgentController.cs#L674

This is just an example of a translational movement the agent can perform. There are additional checks for when the agent changes look orientation or rotates. See here: https://github.com/allenai/ai2thor/blob/master/unity/Assets/Scripts/PhysicsRemoteFPSAgentController.cs#L437

That function calls a check for whether the agent can rotate, defined here: https://github.com/allenai/ai2thor/blob/master/unity/Assets/Scripts/PhysicsRemoteFPSAgentController.cs#L626

This is what you could look at to see how the agent checks for collisions while holding an object. The flag that determines whether a rotation succeeds is set within that check.
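To sketch the idea behind that check (a hypothetical helper, not the actual code): the held object is swept through intermediate poses of the rotation, and the rotation fails if any pose would overlap something.

// Hypothetical sketch of a sweep-style rotation check, not the actual
// CheckIfAgentCanRotate implementation.
bool HeldObjectCanRotate(Transform agent, Collider held, float degrees, Vector3 axis)
{
    const int steps = 10;
    Vector3 pivot = agent.position;
    Vector3 startPos = held.transform.position;
    Quaternion startRot = held.transform.rotation;

    for (int i = 1; i <= steps; i++)
    {
        Quaternion delta = Quaternion.AngleAxis(degrees * i / steps, axis);
        Vector3 sweptPos = pivot + delta * (startPos - pivot);
        // Approximate the held object by its bounding box at each step.
        foreach (Collider hit in Physics.OverlapBox(sweptPos, held.bounds.extents, delta * startRot))
        {
            if (hit != held && !hit.isTrigger)
            {
                return false; // something blocks the rotation
            }
        }
    }
    return true;
}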

guydav commented 4 years ago

Thank you very much, @winthos! The visual is super helpful, as are the pointers. I'll start digging into this over the next couple of days, and I'll follow up if and when I have any additional questions.

If and when I get it working, I'll be happy to send a PR to make this available for other users, too.

guydav commented 4 years ago

Hi @winthos! As I expected, your comments were super helpful. I've been doing some work on trying to get this to behave properly, and I'm still running into a couple of issues. I don't fully understand the internal machinations of some Unity components, so I figured I'd ask here while I continue debugging, to see if you (or anyone else reading) have any ideas:

First issue: adding the checks on rotation causes it to get completely stuck

Process: I changed DebugFPSAgentController:MouseRotateView() to the following, taking the parsing code from MouseLook.cs (I really don't understand why it maps "Mouse X" to yRot and "Mouse Y" to xRot, but that's a whole other issue). If I understand what I'm doing, I apply either or both directional checks, and rotate only if all checks come out positive.

private void MouseRotateView()
{
    // ??? why are these backward? at least they were in the code in MouseLook.cs
    float xRot = CrossPlatformInputManager.GetAxis("Mouse X") * m_MouseLook.XSensitivity;
    float yRot = CrossPlatformInputManager.GetAxis("Mouse Y") * m_MouseLook.YSensitivity;

    if (xRot != 0 || yRot != 0)
    {
        bool canRotateX = true;
        bool canRotateY = true;
        string directionX = "";
        string directionY = "";
        if (xRot != 0)
        {
            directionX = xRot > 0 ? "right" : "left";
            canRotateX = PhysicsController.CheckIfAgentCanRotate(directionX, xRot);
        }
        if (yRot != 0)
        {
            directionY = yRot > 0 ? "up" : "down";
            canRotateY = canRotateY && PhysicsController.CheckIfAgentCanRotate(directionY, yRot);
        }

        Console.WriteLine(String.Format("Mouse rotation | x: {0} | directionX: {1} | canRotateX: {2} | y: {3} | directionY: {4} | canRotateY: {5}",
            xRot, directionX, canRotateX, yRot, directionY, canRotateY));
        if (canRotateX && canRotateY)
        {
            m_MouseLook.LookRotation(transform, m_Camera.transform);
        }

    }
}

Problem: if I get a held object next to a piece of furniture in the room, it gets 'stuck', in a way that another rotation cannot undo; the agent has to move in order to enable rotation again. It's hard to get a sense of this problem without the video: see a capture of my interface, with console printouts that hopefully demonstrate the issue: https://drive.google.com/file/d/1gtrO_a_IbpGNFl5Mg4rA12qUz82XsUGT/view?usp=sharing

Second issue: adding the movement check in FPS input still doesn't prevent odd collision behavior

Process: I added to DebugFPSAgentController:FPSInput() the same sort of checks we see in the code you referenced above. Specifically, before enabling movement I check PhysicsController.checkIfSceneBoundsContainTargetPosition(targetPosition) && PhysicsController.CheckIfItemBlocksAgentMovement(moveDelta.magnitude, angleInt, false) && PhysicsController.CheckIfAgentCanMove(moveDelta.magnitude, angleInt), doing my best to compute the target position. As an aside, is there a reason DebugFPSAgentController doesn't inherit from BaseFPSAgentController? It would make some of this cleaner, though having access to the PhysicsController keeps this code reasonable.

private void FPSInput()
{
    // This is the actual movement function
    //Console.WriteLine("DebugFPSAgentController.FPSInput reached");
    //take WASD input and do magic, turning it into movement!
    float speed;
    GetInput(out speed);
    // always move along the camera forward, as it is the direction being aimed at
    Vector3 desiredMove = transform.forward * m_Input.y + transform.right * m_Input.x;
    // get a normal for the surface that is being touched to move along it
    RaycastHit hitInfo;

    Physics.SphereCast(transform.position, m_CharacterController.radius, Vector3.down, out hitInfo,
        m_CharacterController.height / 2f, Physics.AllLayers, QueryTriggerInteraction.Ignore);
    desiredMove = Vector3.ProjectOnPlane(desiredMove, hitInfo.normal).normalized;

    //m_MoveDir.x = desiredMove.x * speed;
    //m_MoveDir.z = desiredMove.z * speed;
    // before it wasn't writing y, so it was adding the effect of gravity (see below), which would continue accumulating in the negative y direction
    m_MoveDir = desiredMove * speed;  

    // if(!FlightMode)
    m_MoveDir += Physics.gravity * m_GravityMultiplier * Time.fixedDeltaTime;

    // Copied over from BaseFPSAgentController.moveInDirection for controller movement purposes
    float angle = Vector3.Angle(transform.forward, Vector3.Normalize(m_MoveDir));
    float right = Vector3.Dot(transform.right, m_MoveDir);
    if (right < 0)
    {
        angle = 360f - angle;
    }
    int angleInt = Mathf.RoundToInt(angle) % 360;

    //added this check so that move is not called if/when the Character Controller's capsule is disabled. Right now the capsule is being disabled when open/close animations are in progress so yeah there's that
    if(m_CharacterController.enabled == true) {
        // Trying to mimic some of the move validation logic
        Vector3 moveDelta = m_MoveDir * Time.fixedDeltaTime;
        Vector3 targetPosition = transform.position + moveDelta;

        if (PhysicsController.checkIfSceneBoundsContainTargetPosition(targetPosition) &&
            PhysicsController.CheckIfItemBlocksAgentMovement(moveDelta.magnitude, angleInt, false) &&
            PhysicsController.CheckIfAgentCanMove(moveDelta.magnitude, angleInt))
        {
            CollisionFlags movementResult = m_CharacterController.Move(moveDelta);

            #if UNITY_WEBGL
            JavaScriptInterface jsInterface = this.GetComponent<JavaScriptInterface>();
            if ((jsInterface != null) && (math.any(m_Input)))
            {
                MovementWrapper movement = new MovementWrapper
                {
                    position = transform.position,
                    rotationEulerAngles = transform.rotation.eulerAngles,
                    direction = m_MoveDir,
                    targetPosition = desiredMove,
                    input = m_Input,
                    //movement.succeeded = movementResult == CollisionFlags.None;
                    touchingSide = (movementResult & CollisionFlags.Sides) != 0,
                    touchingCeiling = (movementResult & CollisionFlags.Above) != 0,
                    touchingFloor = (movementResult & CollisionFlags.Below) != 0
                };

                jsInterface.SendMovementData(movement);
            }
            #endif
        }
        else if (math.any(m_Input))
        {
            Console.WriteLine(String.Format("Move failed: {0} && {1} && {2}",
                PhysicsController.checkIfSceneBoundsContainTargetPosition(targetPosition),
                PhysicsController.CheckIfItemBlocksAgentMovement(moveDelta.magnitude, angleInt, false),
                PhysicsController.CheckIfAgentCanMove(moveDelta.magnitude, angleInt)));
            Console.WriteLine(String.Format("Current position: {0} | desired move: {1} | move direction: {2} | move delta: {3} Target position: {4} | move magnitude: {5} | angleInt: {6}",
                transform.position, desiredMove, m_MoveDir, moveDelta, targetPosition, moveDelta.magnitude, angleInt));
            if (!PhysicsController.checkIfSceneBoundsContainTargetPosition(targetPosition))
            {
                Console.WriteLine(String.Format("Scene bounds: {0}", PhysicsController.sceneBounds));
            }
        }
    }
}

Problem: it feels a little like the Archimedes quote about being able to move the entire world with a sufficiently long lever. See the video capture below: the agent seems almost infinitely forceful, able to pick up any object and use it to move far heavier ones. I would expect a collision with a held object to either (a) push the held object back, or (b) make the movement fail (which is what I assumed CheckIfItemBlocksAgentMovement does), but neither really seems to happen. Do you have any idea what I can do to get more reasonable behavior out of this? A video demonstrating this issue is here: https://drive.google.com/file/d/1PqPj-GfajJdjROLAWN9FsZ1mXEazldtD/view?usp=sharing

winthos commented 4 years ago

Hello!

Issue One:

I think a few things are happening here, at least at first glance. First, I also don't fully understand why the default MouseRotateView() maps the Mouse X delta to yRotation and the Mouse Y delta to xRotation; that mapping comes from a default Unity component included with the standard assets package the base of the FPS controller was built from.

I did check into it a little, and the reason the mapping seems weird is that a change in the mouse's x changes the rotation of the entire controller's body, which does make sense: moving the mouse left should rotate the whole body of the controller, hence modifying the y-axis rotation of m_CharacterTargetRot. A change in Mouse Y, i.e. moving the mouse up and down, maps to rotation about the x axis of the camera, which is why Mouse Y is mapped to xRot. The camera's local directional axes keep the camera's x axis in the same direction relative to its parent character controller, so the entire LookRotation is updated as follows:

A change in the mouse's X rotates the whole character, which also rotates the camera. A change in the mouse's Y changes how far the camera rotates around its local x axis.

Because of the way the rotation is applied, specifically by modifying and clamping the local rotations of both the character and the camera, I believe this is why the mappings are the way they are.
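Paraphrasing the standard-assets code, the mapping boils down to roughly this (member names approximate, and the pitch clamping is omitted):

// Rough paraphrase of the standard-assets MouseLook.LookRotation mapping,
// not the exact code.
void LookRotation(Transform character, Transform camera)
{
    float yRot = CrossPlatformInputManager.GetAxis("Mouse X") * XSensitivity;
    float xRot = CrossPlatformInputManager.GetAxis("Mouse Y") * YSensitivity;

    // Horizontal mouse motion yaws the whole character body about the
    // world y axis, carrying the camera with it.
    m_CharacterTargetRot *= Quaternion.Euler(0f, yRot, 0f);
    // Vertical mouse motion pitches only the camera about its local x axis.
    m_CameraTargetRot *= Quaternion.Euler(-xRot, 0f, 0f);

    character.localRotation = m_CharacterTargetRot;
    camera.localRotation = m_CameraTargetRot;
}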

Moving on, I think the stuck behavior you are seeing is because the MouseLook approach, changing camera and agent rotations via mouse deltas, is not accounted for by the CheckIfAgentCanRotate check you are borrowing from the PhysicsController. The physics controller only allows a rotation either left/right or up/down, exclusively. The mouselook, intrinsically, can map deltas to both axes simultaneously, allowing things like a diagonal shift in the camera. This diagonal shift is a combination of the character rotating and the camera rotating, which is currently not handled at all by the CheckIfAgentCanRotate call. I don't know exactly why the check is behaving the way it is, but I would recommend not using it at all, and instead coming up with a new check entirely, one that can account for this simultaneous character and camera rotation.
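One possible shape for such a check (purely hypothetical) is to compose both deltas into a single candidate pose for the held object and test that pose before applying the look rotation:

// Hypothetical combined check: apply the body yaw AND camera pitch deltas
// together to the held object's pose, then overlap-test the result. Unlike
// a stepwise sweep, this only tests the final pose, so very fast mouse
// deltas could still tunnel through thin obstacles.
bool HeldObjectFitsAfterLook(Transform camera, Collider held, float yawDelta, float pitchDelta)
{
    Quaternion combined = Quaternion.AngleAxis(yawDelta, Vector3.up)
                        * Quaternion.AngleAxis(pitchDelta, camera.right);
    Vector3 pivot = camera.position;
    Vector3 newPos = pivot + combined * (held.transform.position - pivot);

    foreach (Collider hit in Physics.OverlapBox(newPos, held.bounds.extents, combined * held.transform.rotation))
    {
        if (hit != held && !hit.isTrigger) return false;
    }
    return true;
}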

Issue Two:

So, the Debug controller doesn't inherit from the base for reasons similar to what I outlined above with the big Venn diagram showing the differences between the WebGL and Python modes. Things like the mouse and keyboard control moving entirely differently from the Python controller mean it doesn't make sense for it to inherit, as all functions in the Base controller are meant to be used only by controllers that are interacted with via Python commands.

I'm a little unclear on what behavior you expect from the checks you added in FPSInput(), but it seems like the purpose is to prevent the agent from moving through objects? I think the actual issue is something else entirely. Based on your video, the behavior you are seeing is because of the way the agent picks up objects. When an object is picked up, it becomes parented to a "hand" transform that is a child of the agent. Because our agent is a kinematic controller, any held object with colliders inherits that kinematic rigidbody behavior. Objects picked up by the agent seem to have infinite force because, in the eyes of the physics engine, they in fact do. There are many ways to pick up objects and attach them to a character; the fact that we are using a kinematic controller rather than a rigidbody controller is the reason we chose to do it this way.

The issue with this method is, as you are observing, that picked-up objects are treated in a way that lets them knock things over and push them as if they had infinite mass. This is because a kinematic controller is moved by interpolating between rotations or positions, basically as if it were controlled by an animation. To account for this, we use checks like the CheckIfAgentCanRotate function, which casts the projected path of rotation of a held object and stops the rotation if anything would be hit. A good way to think of this is that every movement or rotation is preempted by a check to see if it is possible. With the mouse-delta logic that moves the camera and agent, this is much harder to do: you don't know the final position or rotation of the agent until the mouse actually moves, since the transform updates as the delta does, instead of being presented a potential movement, checking it, and then setting the transforms to those coordinates.

Next Steps:

So overall, I have some idea of what you are trying to accomplish. It seems like you want to add checks to the mouselook-based debug controller to give it more parity with the behavior of the Python controller. I fear this may be an exercise in futility, as the logic used in one cannot be fully supported in the other, at least without major overhauls to handle things like diagonal camera rotation.

So, if you want to make a turkable interface that is consistent and has direct parity with the Python-controlled version of THOR, I would recommend not using the debug controller or the mouselook-related control scheme at all, and instead building some sort of point-and-click interface. I can imagine buttons on the screen that map to actions like Move, LookUp, LookDown, PickUp, etc., which would let turkers easily interact with the environment while utilizing all the built-in checks that already exist, without needing to modify them. A rough sketch of that wiring follows.
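As a very rough sketch (hypothetical names; it assumes the discrete action handlers, like the MoveAhead(ServerAction) example above, can be reached from the WebGL build):

// Hypothetical point-and-click wiring: each UI button fires one discrete
// agent action, so all the built-in pre-movement checks still run.
using UnityEngine;
using UnityEngine.UI;

public class TurkActionPanel : MonoBehaviour
{
    public Button moveAheadButton;
    public PhysicsRemoteFPSAgentController agent;

    void Start()
    {
        moveAheadButton.onClick.AddListener(() => {
            // Reuse the same ServerAction path the Python controller uses.
            ServerAction action = new ServerAction();
            action.action = "MoveAhead";
            action.moveMagnitude = 0.25f;
            agent.MoveAhead(action);
        });
    }
}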

On the other hand, if you want a more reasonably behaving mouse-and-keyboard controller, then I would try changing the entire logic of how objects are picked up. Many games do something like this: when an object is picked up, it is attached to a sort of "spring" that represents the invisible hand of the character. If that spring were to move through another object or a wall, instead of the held object clipping through things as it currently does in all the behavior you are seeing, the held object would hit the obstructing object and try to pull itself toward the invisible hand; if the "resistance" from the blockage is too great, the object is simply dropped.

Here is an old video I have referenced before that demonstrates this behavior.

https://youtu.be/Xv-c3-IOnM0?t=474

Basically, the rock he is holding would otherwise move through that wall, but because of the way his pickup/grab/throw logic is implemented, it falls to the ground instead of clipping through.
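To make the spring idea concrete, here is a minimal sketch in Unity terms (hypothetical, not anything that exists in AI2-THOR), using a built-in SpringJoint with a break force:

// Minimal sketch of spring-based pickup: the held object stays a normal
// non-kinematic rigidbody, tethered to an invisible hand by a SpringJoint.
// If the object gets wedged, the joint's breakForce is exceeded and Unity
// destroys the joint, dropping the object.
using UnityEngine;

public class SpringPickup : MonoBehaviour
{
    public Rigidbody handAnchor; // invisible kinematic body parented to the agent

    public SpringJoint Grab(Rigidbody item)
    {
        SpringJoint joint = item.gameObject.AddComponent<SpringJoint>();
        joint.connectedBody = handAnchor;
        joint.spring = 500f;     // how hard the object is pulled toward the hand
        joint.damper = 20f;      // damps oscillation
        joint.breakForce = 250f; // past this "resistance", the object is dropped
        return joint;
    }
}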

The downside of going down this path is, again, that you move further away from the behavior of the Python controller.

guydav commented 4 years ago

Hi @winthos !

Thank you, as usual, for your very thorough review. I think your technical pointers will be very useful for fixing the misbehavior I encountered, but your high-level point is even more useful.

After thinking about it for a while, I don't think I necessarily care right now whether the interaction model is identical between agents using the Python framework and humans interacting with my online interface. While I will collect the trajectories my human subjects go through, I'm a little more interested (at least at first) in the rest of the data I'll be collecting from them. Accordingly, I'll look into changing the object-handling logic, as you suggest, and see where that leads me.

I'll follow up here with further thoughts or comments as I go along, and as before, if some parts of what I end up with seem useful for other people, I'll be happy to work to merge them in.