airo-ugent / airo-mono

Python packages for robotic manipulation @ IDLab AI & Robotics Lab - UGent - imec
https://airo.ugent.be
MIT License

Robot Program / Controller Abstractions: Should airo-mono offer high-level abstractions for robot programs? #140

Open Victorlouisdg opened 4 months ago

Victorlouisdg commented 4 months ago

Describe the feature you'd like

Airo-mono has enabled code sharing for many low- and mid-level tasks (e.g. hardware access and basic operations). Should airo-mono also offer standard high-level abstractions for common robot program paradigms?

Design goals 📝

Possible implementation

Abstract base classes that define the methods their children must implement. For example:

🧠 SensePlanActController:

This kind of structure maps very naturally to all the controllers I've written for the cloth-competition.
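As a rough sketch of what such an abstract base class could look like, here is a minimal SensePlanActController. The method names and the `execute()` driver are assumptions for illustration, not existing airo-mono API:

```python
from abc import ABC, abstractmethod
from typing import Any


class SensePlanActController(ABC):
    """Sketch of a possible abstract base class for sense-plan-act controllers.

    Subclasses implement one explicit phase per method; execute() runs the
    phases in order. All names here are illustrative.
    """

    @abstractmethod
    def sense(self) -> Any:
        """Gather observations (e.g. images, joint configurations)."""

    @abstractmethod
    def plan(self, observation: Any) -> Any:
        """Compute a plan (e.g. a trajectory) from the observation."""

    @abstractmethod
    def act(self, plan: Any) -> None:
        """Execute the plan on the hardware."""

    def execute(self) -> None:
        # The base class enforces the phase ordering; subclasses only
        # fill in the phases themselves.
        observation = self.sense()
        plan = self.plan(observation)
        self.act(plan)
```

A concrete controller would then subclass this and implement the three phases, which also makes it easy to check the output of `plan()` in simulation before `act()` touches the hardware.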

Victorlouisdg commented 4 months ago

Some considerations for each paradigm:

m-decoster commented 4 months ago

For our demo, we had a different vision, with controllers that are more low-level, but I see merit in both of our proposals (mine is below).

Main issues with our old architecture

Proposed solutions

I saw controllers as stateless functions that operate on a State, change the physical state of the world, and return a ControllerResult that can affect the application flow (e.g., by triggering a perception update). However, this may be too restrictive if we want to do some sensing inside a controller. These functions compose easily: higher-level controllers are built from lower-level controllers, typically with a very easy-to-understand code flow (just some function calls). An example is given at the bottom of this comment.
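The building blocks this pattern relies on could look roughly like the following. The `ControllerResult` fields and the `@controller` decorator body are assumptions sketched for illustration (here the decorator only tags the function; a real one could add logging or dry-run support):

```python
import functools
from dataclasses import dataclass
from typing import Callable


@dataclass
class ControllerResult:
    """Outcome of a controller call; can drive the application flow."""
    success: bool
    message: str = ""


def controller(func: Callable[..., ControllerResult]) -> Callable[..., ControllerResult]:
    """Hypothetical decorator marking a function as a controller."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> ControllerResult:
        return func(*args, **kwargs)
    wrapper.is_controller = True  # illustrative tag for tooling/introspection
    return wrapper


@controller
def open_gripper(width: float) -> ControllerResult:
    # A leaf controller: would change the physical world, returns a result.
    return ControllerResult(success=width >= 0.0)


@controller
def pick(width: float) -> ControllerResult:
    # A composed controller: plain function calls with early exit on failure.
    result = open_gripper(width)
    if not result.success:
        return result
    return ControllerResult(success=True)
```

The early-return-on-failure pattern is what keeps the composed controllers readable: the happy path reads top to bottom, and any failure bubbles up as a ControllerResult.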

Considerations

Conclusion

I think there are several levels of abstraction and we need to decide which we want to address, and how.

Code example

import numpy as np


@controller
def grab_capsule(controller_arguments: ControllerArguments) -> ControllerResult:
    """Grab a capsule from the dispenser.

    Args:
        controller_arguments: Controller arguments.

    Returns:
        A ControllerResult."""
    # The dispenser's position is hard coded, so we can use hard coded joint configurations.
    # NOTE: eventually we might want to make values such as this configurable as a data file.

    # Configuration to plan to.
    q_pregrasp = np.array([2.39238167, -1.32466565, 0.13710148, 4.24871032, -1.52691394, np.pi])
    # Configuration to move to.
    q_grasp = np.array([2.39213324, -1.0258608, 0.4087413, 3.84247033, -1.54018432, np.pi])

    result = plan_to_joint_configuration(controller_arguments, q_pregrasp)
    if not result.success:
        return result

    result = move_freely_to_joint_configuration(controller_arguments, q_grasp)
    if not result.success:
        return result

    result = move_gripper(
        controller_arguments,
        0.005,
        Robotiq2F85.ROBOTIQ_2F85_DEFAULT_SPECS.min_speed,
        Robotiq2F85.ROBOTIQ_2F85_DEFAULT_SPECS.min_force,
    )
    if not result.success:
        return result

    return back_up_from_capsule_dispenser(controller_arguments)
Victorlouisdg commented 4 months ago

I feel like your proposal is close to what I had in mind for an 🕹️ AgentController. If I understand correctly, that would look something like this:

station = Station()  # contains the hardware
observer = Observer(station)  # also contains YOLO or SAM?

result = None
while result != "done":
    observation = observer.get_observation()  # joint configs, images, point cloud, YOLO detections?
    result = grab_capsule(station.arm, observation)

result = None
while result != "done":
    observation = observer.get_observation()
    result = move_to_coffee_maker(station.arm, observation)

...

The Station and Observer together function like an RL Environment, updating the state and providing observations. The grab_capsule function is basically an Agent, except that it is responsible for both decision-making and action execution (as opposed to an RL agent, which returns actions for the environment to execute).
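To make the loop above concrete, here is a minimal self-contained sketch of the sense-act loop with a dummy Observer and controller. The class and field names are assumptions for illustration only (the real Observer would wrap cameras and detectors, and the controller would command the arm):

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Illustrative observation bundle."""
    joint_configuration: list
    step: int


@dataclass
class Observer:
    """Stands in for the Station + camera + detector wrapper."""
    steps_until_grasped: int = 3  # dummy: pretend grasping takes 3 iterations
    _step: int = field(default=0, init=False)

    def get_observation(self) -> Observation:
        self._step += 1
        return Observation(joint_configuration=[0.0] * 6, step=self._step)


def grab_capsule(observation: Observation, steps_needed: int) -> str:
    # Decides *and* acts on each iteration; reports "done" when finished.
    return "done" if observation.step >= steps_needed else "running"


observer = Observer()
result = None
while result != "done":
    observation = observer.get_observation()
    result = grab_capsule(observation, observer.steps_until_grasped)
```

The loop repeatedly senses and hands the fresh observation to the controller until the controller reports that its subtask is finished, which matches the RL-environment analogy: the Observer provides observations, and the controller closes the loop by acting on the world.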

m-decoster commented 4 months ago

Yes, this is very similar to what I had in mind.

The SensePlanActController could also work very well in the coffee demo, but:


Here, I'm copying an issue from the barista repository to have all thoughts in one place:

Problems with current code

For example, the LeverOpenerController is a monster of a class that violates, among others, the single responsibility principle. It maintains its own planner, performs perception and computes bounding boxes, moves based on motion planning, with servoing, and also with plain MoveJ commands.

Proposed code architecture

I think the following code architecture would be more maintainable, but it could also be too restrictive.

General idea

m-decoster commented 2 weeks ago

For the centrifuge demo we used a SensePlanActController (SenseThinkActController), which worked well for us.

When a controller did not need to sense, think, or act, we simply left the corresponding method empty.

There were some cases where, working against a deadline, we started to interleave sensing, thinking, and acting, but this can easily be avoided by being stricter (and through code review).

I enjoyed the separation of sense, think, and act because, especially during development, we could check trajectories and poses in simulation (especially with airo-drake) before running.

However, I'm not sure if we should really supply such interfaces. I think it would be better to document different controller styles somewhere in a "recommended practices" document.