MeMory-of-MOtion / docker-loco3d

Discussions relative to the docker for the generation of motion data using Loco3D pipeline.

File format for saving the data #1

Closed pFernbach closed 5 years ago

pFernbach commented 5 years ago

I am opening this discussion to gather the needs of everyone working with the motion dataset. We are working on a new framework to produce a new version of the dataset, so we can still change which information we save and its format.

I believe that the size (disk usage) of the dataset is not really an issue and that it's better to have something more convenient to use than something small. Correct me if you think I am wrong on this point.

Below are my suggestions on what we should save and in which format. Feel free to correct me or add anything:

Contact sequence :

A sequence of contact phases, each defining the placement of the active contacts.

The contact sequence will be stored with a ContactSequence object defined by the library Multicontact-API in the version 1.0.0 that we are going to release.

This object can be serialized either to xml or binary.

Centroidal trajectory :

For each contact phase: the "state trajectory" (c,dc,L), the "control trajectory" (ddc,dL) and the duration of the phase.

Will also be stored in the ContactSequence object. In the current version these trajectories are stored as a discretized set of points. We are working on the possibility of storing a continuous representation, but it won't be ready for v1.0.

Whole body motion :

q(t)

The basic file format for storing this information is a text file where each line is [t q(t)]. Serializing a NumPy matrix is also an option; I need to compare the size of both to see whether it's really worthwhile.
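To make the comparison concrete, here is a quick size check (my own sketch, not project code): the same [t q(t)] trajectory written as a text file vs. NumPy binary. The timestep and configuration size are arbitrary toy values.

```python
# Quick size comparison: text file (one line per timestep) vs. .npy binary.
import os
import tempfile

import numpy as np

dt = 0.01          # hypothetical timestep
N = 100            # number of samples
nq = 39            # configuration size (an assumption, e.g. TALOS-like)

t = np.arange(N) * dt
q = np.random.rand(nq, N)
rows = np.vstack([t, q]).T          # one row per timestep: [t, q(t)]

with tempfile.TemporaryDirectory() as d:
    txt_path = os.path.join(d, "traj.txt")
    npy_path = os.path.join(d, "traj.npy")
    np.savetxt(txt_path, rows)      # plain text, ~25 characters per value
    np.save(npy_path, rows)         # raw float64 binary, 8 bytes per value
    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)

print(txt_size, npy_size)           # the binary file is roughly 3x smaller
```

With the default `np.savetxt` format (`%.18e`), the text file is about three times larger than the binary one for the same data.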


Other easily accessible data :

Our framework computes other data that you may want to store, for example:

dq(t), ddq(t), contact forces, ZMP, effectors trajectories.

wxmerkt commented 5 years ago

The basic file format for storing this information is a text file where each line is [t q(t)].

In MEMMO, there is Deliverable 8.2 which defines the format. In particular, the NumPy binary format was recommended for large data. In general I do not have a problem with large datasets - but I do with large text files. In Python, loading large CSV files for data analysis takes a very, very long time, while loading binary data files is practically instant.
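The gap is easy to reproduce on synthetic data (my sketch, not MEMMO deliverable code): parsing a text trajectory with `np.loadtxt` vs. reading the same array back with `np.load`.

```python
# Load-time comparison: text parsing vs. binary read of the same data.
import os
import tempfile
import time

import numpy as np

data = np.random.rand(2000, 40)     # ~2000 timesteps of [t q(t)] rows

with tempfile.TemporaryDirectory() as d:
    txt_path = os.path.join(d, "traj.txt")
    npy_path = os.path.join(d, "traj.npy")
    np.savetxt(txt_path, data)
    np.save(npy_path, data)

    t0 = time.perf_counter()
    from_txt = np.loadtxt(txt_path)         # parses every number from text
    txt_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    from_npy = np.load(npy_path)            # reads the raw buffer directly
    npy_time = time.perf_counter() - t0

print(f"loadtxt: {txt_time:.4f}s, np.load: {npy_time:.4f}s")
```

Both paths recover the same array; the text route only pays (heavily) in parsing time, and the gap grows with dataset size.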

teguhSL commented 5 years ago

According to the meeting in Paris, the following data is also required:

pFernbach commented 5 years ago

Quick update given the suggestions so far. I propose to save 2 objects: the serialized ContactSequence object, which will mainly be used to retry the same problem with different approaches, and a single .npz NumPy archive. This archive will store the following data :

Other data in the archive :
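A minimal sketch of how such an archive could be written and read back with `np.savez` / `np.load` (the field names and sizes here are placeholders, not the final specification):

```python
# Sketch: one .npz archive with one entry per stored field.
import io

import numpy as np

N, nq, nv = 50, 39, 38              # toy dimensions (assumptions)
fields = {
    "t_t": np.arange(N) * 0.001,    # e.g. a 1 ms timestep
    "q_t": np.zeros((nq, N)),
    "dq_t": np.zeros((nv, N)),
    "c_t": np.zeros((3, N)),
}

buf = io.BytesIO()                  # stands in for a file on disk
np.savez(buf, **fields)             # one archive entry per field
buf.seek(0)

archive = np.load(buf)              # lazy, per-entry access
print(sorted(archive.files))
```

Each field is then accessible by name (`archive["q_t"]`) without loading the rest of the archive.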

teguhSL commented 5 years ago

Thank you, the new format seems good to me. =)

One addition: is it possible for you to add another trajectory for the contact sequence? This would be a binary signal (0 = no contact, 1 = contact) for each effector. I know that this can be extracted from the serialized contact sequence object, but so can the other trajectories (c(t), for example). Having this information already extracted will save some effort.
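One way such a per-effector signal could be built from phase intervals (a sketch; `phases` and the effector names are placeholders, not the actual multicontact-api structures):

```python
# Build a 0/1 contact-activity signal per effector from contact intervals.
import numpy as np

N = 10                                       # number of timesteps
phases = {"left_foot": [(0, 4), (7, 10)],    # (start, end) sample indices
          "right_foot": [(0, 10)]}           # in contact the whole time

contact_activity = {}
for ee, intervals in phases.items():
    signal = np.zeros(N)                     # 0 = no contact
    for start, end in intervals:
        signal[start:end] = 1.0              # 1 = contact
    contact_activity[ee] = signal

print(contact_activity["left_foot"])
```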

Effector trajectories : also one for each effector

This will be a trajectory in SE3?

teguhSL commented 5 years ago

I forgot to mention one thing about the dataset: during the Paris meeting, Nicolas told us that the motions in this circle dataset are going to be used as building blocks for larger motions. With that in mind, the motions we have in TALOS circle are too long to be good building blocks (a motion in the dataset consists of more than 6 footsteps). There are two things that we might need to change:

  1. Reduce the circle's radius, such that each motion consists of only 2-4 footsteps.

  2. Instead of setting the problem as: go from xy_1 to xy_2 (as in the current dataset), we should set it as: go from foot-configuration 1 to foot-configuration 2. This will make the motions better building blocks for more complex locomotion (going around the stairs, for example). When we look at the problem of climbing stairs, the initial and goal foot configurations can also be at different z locations (according to the stair height). And if we aim to use the hands for contact, then the problem in the dataset should be set as: go from contact-configuration 1 to contact-configuration 2 (each configuration consists of the feet and hand contact info, and moving between the two can involve more than one contact change).

We discussed having a meeting to talk about dataset generation, so maybe it is better to discuss this there, but I mention it here to let you know about our discussion.

pFernbach commented 5 years ago

Here is a draft of the struct which stores all the required data. The goal is to create a .npz NumPy archive which will contain all the fields of this struct:

        self.t_t = np.array([t_begin + i*cfg.IK_dt for i in range(N)])
        self.q_t = np.matrix(np.zeros([self.nq,N]))
        self.dq_t = np.matrix(np.zeros([self.nv,N]))
        self.ddq_t = np.matrix(np.zeros([self.nv,N]))
        self.tau_t = np.matrix(np.zeros([self.nv-6,N]))
        self.c_t = np.matrix(np.zeros([3,N]))  
        self.dc_t = np.matrix(np.zeros([3,N]))
        self.ddc_t = np.matrix(np.zeros([3,N]))
        self.L_t = np.matrix(np.zeros([3,N]))
        self.dL_t = np.matrix(np.zeros([3,N]))
        self.c_tracking_error = np.matrix(np.zeros([3,N]))
        self.c_reference = np.matrix(np.zeros([3,N]))
        self.wrench_t = np.matrix(np.zeros([6,N]))
        self.zmp_t = np.matrix(np.zeros([6,N]))
        self.contact_forces = {}
        self.contact_normal_force={}
        self.effector_trajectories = {}
        self.effector_references = {}
        self.effector_tracking_error = {}
        self.contact_activity = {}
        for ee in self.eeNames : # for all effectors used in the motion
            self.contact_forces.update({ee:np.matrix(np.zeros([12,N]))}) # 3D forces at the 4 corner of the rectangular contact
            self.contact_normal_force.update({ee:np.matrix(np.zeros([1,N]))})              
            self.effector_trajectories.update({ee:np.matrix(np.zeros([12,N]))}) # in SE(3)
            self.effector_references.update({ee:np.matrix(np.zeros([12,N]))})  # in SE(3)
            self.effector_tracking_error.update({ee:np.matrix(np.zeros([6,N]))}) # in SE(3)
            self.contact_activity.update({ee:np.matrix(np.zeros([1,N]))})
        self.phases_intervals = self.buildPhasesIntervals(cs)   

N is the number of points, nq the configuration size, nv the number of degrees of freedom (including the freeflyer base).

Tell me if you want to add anything.
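One practical note on the struct above: the dict-valued fields (one matrix per effector) cannot be stored in an .npz archive directly, since `np.savez` only takes arrays. One option is to flatten them with prefixed keys; this is a sketch only, and the key-naming convention is my assumption, not the project's.

```python
# Flatten per-effector dicts into prefixed .npz keys, then rebuild them.
import io

import numpy as np

N = 20
contact_forces = {"left_foot": np.zeros((12, N)),
                  "right_foot": np.ones((12, N))}

# one archive entry per effector, e.g. "contact_forces_left_foot"
flat = {f"contact_forces_{ee}": m for ee, m in contact_forces.items()}
buf = io.BytesIO()
np.savez(buf, **flat)
buf.seek(0)
loaded = np.load(buf)

prefix = "contact_forces_"
restored = {k[len(prefix):]: loaded[k]
            for k in loaded.files if k.startswith(prefix)}
print(sorted(restored))
```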

pFernbach commented 5 years ago

You can find below an example of the .npz archive produced by our new version: https://cloud.laas.fr/index.php/s/Yfz4iw22qGyhqJy along with the ContactSequence object (now in binary): https://cloud.laas.fr/index.php/s/dftImFtoklb5ryi

I wrote a helper script to load the .npz archive and build a more convenient struct: https://github.com/loco-3d/multicontact-locomotion-planning/blob/master/scripts/mlp/utils/wholebody_result.py

This new file is quite large, around 1.7 MB per second of motion. Indeed, it stores ~280 floats per timestep, and we chose a timestep of 1 ms. If the size is an issue, we can easily increase the timestep, but that may make the data harder to exploit.
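As a back-of-the-envelope check of that figure: ~280 float64 values per 1 ms timestep gives about 2.24 MB/s of raw payload, so the quoted ~1.7 MB/s presumably reflects some compression or overhead savings in the .npz container (my inference, not a measured breakdown).

```python
# Raw-payload estimate: 280 floats per step, 1000 steps per second, float64.
floats_per_step = 280
steps_per_second = 1000      # dt = 1 ms
bytes_per_float = 8          # float64

mb_per_second = floats_per_step * steps_per_second * bytes_per_float / 1e6
print(mb_per_second)         # -> 2.24 MB/s uncompressed
```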

Some of the data in this struct are redundant (either with other data in the struct or with data in the ContactSequence object), in the sense that they can easily be recomputed from other data already available. However, according to our previous discussions, I believe the convenience of having all this data directly accessible matters more than the size of the file.

Another option is to compress the file (for example with xz): the result is ~80% of the original size, but it's a lot less convenient to use as decompression is quite slow.
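The trade-off can be reproduced with Python's stdlib `lzma` module (the same format xz uses) on a synthetic .npz payload; this is a sketch, not the actual dataset files.

```python
# Compress a synthetic .npz payload with LZMA (xz's format) and round-trip it.
import io
import lzma

import numpy as np

buf = io.BytesIO()
np.savez(buf, q_t=np.random.rand(39, 2000))   # toy trajectory
raw = buf.getvalue()

compressed = lzma.compress(raw)     # roughly what `xz archive.npz` produces
restored = lzma.decompress(compressed)        # the slow step at read time

print(len(compressed) / len(raw))             # compression ratio
```

On real trajectories (smooth, with many near-constant channels) the ratio should be noticeably better than on this random payload.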

I'd like your opinion, either on the choice of timestep or on whether there is a hard constraint on the size of the dataset.

wxmerkt commented 5 years ago
self.tau_t = np.matrix(np.zeros([self.nv-6,N]))

I wonder whether this should be of size nu x N-1 - in the literature we often have state trajectories of length N and control trajectories of length N-1. Also, to be more general, we could introduce nu as the control dimension (here, nu = nv-6): as given it assumes a floating-base robot, whereas we could consider other actuation types in a more general framework.
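A toy version of that convention, for illustration: N states and N-1 controls, where control k drives the transition from state k to state k+1 (a single-integrator sketch, not the project's dynamics).

```python
# N states, N-1 controls: u[:, k] maps x[:, k] to x[:, k+1].
import numpy as np

N, nu, dt = 5, 2, 0.1
x = np.zeros((nu, N))        # state trajectory: N points
u = np.ones((nu, N - 1))     # control trajectory: N-1 points

for k in range(N - 1):
    x[:, k + 1] = x[:, k] + dt * u[:, k]   # single-integrator dynamics

print(x[:, -1])              # -> [0.4 0.4] after 4 steps of 0.1
```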

pFernbach commented 5 years ago

Thanks for your remark. I added an optional parameter nu (with default value nv - 6).

As for why it's N and not N-1: the last control is computed anyway, from the error between the last reached configuration and the last point in the references. You are right that this control is never applied to the robot, but since it's already computed I thought it more convenient to store it anyway and keep all the matrices the same length.

pFernbach commented 5 years ago

See the updated README : https://github.com/MeMory-of-MOtion/docker-loco3d#generated-data

I close this issue, feel free to open a new issue for remarks on the current file format.