Datafile set definition

aremazeilles commented 3 years ago

Let's start the description.

First, I propose moving the whole dataset description into a dedicated file, to focus on that topic. That file should reference the data format description (and vice versa), and the parts of the data format description that deal with datasets may be moved into the new file or deleted.

Some name changes, open to discussion:

Subject data: we currently use subject_i_anthropometry.yaml. I propose the more generic subject_i_info.yaml, as the stored information is not always a measurement (age, gender, ...).

Testbed condition settings: we used testbed.yaml. However, when we construct the contextualized datafile name we use the pattern _cond_, and the text frequently refers to condition settings. Also, the parameters set in this file are not always related to the testbed alone. So I propose condition_i.yaml.

Robot info: to be consistent with the subject, I propose robot_info.yaml.

I introduced the notion of a contextualized filename: a filename that is sufficient by itself to deduce the full context of the file. If contextualized filenames are used, we simply do not care about the folder structure; looking for all files in it is enough.
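A minimal Python sketch of what "looking for all files" could mean with contextualized filenames, assuming names of the form subject_<i>_cond_<j>_run_<k>_<type>.csv (the exact field order is still open in this discussion; the regex and the helper names parse_contextualized_name and collect_datafiles are hypothetical). Every CSV under the root is checked by its name only, so the folder layout does not matter.

```python
import re
from pathlib import Path
from typing import List, Optional

# Hypothetical pattern for contextualized names such as "subject_1_cond_2_run_3_pos.csv".
CONTEXT_RE = re.compile(
    r"^subject_(?P<subject>\d+)_cond_(?P<condition>\d+)_run_(?P<run>\d+)_(?P<type>\w+)\.csv$"
)


def parse_contextualized_name(filename: str) -> Optional[dict]:
    """Return the context encoded in a filename, or None if it does not match."""
    match = CONTEXT_RE.match(filename)
    if match is None:
        return None
    context = match.groupdict()
    for key in ("subject", "condition", "run"):
        context[key] = int(context[key])
    return context


def collect_datafiles(root: str) -> List[dict]:
    """Scan every CSV under root; the folder layout is deliberately ignored."""
    found = []
    for path in sorted(Path(root).rglob("*.csv")):
        context = parse_contextualized_name(path.name)
        if context is not None:
            context["path"] = str(path)
            found.append(context)
    return found
```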

I also considered a case in which the folder names alone provide the complete context of a file, but this turns out to be tricky. At the very end of the new file, I return to the first example, but I get stuck. Imagine a folder structure like subject->condition:

- subject_1: 
  - cond_1:
    - run_1,2,3,4_[type].csv
  - cond_2:
    - run_1,2,3,4_[type].csv
  - cond_3:
    - run_1,2,3,4_[type].csv
  - info.yaml
- subject_2:
  # similar
- subject_3:
  # similar
- subject_4:
  # similar
- condition_1.yaml
- condition_2.yaml
- condition_3.yaml
- robot_info.yaml
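To make the difficulty concrete, a minimal Python sketch, assuming the first layout above with one file per run named run_<k>_<type>.csv; the helper context_from_folders and its regexes are hypothetical. The context can be rebuilt, but only by a parser that hard-codes the exact folder depth and names, the subject info and condition files have to be located by walking back up the tree, and a file copied out of its folder loses its context entirely.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed layout (first use case): <root>/subject_<i>/cond_<j>/run_<k>_<type>.csv
SUBJECT_RE = re.compile(r"^subject_(\d+)$")
COND_RE = re.compile(r"^cond_(\d+)$")
RUN_RE = re.compile(r"^run_(\d+)_(\w+)\.csv$")


def context_from_folders(relative_path: str) -> Optional[dict]:
    """Rebuild the context of a datafile from its path relative to the root."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:  # the depth of the tree is part of the convention
        return None
    subject = SUBJECT_RE.match(parts[0])
    cond = COND_RE.match(parts[1])
    run = RUN_RE.match(parts[2])
    if not (subject and cond and run):
        return None
    return {
        "subject": int(subject.group(1)),
        "condition": int(cond.group(1)),
        "run": int(run.group(1)),
        "type": run.group(2),
        # Side files are found relative to the tree, not from the filename.
        "subject_info": f"{parts[0]}/info.yaml",
        "condition_file": f"condition_{cond.group(1)}.yaml",
    }
```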

With the second use case, that would give:

- subject_1: 
  - cond_1:
    - [type].csv
  - cond_2:
    - [type].csv
  - cond_3:
    - [type].csv
  - info.yaml
  - condition_{1,2,3}.yaml
- subject_2:
  # similar
- robot_info.yaml

What do you think?

Enri2077 commented 3 years ago

If I understand correctly, the robot info is not needed for madrob/beast, since it refers to the testbed, right? In our case the humanoid robot is the subject.

Currently, for each run we generate the preprocessed data files, named subject_X_run_Z_[type].csv, and a testbed file named madrob_testbed.yaml. The testbed file contains all the information about the condition, the subject number, the run number and the other information needed to recompute the preprocessed data from the raw data (rosbag). These files are saved in a folder named MADROB_X_Z_T, where T is a timestamp that makes the name reasonably unique. We do this to avoid overwriting data from previous runs (for example, when someone forgets to change the subject/run number in the GUI), but also to keep the results of different runs better organised.
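For illustration, a minimal Python sketch of how such a run folder name could be built; run_folder_name is a hypothetical helper, and the strftime format used for T is an assumption, since the actual timestamp format is not specified here.

```python
from datetime import datetime
from typing import Optional


def run_folder_name(subject: int, run: int, now: Optional[datetime] = None) -> str:
    """Build a MADROB_X_Z_T folder name (timestamp format assumed)."""
    now = now or datetime.now()
    # Second resolution is usually enough to keep names unique per run.
    return f"MADROB_{subject}_{run}_{now.strftime('%Y%m%d_%H%M%S')}"
```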

Best solution for us: saving the preprocessed data with the format subject_X_cond_Y_run_Z_[type].csv (i.e., adding the condition number) would not be much of a problem. I think I will make a table of conditions (one for every combination of controlled variables) with the respective condition numbers, so that the operator selects each controlled variable and the testbed software automatically selects the appropriate condition number for the run.
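A minimal Python sketch of such a condition table, assuming hypothetical controlled variables and levels (the real MADROB variables are not listed in this thread). Conditions are numbered from 1 over the Cartesian product of the levels, and the operator's selection is mapped back to its number; build_condition_table and condition_number are hypothetical helpers.

```python
from itertools import product

# Hypothetical controlled variables and levels; the real MADROB set differs.
CONTROLLED_VARIABLES = {
    "door_weight": ["light", "heavy"],
    "approach_speed": ["slow", "fast"],
    "obstacle": ["none", "present"],
}


def build_condition_table(variables: dict) -> dict:
    """Map each combination of levels (in variable order) to a 1-based condition number."""
    combos = product(*(variables[name] for name in variables))
    return {values: number for number, values in enumerate(combos, start=1)}


def condition_number(variables: dict, table: dict, selection: dict) -> int:
    """Return the condition number for the operator's selection of every variable."""
    key = tuple(selection[name] for name in variables)
    return table[key]
```

Since the numbering depends on the order of the variables and of their levels, the table would probably be generated once and documented alongside the condition_Y.yaml files, rather than recomputed whenever the variable list changes.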

The testbed yaml file can be renamed to fit the new format, but in its current form it also contains information specific to the individual run (the name of the rosbag containing the raw data and the start time of the experiment, which is needed to correctly recompute the events.csv preprocessed data). The best solution for us would be to split the condition information and the run information into two files: condition_Y.yaml (only one per condition, valid for all runs), and subject_X_cond_Y_run_Z_madrob_run_info.yaml (one per run, like our current testbed.yaml).

Note that the run information still needs to be kept around because it contains information needed to recompute the preprocessed data from the rosbags.

Lastly, I would still save the generated files in a different folder for each run, but with the contextualised filenames the operator could simply upload the content of each run folder without the need to rename any file.

aremazeilles commented 3 years ago

> If I understand correctly, the robot info is not needed for madrob/beast, since it refers to the testbed, right? In our case the humanoid robot is the subject.

Just as people may need information about the subject, robot.{urdf, yaml} could be used to provide information on the robot being tested. So it is not about the testbed but about the wearable device or humanoid.

So, if I understand correctly, in your case you suggest having:

- subject_X_condition_Y_run_Z_[type].csv
- condition_Y.yaml, common across subjects
- subject_X_condition_Y_run_Z_madrob_run_info.yaml, one per run
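To make the proposal concrete, a minimal Python sketch that lists the files one run would produce under the scheme above; run_files is a hypothetical helper, and the [type] suffixes (such as events) are passed in by the caller.

```python
from typing import Dict, List


def run_files(subject: int, condition: int, run: int, types: List[str]) -> Dict[str, List[str]]:
    """Names of the files produced for one run under the proposed scheme."""
    prefix = f"subject_{subject}_condition_{condition}_run_{run}"
    return {
        # Preprocessed data, one file per [type].
        "data": [f"{prefix}_{data_type}.csv" for data_type in types],
        # Shared by every subject and run recorded under this condition.
        "condition": [f"condition_{condition}.yaml"],
        # Per-run information, intended for reprocessing the rosbag.
        "run_info": [f"{prefix}_madrob_run_info.yaml"],
    }
```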

From what you wrote, I understand that the last file is not used for the PI computation, but intended to permit reprocessing the bag file, is that correct?

Enri2077 commented 3 years ago

> From what you wrote, I understand that the last file is not used for the PI computation, but intended to permit reprocessing the bag file, is that correct?

Exactly. To be honest, this is based only on what I remember and a quick look at the code of the PIs, but I'm fairly sure I can decouple the run info from the conditions, and that the run info is completely unrelated to the PI computation.

aremazeilles commented 3 years ago

If this can be done, that would be great, because the naming of the run_info file then becomes less critical, as it is not used for data processing.

alfonsotecnalia commented 3 years ago

> subject data: we use right now subject_i_anthropometry.yaml. I propose to use a more generic term subject_i_info.yaml as the information stored is not always related to measures (age, gender, ...)

OK

> Robot info: to be consistent with the subject, I propose robot_info.yaml.

OK

Regarding the condition, I accept renaming it to "condition_Y.yaml". I find the decoupling of the information into two files interesting: one for the condition and one for the run. We should mention it in the documentation.

Regarding the structure, I think contextualized filenames with unstructured folders are more flexible so I prefer that solution.

aremazeilles commented 3 years ago

I think we kind of agreed, is that right?

aremazeilles/eurobench_documentation, issue #80: Datafile set definition