Add vector_of_vector and vector_of_matrices formats

alfonsotecnalia commented 3 years ago

These formats are for results that contain several vectors/matrices of different sizes. We have to consider whether to add labels, and how.

aremazeilles commented 3 years ago

We would need to gather the cases where this is needed.

Regarding the label, could we connect it to #44?

alfonsotecnalia commented 3 years ago

vector_of_vector and vector_of_matrices are proposed in https://github.com/dzhvansky/pepato/issues/8.

Here is a proposal for the types:

type: vector
value: [vector] # vector elements separated by ","

type: labelled_vector
value: [[labels_vector]; [vector]]

type: matrix
value: [[row_1_vector]; [row_2_vector]; ...;[row_N_vector]] # rows separated by ";"

type: labelled_matrix
value: [[row_labels_vector], [[column_labels_vector]; [row_1_vector]; [row_2_vector]; ...;[row_N_vector]]]

type: vector_of_vector
value: [[vector_1], [vector_2], ..., [vector_N]]

type: labelled_vector_of_vector
value: [[labels_vector], [[vector_1_labels_vector]; [vector_1]], [[vector_2_labels_vector]; [vector_2]], ..., [[vector_N_labels_vector]; [vector_N]]]

type: vector_of_matrices
value: [matrix_1, matrix_2, ..., matrix_N] # each matrix_Y being an element of the form [[row_1_vector]; [row_2_vector]; ...;[row_N_vector]]

type: labelled_vector_of_matrices
value: [[matrices_labels_vector], [[matrix_1_labels_vector], [[matrix_1_column_labels_vector]; matrix_1]], [[matrix_2_labels_vector], [[matrix_2_column_labels_vector]; matrix_2]], ..., [[matrix_N_labels_vector], [[matrix_N_column_labels_vector]; matrix_N]]]

aremazeilles commented 3 years ago

For vector and labelled_vector, could we cover both with a single type, vector, plus another (optional) key, label:

type: vector
label: [label_1, label_2, label_3]
value: [val_1, val_2, val_3]
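
A minimal sketch of how a consumer might handle the optional label key, assuming the YAML has already been loaded into a plain Python dict. The helper name `vector_items` and the fallback to positional indices are assumptions for illustration, not part of any existing tool:

```python
def vector_items(entry):
    """Pair each value with its label; fall back to positional indices."""
    values = entry["value"]
    labels = entry.get("label")  # optional key
    if labels is None:
        # Assumption: unlabelled entries are identified by their position.
        labels = list(range(len(values)))
    if len(labels) != len(values):
        raise ValueError("label and value must have the same length")
    return list(zip(labels, values))


# Dicts standing in for what a YAML loader would return.
labelled = {"type": "vector", "label": ["a", "b", "c"], "value": [1.0, 2.0, 3.0]}
plain = {"type": "vector", "value": [1.0, 2.0, 3.0]}

print(vector_items(labelled))
print(vector_items(plain))
```

With a single type, the same code path serves both the labelled and the unlabelled case.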

For matrices, is it standard in yaml to use ; instead of ,? Commas might be better. Also, it could be written as suggested here.

Similarly, I would rather handle labelled_matrix by adding more keys to state the row and column labels:

type: matrix
row_labels: [row_1, row_2, row_3]
col_labels: [col_1, col_2, col_3]
value: [[1, 2, 3], [4, 5, 6], [7,8,9]]
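
As a rough illustration of what the separate label keys make possible, the sketch below turns such an entry into a lookup table. It assumes the YAML is already loaded into a dict; `matrix_as_table` is a hypothetical helper, and defaulting missing labels to positional indices is an assumption:

```python
def matrix_as_table(entry):
    """Return {row_label: {col_label: value}} for a matrix entry."""
    rows = entry["value"]
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    # Labels are optional: default to positional indices (assumption).
    row_labels = entry.get("row_labels", list(range(len(rows))))
    col_labels = entry.get("col_labels", list(range(n_cols)))
    if len(row_labels) != len(rows) or len(col_labels) != n_cols:
        raise ValueError("label counts must match the matrix shape")
    return {r: dict(zip(col_labels, row)) for r, row in zip(row_labels, rows)}


# Dict standing in for the loaded YAML entry shown above.
entry = {
    "type": "matrix",
    "row_labels": ["row_1", "row_2", "row_3"],
    "col_labels": ["col_1", "col_2", "col_3"],
    "value": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}
table = matrix_as_table(entry)
print(table["row_2"]["col_3"])  # value at second row, third column
```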

For the labelled vector of vector, could we use more yaml layers:

type: vector_of_vector
values:
 - labels: [label_1, label_2]
   values: [val_1, val_2]
 - labels: [label_1, label_2, label_3]
   values: [val_1, val_2, val_3]

This may not be the exact way of doing it in yaml, but I think the appropriate way is not far from it.

For the vector of matrices, I would also split each matrix using the -, and also split the labelled case.

If this concept is agreed, we can define it for each case.

alfonsotecnalia commented 3 years ago

I was thinking in not extending the number of labels in the yaml. The proposed format seems clearer. However, I would say vector, labelled_vector, matrix and labelled_matrix could be a single type:

type: matrix
cols: 
rows: #1 for vectors
row_labels: [] #optional
column_labels: [] #optional
value: [rows*columns elements separated by commas]
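
A minimal sketch of how the flat value list could be turned back into rows. It assumes row-major ordering (the proposal does not state the order) and a dict already loaded from the YAML; `reshape_flat` is a hypothetical helper name:

```python
def reshape_flat(entry):
    """Rebuild the list of rows from a flat value list (row-major assumed)."""
    rows, cols, flat = entry["rows"], entry["cols"], entry["value"]
    if len(flat) != rows * cols:
        raise ValueError(f"expected {rows * cols} elements, got {len(flat)}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A vector is stored as a matrix with a single row.
vector = {"type": "matrix", "rows": 1, "cols": 3, "value": [1, 2, 3]}
matrix = {"type": "matrix", "rows": 2, "cols": 3, "value": [1, 2, 3, 4, 5, 6]}

print(reshape_flat(vector))
print(reshape_flat(matrix))
```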

aremazeilles commented 3 years ago

> I was thinking in not extending the number of labels in the yaml.

Not sure to understand your statement.

The format you suggest is in line with the OpenCV format for general matrices (like here, if you scroll to the end). However, I am not sure we gain anything in processing vector and matrices the same way. For visualization purposes, one would need to scroll through the file to see whether the matrix is a "real matrix" or just a vector. So why not set a type for it directly?

alfonsotecnalia commented 3 years ago

> > I was thinking in not extending the number of labels in the yaml.

> Not sure to understand your statement.

The first proposal does not use any labels (keys) apart from type/value. But as I said, adding more labels makes it clearer.

> However, I am not sure we gain anything in processing vector and matrices the same way.

Same structure for storing the info in a program.
Adding the cols/rows numbers allows the data to be stored efficiently.

But let's see what other people think

aremazeilles commented 3 years ago

ping @mjperezzurera

mjperezzurera commented 3 years ago

I have read the entire post, and I think I do not have enough info/context to decide which option is better or worse for the output of the pi file. In fact, I didn't know that the pi execution produced an output file in yml. So I would prefer that you decide which solution is best for you, and I will then adapt my code if needed.

Thank you anyway for counting me in.

aremazeilles commented 3 years ago

Resuming the topic, my suggestion would be:

# a vector without labels can omit the label key
type: vector
label: [lab1, lab2, ..., labn]
value: [val1, val2, ..., valn]
# assumption (for the visualization): if there is no label, the vector is likely to hold several measures of the same aspect, like all step_length
# if labels are used, then each entry can be different

# matrices
# row and column labels are optional
type: matrix
row_labels: [row_1, row_2, row_3]
col_labels: [col_1, col_2, col_3]
value: [[1, 2, 3], [4, 5, 6], [7,8,9]]
# for the visualization, the label can be used to display the matrix as a table

# vector of vector
# labels are again optional
type: vector_of_vector
values:
 - labels: [label_1, label_2]
   values: [val_1, val_2]
 - labels: [label_1, label_2, label_3]
   values: [val_1, val_2, val_3]
# for the visualization, the labels can be used as if it was a collection of labelled vectors

# vector of matrices:

type: vector_of_matrix
values:
 - row_labels: [row_1, row_2, row_3]
   col_labels: [col_1, col_2, col_3]
   value: [[1, 2, 3], [4, 5, 6], [7,8,9]]
 - row_labels: [row_1, row_2, row_3]
   col_labels: [col_1, col_2, col_3]
   value: [[1, 2, 3], [4, 5, 6], [7,8,9]]
# for the visualization, no clue
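
To illustrate how distinct types could simplify downstream handling, here is a hedged sketch of a dispatcher that summarises each of the four formats above. It assumes entries are already loaded into dicts with exactly the keys shown; `describe` is a hypothetical helper, not an existing Eurobench function:

```python
def describe(entry):
    """Return a one-line shape summary, dispatching on the type key."""
    kind = entry["type"]
    if kind == "vector":
        return f"vector of {len(entry['value'])} values"
    if kind == "matrix":
        rows = entry["value"]
        return f"matrix {len(rows)}x{len(rows[0])}"
    if kind == "vector_of_vector":
        sizes = [len(item["values"]) for item in entry["values"]]
        return f"{len(sizes)} vectors of sizes {sizes}"
    if kind == "vector_of_matrix":
        shapes = [(len(m["value"]), len(m["value"][0])) for m in entry["values"]]
        return f"{len(shapes)} matrices of shapes {shapes}"
    raise ValueError(f"unknown type: {kind}")


# Dict standing in for the loaded vector_of_vector example (labels omitted).
vov = {
    "type": "vector_of_vector",
    "values": [{"values": [1, 2]}, {"values": [1, 2, 3]}],
}
print(describe(vov))
```

Because the type is explicit, a viewer can pick a rendering without inspecting the data first.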

In my opinion, using the same format for vectors and matrices may be more compact, but I am not sure we gain anything beyond that compactness. Having different formats is more human-readable to me.

The next question is how to conduct the aggregation in each of these cases. Maybe distinguishing the formats makes it easier to handle?

alfonsotecnalia commented 3 years ago

Yes, I think so. I imagine the specification of an aggregation as a three-component index (vector_number, row_number, column_number) together with the list of operations to be applied. Depending on the format, the elements taking part in the operations will be different, so it would be good to differentiate them. Example: 0,1,0: mean, std:

matrix: apply mean and std to the elements in the first row
vector_of_matrices: apply mean and std to the elements in the first row of all matrices
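
One possible reading of this index can be sketched as follows. Everything here is an assumption drawn from the example: indices are taken as 1-based, 0 is treated as "all", std is taken to be the sample standard deviation, and the names `select`, `aggregate` and `OPS` are hypothetical:

```python
from statistics import mean, stdev

# Assumption: "std" means the sample standard deviation.
OPS = {"mean": mean, "std": stdev}


def select(matrix, row_number, column_number):
    """Pick elements of one matrix; 0 means 'all', other indices are 1-based."""
    rows = matrix if row_number == 0 else [matrix[row_number - 1]]
    return [
        value
        for row in rows
        for value in (row if column_number == 0 else [row[column_number - 1]])
    ]


def aggregate(entry, index, ops):
    """Apply each named operation to the elements selected by the index."""
    vector_number, row_number, column_number = index
    if entry["type"] == "matrix":
        matrices = [entry["value"]]  # vector_number is ignored here
    elif entry["type"] in ("vector_of_matrices", "vector_of_matrix"):
        items = [item["value"] for item in entry["values"]]
        matrices = items if vector_number == 0 else [items[vector_number - 1]]
    else:
        raise ValueError(f"unsupported type: {entry['type']}")
    selected = [v for m in matrices for v in select(m, row_number, column_number)]
    return {name: OPS[name](selected) for name in ops}


single = {"type": "matrix", "value": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
several = {
    "type": "vector_of_matrices",
    "values": [{"value": [[1, 2, 3], [4, 5, 6]]}, {"value": [[3, 4, 5], [6, 7, 8]]}],
}

# 0,1,0: first row of the matrix, or first row of all matrices.
print(aggregate(single, (0, 1, 0), ["mean", "std"]))
print(aggregate(several, (0, 1, 0), ["mean", "std"]))
```

The same index selects different elements depending on the declared type, which is why keeping the formats distinct matters for aggregation.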

aremazeilles commented 3 years ago

Alfonso, is it ok if I update the documentation in that direction? Then we could also detail, for each of the formats, how the aggregation works.

alfonsotecnalia commented 3 years ago

Ok

aremazeilles / eurobench_documentation

Add vector_of_vector and vector_of_matrices formats #62