dzhvansky / pepato

GNU General Public License v3.0
1 stars 2 forks source link

pi output type #8

Closed alfonsotecnalia closed 3 years ago

alfonsotecnalia commented 4 years ago

At this moment, we are considering the following output types: scalar, vector, matrix, labelled_matrix and string.

We see in the output file that you are considering output types such as vector_of_vector and vector_of_matricies. Could they be considered as a matrices (type matrix)? Would there be any reason to add vector_of_vector/vector_of_matrices types to the list of accepted output types?

dzhvansky commented 4 years ago

There is a reason for 5 parameters: 'pattern_fwhm', 'pattern_coa', 'patterns_similarity', 'synergies_similarity', 'matching_standard_reference_index'. For these parameters, we can get a different number of synergies at different speeds (in the case of using 'BLF' or 'R2=0.90' as NMF stop criteria). For example, we can get [[96.3, 9.8, 41.3], [93.2, 3.4, 36.4, 49.0, 90.4], [93.4, 6.6, 21.4, 38.6]] for 'pattern_coa' parameter. For the rest of the parameters, we have reduced the types to the standard ones.

alfonsotecnalia commented 4 years ago

You mean that the parameters which result is of type vector_of_vector could have variable number of vectors of different sizes, don't you?

dzhvansky commented 4 years ago

The number of vector elements will always be 3 because we have 3 different speeds in this setup. But the size of each element of the vector can be different (minimum 3, maximum ~ 6 depending on the number of synergies calculated).

alfonsotecnalia commented 4 years ago

Ok, then I think we have to extend the types to accomodate vector_of_vector and vector_of_matrices. What do you think @aremazeilles?

aremazeilles commented 4 years ago

No issue with that. I am only wondering whether we should add for labels for the lines / columns, to enrich (potentially) the visualization and comparison of such results

aremazeilles commented 3 years ago

@dzhvansky

We have updated the PI output format: see the doc

In your case, the following changes would be needed:

patterns_similarity.yaml : change vector_of_matricies to vector_of_matrix. Right now your format is:

type: vector_of_matricies
value: [[[0.46952, -0.61304, 0.87954, 0.64039], [0.93869, -0.28111, 0.38969, 0.88667], [-0.25053, 0.95214, -0.70183, -0.51035], [0.10085, 0.54473, -0.6915, -0.27004], [0.1191, -0.45409, 0.77165, 0.34175]], [[0.68391, -0.4158, 0.82883, 0.024115], [0.8097, -0.38895, 0.34239, 0.64038], [-0.24222, 0.95568, -0.61256, -0.46461], [-0.3614, 0.50409, -0.43918, -0.44096], [-0.056503, -0.43976, 0.46824, 0.71175]], [[0.96272, -0.19, 0.39977, 0.1049], [-0.094108, 0.47302, -0.30215, 0.16371], [-0.29254, 0.96314, -0.4772, -0.37951], [0.38344, -0.38879, 0.26328, 0.96102], [0.69427, -0.43879, 0.94155, 0.12186]]]

We would suggest shifting it to:

type: vector_of_matrix
value:
  - value: [[0.46952, -0.61304, 0.87954, 0.64039], [0.93869, -0.28111, 0.38969, 0.88667], [-0.25053, 0.95214, -0.70183, -0.51035], [0.10085, 0.54473, -0.6915, -0.27004], [0.1191, -0.45409, 0.77165, 0.34175]]
  - value: [[0.68391, -0.4158, 0.82883, 0.024115], [0.8097, -0.38895, 0.34239, 0.64038], [-0.24222, 0.95568, -0.61256, -0.46461], [-0.3614, 0.50409, -0.43918, -0.44096], [-0.056503, -0.43976, 0.46824, 0.71175]]
  - value: [[0.96272, -0.19, 0.39977, 0.1049], [-0.094108, 0.47302, -0.30215, 0.16371], [-0.29254, 0.96314, -0.4772, -0.37951], [0.38344, -0.38879, 0.26328, 0.96102], [0.69427, -0.43879, 0.94155, 0.12186]]

Meaning in matrix of the vector is placed on a single line.

Same change would be needed for the synergies_similarity.

Then, for all format, we give the opportunity to define labels per row, cols, etc. If it is relevant for you, please check the format in the doc.

If you need help with this update, please let me know.

aremazeilles commented 3 years ago

ping

dzhvansky commented 3 years ago

Prepared an update for data types. Several questions also popped up: 1) Does the doc indicate the structure for type vector_of_vector correctly? Should the key be values or value like for other pi types? 2) For vector_of_vector and vector_of_matrix data types, I suggest adding a high-level label to tag the main vector:

type: 'vector_of_vector' label: [vec_label_1, vec_label_2] values:

aremazeilles commented 3 years ago

You are right,, it should be value and not values. Your proposition for adding a label item sounds good to me as well.

This is unclear where there is 3 spaces and not 2. Could you clarify this?

dzhvansky commented 3 years ago

This is unclear where there is 3 spaces and not 2. Could you clarify this?

I mean there is one extra space before the hyphen in - label keys: image

What I suggest (common yaml format): image

And the last question is whether quotes are needed for string variables in our output yaml format?

aremazeilles commented 3 years ago

In this documentation, they are suggesting to add an indentation.

Actually I am using in the cI process yamllint to verify all yaml file in the repository. If I launch it with your suggestion:

type: 'vector_of_vector'
values:
- label: [label_1, label_2]
  value: [val_1, val_2]
- label: [label_1, label_2, label_3]
  value: [val_1, val_2, val_3]

It states:

╭─anthony@PRT0962AL ~/tmo 
╰─$ yamllint pi.yaml
pi.yaml
  1:1       warning  missing document start "---"  (document-start)
  3:1       error    wrong indentation: expected 2 but found 0  (indentation)

Note that under python, loading of both versions works, and it is probable that both version works. But as I use yamlint I would be more in favour in having an indentation, as we suggest. Actually there might be a differentiation whether the list is at the root of the yaml file (in that case, no indentation) or whether the list is associated to a key (which is our case).

And the last question is whether quotes are needed for string variables in our output yaml format?

I would say that this is not mandatory

aremazeilles commented 3 years ago

Actually to respect the yamllint indication, we should even go for an indentation of two spaces:

type: 'vector_of_vector'
value:
  - label: [label_1, label_2]
    value: [val_1, val_2]
  - label: [label_1, label_2, label_3]
    value: [val_1, val_2, val_3]

But the tool seems to accept indentation of one space, so I think we are flexible in that, whether you use 1 or 2 space indentation

dzhvansky commented 3 years ago

OK, thank you! I indent 2 spaces and updated the output files. Please check the reproducibility on linux machine and replace output files if necessary.

aremazeilles commented 3 years ago

The result I get now is slightly differing from the reference output data. I attach the files that get generated. Can you confirm these variations are minor to you (so that I can place them as reference output data?)

output.zip

I guess this is why the CI process fails right now

aremazeilles commented 3 years ago

Appart from that, I think the output format are correct

dzhvansky commented 3 years ago

Yes, I confirm the differences are minor, so you can replace the reference output.

aremazeilles commented 3 years ago

Reference updated.