Annanilsson-code / ProjectAppliedMolBiophys

Project in the course Applied Molecular Biophysics 2020

Classes #10

Closed Annanilsson-code closed 3 years ago

Annanilsson-code commented 3 years ago

https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/7cde62527adcca41da77604f505f1950df168ebf/cc_calculation.py#L81

Now we know that classes in cc_calculation.py should be a vector of shape labels that indicates whether each image is a circle or a square. In image_matrix, which has size (300,1000), I have put the circles in the first 500 columns and the squares in the second 500 columns. Should classes be tied to image_matrix, and if so, why? If not to image_matrix, which variable should it be tied to, i.e. what should determine the vector's length?

I had thought that classes should be connected to the vectors that MDS returns, but then I cannot know beforehand which vector corresponds to which image.

YlvaJansson commented 3 years ago

Isn't it more a way to see whether MDS succeeds in separating them correctly? If we label the data we put in, it will be visible whether the output clusters actually are what we believe they are (circles/squares), and we can then change the input, parameters, etc. and see how the algorithm holds up. That would make the results for unlabelled data afterwards more trustworthy.

Annanilsson-code commented 3 years ago

I suppose so, but Filipe said that "classes" should correspond to how the images are ordered. If we have two squares in a row followed by two rectangles, the vector should be (0,0,1,1). But how do we label the data with the vector named "classes"? And what do you mean by unlabelled data? Right now we have to label the data in order to color the points in the scatter plot correctly.
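
The ordering convention in the (0,0,1,1) example can be sketched as follows. The names `shapes_in_order` and `label_of` and the 0/1 encoding are illustrative assumptions, not taken from cc_calculation.py; the only point is that entry i of `classes` describes image i.

```python
import numpy as np

# Shapes listed in the same order in which the images are stacked
# (hypothetical example: two squares followed by two rectangles).
shapes_in_order = ["square", "square", "rectangle", "rectangle"]

# Assumed integer encoding of the labels; any consistent mapping would do.
label_of = {"square": 0, "rectangle": 1}

# classes[i] is the label of image i, so it has one entry per image.
classes = np.array([label_of[s] for s in shapes_in_order])

# Boolean masks recover which images (and later which plotted points)
# belong to each class.
square_idx = np.flatnonzero(classes == 0)
rectangle_idx = np.flatnonzero(classes == 1)
```

Because the labels are stored by position, they stay valid for any later result that keeps one row per image in the same order.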

FilipeMaia commented 3 years ago

With unlabelled data we won't be able to color the points, of course. But throughout this project we'll always be working with labelled data. BTW, the size of image_matrix should be (1000,300), not (300,1000).
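
A minimal sketch of how the pieces could fit together, assuming each row of image_matrix is one flattened image, so that (1000,300) means 1000 images of 300 pixels. The 15 x 20 synthetic images, the noise level, and the helper `classical_mds` are all assumptions for illustration. Classical MDS on Euclidean distances is used here as a NumPy-only stand-in; it is not necessarily the MDS variant or distance measure used in cc_calculation.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 15 x 20 pixel images, flattened to 300 values.
h, w = 15, 20
yy, xx = np.mgrid[:h, :w]
circle = ((yy - 7) ** 2 + (xx - 10) ** 2 <= 25).astype(float)
square = ((np.abs(yy - 7) <= 5) & (np.abs(xx - 10) <= 5)).astype(float)

# One image per row: 500 circles first, then 500 squares.
n_per_class = 500
templates = np.vstack([
    np.tile(circle.ravel(), (n_per_class, 1)),
    np.tile(square.ravel(), (n_per_class, 1)),
])
image_matrix = templates + 0.1 * rng.standard_normal(templates.shape)  # (1000, 300)

# One label per row of image_matrix, in the same order (0 = circle, 1 = square).
classes = np.repeat([0, 1], n_per_class)  # (1000,)


def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ d2 @ J                              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                     # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]        # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# Row i of the embedding belongs to row i of image_matrix,
# so classes[i] is also the label of embedding[i].
embedding = classical_mds(image_matrix)

# The labels can then color the scatter plot, e.g. with matplotlib:
# plt.scatter(embedding[:, 0], embedding[:, 1], c=classes)
```

The embedding has one row per input image in the input order, which is why a label vector built from image_matrix's row order can be reused directly for the MDS output.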