Annanilsson-code / ProjectAppliedMolBiophys

Project in the course Applied Molecular Biophysics 2020

Group by color in scatter plot #8

Closed Annanilsson-code closed 3 years ago

Annanilsson-code commented 3 years ago

https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/65a17f71a5e46bbd1307fd7a4dc7c808668d5c5a/results_analysis.py#L21

image

Is this the correct way to color by group? I can't separate the datasets here: the "groups" are spatially separated in the plot, but their colors are mixed.

FilipeMaia commented 3 years ago

I would write out a shape vector in cc_calculation.py which you could use for the plotting.

Annanilsson-code commented 3 years ago

Ok, is it the resulting vectors from the MDS method that should be in the shape vector? Should I import them into cc_calculation? And should it be something similar to this:

vector = [x,y]
color = ("red", "blue")*vector
c=color

FilipeMaia commented 3 years ago

There should be a vector to indicate whether each image corresponds to a circle or a square. For example, if you have two circles followed by two squares, the vector could be something like:

classes = [0,0,1,1]

You can then use that classes vector as the c parameter to the scatter function. You can also have a single call to the scatter function.
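As a minimal sketch of the suggestion above, the following passes a classes vector as the `c` argument in a single `scatter` call. The coordinates `x` and `y` are made-up stand-ins for the real embedding and are not taken from the project; the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 2-D coordinates: two circles followed by two squares.
x = np.array([0.1, 0.2, 0.9, 1.0])
y = np.array([0.2, 0.1, 1.0, 0.9])
classes = np.array([0, 0, 1, 1])  # 0 = circle, 1 = square

fig, ax = plt.subplots()
# One call for all points; each point's color follows its class value.
points = ax.scatter(x, y, c=classes)
```

The only requirement is that `classes` has the same length as `x` and `y`. Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.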

Annanilsson-code commented 3 years ago

I tried this now and updated results_analysis.py. I put the classes vector in results_analysis because I didn't understand how to access the vector from cc_calculation. With n=2 images in cc_calculation, I get a classes vector of size 4. But x and y have length 10 in results_analysis, and to use them in the scatter plot the c argument must have size 10 as well. For now I just changed to n=5 in results_analysis, but that doesn't seem correct. I am a bit confused.

image

FilipeMaia commented 3 years ago

This should be: plt.scatter(x,y, c=classes, s=scale)

Annanilsson-code commented 3 years ago

How do I specify the colors then?

FilipeMaia commented 3 years ago

Just let matplotlib handle the colors automatically. You just specify the values.
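To illustrate what "automatically" means here, this sketch shows that matplotlib maps each numeric class value to a color through a colormap. The data are hypothetical; the `cmap` argument is optional and `"coolwarm"` is just one example choice, not something the project uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

classes = np.array([0, 0, 1, 1])  # hypothetical class labels
x = np.arange(4)
y = np.zeros(4)

fig, ax = plt.subplots()
# Values in `classes` are normalized and looked up in the colormap.
points = ax.scatter(x, y, c=classes, cmap="coolwarm")

# RGBA color that each class value was mapped to (one row per point).
colors = points.to_rgba(classes)

# Optional: a legend with one entry per distinct class value.
handles, labels = points.legend_elements()
ax.legend(handles, labels, title="class")
```

Points with the same class value get the same color, so there is no need to list color names by hand.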

Annanilsson-code commented 3 years ago

Okay, do you think that the code is correct now?

FilipeMaia commented 3 years ago

No, the code here https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/086a59e4f5c62a7f8399ea3dce546ae990c80786/cc_calculation.py#L72 does not do what you want. You should put all images in one matrix and call corrcoef on it.
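A rough sketch of that approach, assuming each image is a 2-D array of the same size: flatten every image into one row of a single matrix and call `np.corrcoef` once. The random `images` list is a placeholder for the project's actual circle and square images.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the real images: 4 images of 8x8 pixels.
images = [rng.random((8, 8)) for _ in range(4)]

# One row per image, one column per pixel.
matrix = np.stack([img.ravel() for img in images])  # shape (4, 64)

# With the default rowvar=True each row is a variable, so cc[i, j]
# is the correlation coefficient between image i and image j.
cc = np.corrcoef(matrix)  # shape (4, 4)
```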

Annanilsson-code commented 3 years ago

Now I used numpy concatenate to put all images in one matrix directly after importing them - is that correct?

YlvaJansson commented 3 years ago

Why does it matter if they're in separate matrices?

https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>)

Parameters

x : array_like
    A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y : array_like, optional
    An additional set of variables and observations. y has the same shape as x.

FilipeMaia commented 3 years ago

You're right, it looks like it doesn't matter. But it makes the code less clear as it looks like you're just correlating reshaped1 with reshaped2.
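A quick check of this point, using random arrays in place of the real data: passing two matrices as `x` and `y` gives the same result as stacking them into one matrix first. The names `reshaped1` and `reshaped2` follow the thread, but their contents here are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical flattened images: two per group, 64 pixels each.
reshaped1 = rng.random((2, 64))
reshaped2 = rng.random((2, 64))

# Two-argument form: the rows of y are treated as additional variables.
cc_two_args = np.corrcoef(reshaped1, reshaped2)

# Single-matrix form: all images stacked row-wise.
cc_stacked = np.corrcoef(np.vstack([reshaped1, reshaped2]))

same = np.allclose(cc_two_args, cc_stacked)
```

Both calls return a 4x4 matrix covering every pair of images, so the single-matrix form mainly makes the intent easier to read.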

FilipeMaia commented 3 years ago

BTW the code still looks wrong https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/7cde62527adcca41da77604f505f1950df168ebf/cc_calculation.py#L43-L67

I suggest the following https://gist.github.com/FilipeMaia/8ead6f8289ae76229c3c31620cc8477a

Annanilsson-code commented 3 years ago

Thank you. With n=100 and the noise unchanged, I now get this result: image

With n=100 and noise*0.5 this is the result: image

FilipeMaia commented 3 years ago

The problem is that https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/18ada5ef01430992db589106215e3d68db14b45b/results_analysis.py#L15 does not do what you want.

What it does is run all the code in cc_calculation and get the classes variable from it, but that will have different values than the last time it ran. Instead, what you want is to load the classes vector that was saved to file:

classes = np.load('classes.npy')
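A minimal sketch of the save-then-load round trip, assuming cc_calculation.py writes the vector with `np.save` and results_analysis.py reads it back. A temporary directory is used here only to keep the example self-contained; the thread uses `classes.npy` in the working directory.

```python
import os
import tempfile

import numpy as np

classes = np.array([0, 0, 1, 1])  # hypothetical labels produced by the calculation

# Illustrative path only; the project would use 'classes.npy' directly.
path = os.path.join(tempfile.mkdtemp(), "classes.npy")

# In the calculation script: write the vector to disk once.
np.save(path, classes)

# In the analysis script: read the saved vector instead of importing
# the module, which would rerun the calculation and produce new values.
loaded = np.load(path)
```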

Annanilsson-code commented 3 years ago

Aha! Okay