Annanilsson-code closed this issue 3 years ago
I would write out a shape vector in `cc_calculation.py` which you could use for the plotting.
Ok, is it the resulting vectors from the MDS method that should be in the shape vector? Should I import them into cc_calculation? And should it be something similar to this:
vector = [x,y]
color = ("red", "blue")*vector
c=color
There should be a vector to indicate whether each image corresponds to a circle or a square. For example, if you have two circles followed by two squares, the vector could be something like:
classes = [0,0,1,1]
You can then use that `classes` vector as the `c` parameter to the `scatter` function.
You can also have a single call to the scatter function.
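A minimal sketch of that single `scatter` call, assuming four points whose coordinates are made up here purely for illustration (the real `x` and `y` would come from the MDS step):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 2-D coordinates: two circles followed by two squares.
x = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([0.1, 0.3, 0.7, 0.9])
classes = np.array([0, 0, 1, 1])  # 0 = circle, 1 = square

fig, ax = plt.subplots()
# One call for all points; matplotlib maps each value in `classes` to a color.
points = ax.scatter(x, y, c=classes)
```

Because the class is passed as a value per point, there is no need to loop over groups or call `scatter` once per shape.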
I tried this now and updated results_analysis.py. I put the classes vector in results_analysis because I didn't understand how to access the vector from cc_calculation. Now, if I have n=2 images in cc_calculation, I get a classes vector of size 4. But x and y have length 10 in results_analysis, and to use them in the scatter plot the c argument must have size 10 as well. For now I just changed to n=5 in results_analysis, but that doesn't seem correct. I am a bit confused.
This should be:
plt.scatter(x,y, c=classes, s=scale)
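For `c=classes` to work, `classes` needs one entry per plotted point. A small sketch of one way to build it, assuming `n` circles followed by `n` squares (the value of `n` and the random coordinates are placeholders, not the project's data):

```python
import numpy as np

n = 5  # hypothetical number of images per class
# n circles followed by n squares gives 2*n labels
classes = np.repeat([0, 1], n)

# Placeholder coordinates standing in for the real MDS output.
rng = np.random.default_rng(0)
x = rng.normal(size=2 * n)
y = rng.normal(size=2 * n)

# scatter needs exactly one color value per point
assert len(classes) == len(x) == len(y)
```

Deriving `classes` from the same `n` that produced the images keeps the lengths consistent instead of hard-coding `n` in two scripts.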
How do I specify the colors then?
Just let matplotlib handle the colors automatically. You just specify the values.
Okay, do you think that the code is correct now?
No, the code here https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/086a59e4f5c62a7f8399ea3dce546ae990c80786/cc_calculation.py#L72 does not do what you want. You should put all images in one matrix and call `corrcoef` on it.
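As a rough sketch of that idea, assuming the images are equally sized 2-D arrays (the random arrays and the names `circles`, `squares`, and `flat` are stand-ins, not the project's code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the simulated images: 3 "circles" and 3 "squares", 8x8 each.
circles = rng.random((3, 8, 8))
squares = rng.random((3, 8, 8))

images = np.concatenate([circles, squares])  # shape (6, 8, 8)
flat = images.reshape(len(images), -1)       # one row per image, shape (6, 64)
cc = np.corrcoef(flat)                       # pairwise correlations, shape (6, 6)
```

Each row of `flat` is treated as one variable, so entry `cc[i, j]` is the correlation coefficient between image `i` and image `j`.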
Now I used numpy concatenate to put all images in one matrix directly after importing them - is that correct?
Why does it matter if they're in separate matrices?
https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html
numpy.corrcoef(x, y=None, rowvar=True, bias=
Parameters
x : array_like
    A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.
y : array_like, optional
    An additional set of variables and observations. y has the same shape as x.
You're right, it looks like it doesn't matter. But it makes the code less clear, as it looks like you're just correlating `reshaped1` with `reshaped2`.
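A quick check of that equivalence, using small random arrays in place of the real `reshaped1` and `reshaped2` (their shapes here are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
reshaped1 = rng.random((2, 16))  # e.g. two flattened images
reshaped2 = rng.random((2, 16))

# Passing two arrays appends the rows of y to the rows of x before correlating...
separate = np.corrcoef(reshaped1, reshaped2)
# ...so it matches stacking them first and passing one matrix.
stacked = np.corrcoef(np.vstack([reshaped1, reshaped2]))
same = np.allclose(separate, stacked)
```

Both calls return the full 4x4 matrix, not just the cross-correlations between the two inputs, which is why the single-matrix form is easier to read.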
BTW the code still looks wrong https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/7cde62527adcca41da77604f505f1950df168ebf/cc_calculation.py#L43-L67
I suggest the following https://gist.github.com/FilipeMaia/8ead6f8289ae76229c3c31620cc8477a
Thank you. Now I get this result with n=100 and the noise unchanged:
With n=100 and noise*0.5 this is the result:
The problem is that https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/18ada5ef01430992db589106215e3d68db14b45b/results_analysis.py#L15 does not do what you want.
What it does is run all the code in `cc_calculation` and get the `classes` variable from it, but that will have different values than the last time it ran. Instead, what you want is to load the `classes` vector that was saved to file:
classes = np.load('classes.npy')
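A sketch of the save-then-load round trip, assuming the calculation script writes the labels with `np.save` (the temporary directory and the example labels are only there to keep the snippet self-contained):

```python
import os
import tempfile
import numpy as np

classes = np.array([0, 0, 1, 1])  # hypothetical labels from the calculation step

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classes.npy")
    np.save(path, classes)   # done once, in the calculation script
    loaded = np.load(path)   # done later, in the analysis script
```

Loading from file returns exactly the labels that belong to the saved results, whereas importing the module re-runs it and may generate a new set.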
Aha! Okay
https://github.com/Annanilsson-code/ProjectAppliedMolBiophys/blob/65a17f71a5e46bbd1307fd7a4dc7c808668d5c5a/results_analysis.py#L21
Is this the correct way to color by group? Because here I can't separate the datasets by color. The "groups" are spatially separated in the plot, but their colors are mixed...