ShobiStassen / PARC

MIT License

Question on pruning big cluster #17

Closed ivan-marroquin closed 2 years ago

ivan-marroquin commented 2 years ago

Hi @ShobiStassen ,

Thanks for making available your program!

I am a bit confused about the following and would like to ask for clarification.

When a csr_graph is built with the local pruning option, you use this command: weight_list.append(1 / (dist + 0.1))

I see that for two close neighbors, their distance is small and their weight is big. On the other hand, for two far neighbors, their distance is high and their weight is small.
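To make the relationship concrete, here is a minimal sketch of the weight transform quoted above. The distances are made-up illustration values, not taken from PARC; only the formula 1 / (dist + 0.1) comes from the thread.

```python
import numpy as np

# Hypothetical neighbor distances, chosen only for illustration
dist = np.array([0.05, 0.5, 2.0, 10.0])

# Same transform as the quoted line: weight = 1 / (dist + 0.1)
weights = 1 / (dist + 0.1)

# Weights shrink as distance grows, so the closest neighbor is the strongest edge
for d, w in zip(dist, weights):
    print(f"dist={d:5.2f} -> weight={w:.3f}")
```

Since distances are non-negative, the 0.1 offset also caps every weight at 1 / 0.1 = 10.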

Then, in the function that analyzes a big cluster, you use this pruning:

    mask |= (csr_array.data > (np.mean(csr_array.data) + np.std(csr_array.data) * 5))  # smaller distance means stronger edge
    csr_array[mask] = 0
    csr_array.eliminate_zeros()

It seems to me that the above pruning will cause relatively large weights to be removed. Thus, very close neighbors are eliminated. Is this the intended goal?

Kind regards,

Ivan

ShobiStassen commented 2 years ago

Hi Ivan, apologies for the extremely late reply and thank you for pointing this out. Because the *5 std factor is so far above the mean, this threshold has no impact on the actual pruning (which is instead done in the subsequent 10 lines or so of code). But you are right, the code would have made more sense had it read:

    mask |= (csr_array.data < (np.mean(csr_array.data) - np.std(csr_array.data) * 5))

because the weights are indeed 1/dist, which means closer nodes will have larger weights.

The original line of code that you pointed out is actually from an older code fragment, written when I was testing thresholds. It would have had almost no impact on the pruning, because virtually no edges would have been above mean + 5*std, and hence no very close, strong edges would have been removed. That being said, I have removed these 3 lines of code to avoid confusion! Thank you for pointing this out.

Shobi
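For readers who want to check how many edges either threshold would touch on their own graph, the following is a rough sketch rather than PARC's actual code. The graph is synthetic (the node count, neighbor count and uniform distance range are all assumptions), so the counts it prints say nothing about real data; it only shows how the two masks differ.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Synthetic k-nearest-neighbor-style graph; all sizes and distances are assumed
rng = np.random.default_rng(0)
n_nodes, k = 200, 10
rows = np.repeat(np.arange(n_nodes), k)
# Each node links to its next k nodes (wrapping around), so no duplicate edges
cols = (rows + np.tile(np.arange(1, k + 1), n_nodes)) % n_nodes
dist = rng.uniform(0.1, 5.0, size=n_nodes * k)

# Weight transform quoted in the thread: closer neighbors get larger weights
weights = 1 / (dist + 0.1)
csr_array = csr_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes))

data = csr_array.data
mean, std = data.mean(), data.std()

# Original line: flags unusually LARGE weights, i.e. the closest neighbors
upper_mask = data > mean + 5 * std
# Suggested correction: flags unusually SMALL weights, i.e. the farthest neighbors
lower_mask = data < mean - 5 * std

print("edges above mean + 5*std:", int(upper_mask.sum()))
print("edges below mean - 5*std:", int(lower_mask.sum()))

# Apply the corrected mask to a copy by zeroing the stored values, then compacting
pruned = csr_array.copy()
pruned.data[lower_mask] = 0
pruned.eliminate_zeros()
print("edges before:", csr_array.nnz, "after:", pruned.nnz)
```

Note that the mask is applied to the `.data` array of stored values here, because it has one entry per stored edge; this is a choice made for the sketch and may differ from the code that was removed from the repository.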

ivan-marroquin commented 2 years ago

Hi @ShobiStassen

Many thanks for the clarification. I am going to check the new version of your program.

Ivan