SCIInstitute / dSpaceX

dSpaceX Visualization Library and Tool

Error computing M-S #200

Open cchriste opened 3 years ago

cchriste commented 3 years ago

Partially resolved with Ross, but the Morse-Smale (M-S) computation still produces erroneous results. Debug code added to NNMSComplex.h prints the min/max for each sample as partitions are consolidated. It has already revealed actual errors, which we can investigate using the nano-500 dataset. More details are forthcoming, though we could dive into this immediately.

This is the underlying reason display of extrema (#114) is still incorrect and therefore disabled, but even non-extrema samples are incorrect.

cchriste commented 3 years ago

Screenshots showing the dreaded "Sample 28", including a 2D embedding of the Hamming distance matrix that clearly indicates how far this sample is from everything else. When it was explicitly removed, other samples showed the same issue for different reasons.
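
For anyone wanting to reproduce the "how far from everything else" check without the embedding, here is a minimal sketch. What the Hamming distance was computed over is not stated in this thread; the sketch assumes each sample is described by a vector of integer labels (for example, its crystal id at each persistence level). The names `hammingDistance` and `mostIsolated` are hypothetical and are not part of dSpaceX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions at which two equal-length label vectors differ.
std::size_t hammingDistance(const std::vector<int>& a,
                            const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::size_t d = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++d;
  }
  return d;
}

// Index of the sample whose summed Hamming distance to all other samples
// is largest. labels[s] is the (assumed) label vector of sample s.
// Ties resolve to the lowest index.
std::size_t mostIsolated(const std::vector<std::vector<int>>& labels) {
  assert(!labels.empty());
  std::size_t best = 0;
  std::size_t bestSum = 0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    std::size_t sum = 0;
    for (std::size_t j = 0; j < labels.size(); ++j) {
      if (i != j) sum += hammingDistance(labels[i], labels[j]);
    }
    if (sum > bestSum) {
      bestSum = sum;
      best = i;
    }
  }
  return best;
}
```

A sample like "Sample 28" would show up here as the index returned by `mostIsolated`, assuming the label vectors match whatever was used for the distance matrix in the screenshots.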

[Five screenshots, captured 2020-12-09 between 8:22 PM and 11:38 PM]

cchriste commented 3 years ago

Current dataproc branch prints M-S computation debug info after each mergePersistence call. Results match drawer.

The debugging output in NNMSComplex::mergePersistence is currently enabled at line 222 of NNMSComplex.h. Line 652 prints the same information at the end of NNMSComplex::runMS() just before the first mergePersistence(0) call and can be used to verify the results of the first merge are identical to the input data.
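
The "first merge is identical to the input" check can also be done programmatically rather than by eyeballing the two printouts. A minimal sketch, assuming the per-sample extrema are available as (min, max) index pairs; the `Extrema` alias and `diffExtrema` are hypothetical names, not functions in NNMSComplex.h:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation: (min index, max index) assigned to one sample.
using Extrema = std::pair<int, int>;

// Returns the indices of all samples whose extrema differ between two
// assignments, e.g. the state at the end of runMS() and the state after
// mergePersistence(0). An empty result means the two are identical.
std::vector<std::size_t> diffExtrema(const std::vector<Extrema>& before,
                                     const std::vector<Extrema>& after) {
  assert(before.size() == after.size());
  std::vector<std::size_t> changed;
  for (std::size_t i = 0; i < before.size(); ++i) {
    if (before[i] != after[i]) changed.push_back(i);
  }
  return changed;
}
```

Any index returned for the first merge would point directly at a sample worth inspecting.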

Using the dataset np500.2 (https://drive.google.com/file/d/1ljHrlaHR9C53uz35EIiN4Iz0l6dDoLOj/view?usp=sharing). It seems that at least the extrema are incorrect. For example, when down to four crystals, sample 527 has extrema 343, but all sightings of that sample much earlier in the persistence levels show the same maxima.
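
To follow a single sample such as 527 through the merges, something like the sketch below could help. It assumes the debug output has been collected into one (min, max) pair per sample per persistence level; `Extrema`, `ExtremaChange`, and `traceSample` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation: (min index, max index) assigned to one sample.
using Extrema = std::pair<int, int>;

// One recorded change in a sample's extrema between consecutive levels.
struct ExtremaChange {
  std::size_t level;  // level at which the new value first appears
  Extrema from;       // extrema at the previous level
  Extrema to;         // extrema at this level
};

// levels[k][s] holds the extrema of sample s at persistence level k.
// Returns every level at which the extrema of `sample` changed.
std::vector<ExtremaChange> traceSample(
    const std::vector<std::vector<Extrema>>& levels, std::size_t sample) {
  std::vector<ExtremaChange> changes;
  for (std::size_t k = 1; k < levels.size(); ++k) {
    assert(sample < levels[k - 1].size() && sample < levels[k].size());
    const Extrema& prev = levels[k - 1][sample];
    const Extrema& cur = levels[k][sample];
    if (prev != cur) changes.push_back({k, prev, cur});
  }
  return changes;
}
```

The list of changes shows exactly which merge reassigned the sample, which narrows down where to look in mergePersistence.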

Not sure and out of battery (managed to forget it, which I guess is good). Enjoy!