borondics closed this 4 weeks ago
Also add the preprocessor to test_preprocess.PREPROCESSORS_INDEPENDENT_SAMPLES
for torture tests.
Also, how did you ensure that the code in transform
does what you want? If you did any comparisons with existing implementations, make a test that encompasses that comparison.
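Such a comparison test could look roughly like the following sketch: hardcode the values an independent implementation produced and assert the preprocessor reproduces them. The `transform` function and the reference numbers here are hypothetical stand-ins; the real test would call `MNFDenoising` on the actual dataset.

```python
import numpy as np

# Hypothetical numbers standing in for the output of an independent
# reference implementation on a toy input.
REFERENCE_INPUT = np.array([[1.0, 2.0], [3.0, 4.0]])
REFERENCE_OUTPUT = np.array([[2.0, 4.0], [6.0, 8.0]])

def transform(X):
    # Stand-in for the preprocessor's transform; the real test would
    # run MNFDenoising here instead.
    return 2 * X

def test_matches_independent_implementation():
    # Compare against the baked-in reference values with a tolerance.
    np.testing.assert_allclose(transform(REFERENCE_INPUT),
                               REFERENCE_OUTPUT, rtol=1e-6)

test_matches_independent_implementation()
```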
Thanks for the suggestions!
I did both.
However, should we also add PCADenoising to test_preprocess.PREPROCESSORS_INDEPENDENT_SAMPLES? It wasn't there.
For the test I used a small script implemented by colleagues to get the calculated values for iris, and took those numbers. I will also verify this with a spectral dataset.
A question: PyLint complains about MNFDenoising.sd, which seems unnecessary, but the same attribute is also in PCADenoising... Should we remove both?
Ha! The new tests didn't pass. :D
Sorry, I was wrong: this is not an independent-samples preprocessor, just as PCA is not. I should have said PREPROCESSORS_GROUPS_OF_SAMPLES.
Moved it, but there is still a problem. Possibly not only with the new addition...
@borondics The peakfit failures are due to #720 which should be resolved as soon as lmfit 1.3.2 is released.
@borondics, please rebase to master so that tests will (mostly) work.
The MNF-related tests are passing on my computer, but I could implement this in various ways. @markotoplak, right now I am using a try block and raising errors while still returning the original data if the MNF fails. What do you say?
OK, I think now the MNF tests are also passing here.
@borondics, some tests fail. Your code cannot handle unknowns. You probably need to extend CommonDomainOrderUnknowns
instead of CommonDomain, and that should be it. This was not needed for PCA because Orange's PCA handles them.
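The idea behind unknown handling is roughly the following: fill the NaN positions by interpolating from the known values before running the computation. This is a simplified, self-contained sketch of that behavior, not the actual Orange `CommonDomainOrderUnknowns` implementation.

```python
import numpy as np

def interp_unknowns(y, x=None):
    """Fill NaN (unknown) values in a spectrum by linear interpolation
    from the neighboring known points. Simplified sketch of
    CommonDomainOrderUnknowns-style handling, not the Orange code."""
    y = np.asarray(y, dtype=float)
    if x is None:
        x = np.arange(len(y))
    nans = np.isnan(y)
    if nans.all():
        return y  # nothing to interpolate from
    y = y.copy()
    # Interpolate the unknown positions from the known ones.
    y[nans] = np.interp(x[nans], x[~nans], y[~nans])
    return y
```

For example, `interp_unknowns([1.0, np.nan, 3.0])` fills the middle unknown with the linearly interpolated value 2.0.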
But if I think about it, even PCA denoising should use that one, because the interpolation it uses is better for spectra.
Another test fails because the results of this preprocessor are so unstable that small changes in the data produce very different values. Is that expected? If so, we could ignore it in that test.
BTW, this is a sign of high instability of the method and makes its usability questionable.
That "testing for nans" commit is probably unnecessary if you extend the correct class, and in any case it should not be done that way. If you ever need something like this, make the try/except cover the least code possible.
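To illustrate the "smallest possible try block" point with a hedged sketch (the SVD call and the fallback here are hypothetical, not the PR's actual code): wrap only the single call that can fail, and keep all the surrounding bookkeeping outside the handler.

```python
import numpy as np

def denoise(X):
    # Wrap ONLY the call that can actually fail (hypothetically, an SVD
    # that may not converge), not the surrounding logic.
    try:
        u, s, vt = np.linalg.svd(X, full_matrices=False)
    except np.linalg.LinAlgError:
        return X  # fall back to the original data only on that failure
    # Everything below runs outside the try block, so unrelated bugs
    # here are not silently swallowed by the except clause.
    k = max(1, len(s) // 2)
    return (u[:, :k] * s[:k]) @ vt[:k]
```

Keeping the try narrow means an unexpected error in the reconstruction step surfaces as a real traceback instead of being masked as an "MNF failure".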
@borondics, I rebased and made this robust to unknown values as I pointed out above.
Now only one test is failing.
I also find the original (and current) computation code suspicious. When computing the N matrix, all differences were computed and then an additional vector of zeros was appended. This seems like a bug, and your tests also pass if I set N = diffs. How did you verify that the code is fine?
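For context, here is a self-contained sketch of why the appended zero row looks inert (this is my reconstruction of the pattern, not the PR's actual code): MNF-style methods estimate noise from adjacent-sample differences, and a row of zeros contributes nothing to the resulting scatter matrix, so `N = diffs` gives the same unnormalized noise covariance.

```python
import numpy as np

def noise_estimate(X, pad_zero_row=False):
    # Shift-difference noise estimate used in MNF-style denoising:
    # differences between adjacent samples approximate the noise.
    diffs = np.diff(X, axis=0)
    if pad_zero_row:
        # The questionable variant: append a row of zeros so N has as
        # many rows as X. The zero row adds nothing to N.T @ N; it can
        # only matter if a later step normalizes by the row count.
        diffs = np.vstack([diffs, np.zeros(X.shape[1])])
    return diffs

X = np.random.default_rng(0).normal(size=(5, 3))
N1 = noise_estimate(X)                     # N = diffs
N2 = noise_estimate(X, pad_zero_row=True)  # diffs plus a zero row
# The unnormalized noise scatter matrices are identical:
# N1.T @ N1 == N2.T @ N2
```

So the two variants only diverge if the noise covariance is divided by the number of rows of N, which would silently change the scaling.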
I have some additional suspicions due to the non-passing test. :)
@borondics, I am merging this assuming that the failed test is not problematic (I just ignored it).
Please consider again whether the computation is correct. The method's sensitivity to small perturbations in the data is problematic even on bigger data, like the whole collagen dataset.
Even if the implementation is correct, I would not use the method on data like collagen because of its sensitivity to noise. (If the method is meant to be applied to different kinds of data, perhaps already preprocessed in a certain way, it should of course be tested with that.)
Added a new denoising method with the help of SOLARIS colleagues.
The paper it is based on is this.