Open bobleesj opened 4 weeks ago
Ok, I'll do it and reply to you.
Also, since this is an additive correction and not a multiplicative one, let's not call it a factor. We could use term, but it doesn't seem so clear in this context; maybe quantity, amount, or value? So lift_amount or lift_value. I leave the decision up to you guys.
@bobleesj thanks for catching that. Was it not caught by a test, though? That is not good... it probably means that the tests were written by running the function, getting the result, then coding that result into the test! Ugh! Or maybe it is a case where we didn't have a test. Not sure, I didn't look. In general, folks: tests first, code after.
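For illustration, here is a minimal sketch of what a tests-first check for this kind of baseline shift could look like. The lift_data stand-in below is my assumption, not the repo's actual code; the point is that the expected array is worked out by hand, never generated by running the function under test.

import numpy as np

# Assumed stand-in for the function under discussion: shift the data so
# its global minimum becomes 0 (sketch only, not the repo's real code).
def lift_data(data_input):
    return data_input - np.min(data_input)

def test_lift_data_shifts_minimum_to_zero():
    data_input = np.array([[-2.0, 0.5], [1.0, 3.0]])
    # Expected values computed by hand before running anything:
    # the minimum is -2.0, so every entry is shifted up by 2.0.
    expected = np.array([[0.0, 2.5], [3.0, 5.0]])
    np.testing.assert_allclose(lift_data(data_input), expected)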
I searched the literature a bit: I couldn't find anything identical, but I did find this method, which serves about the same purpose as ours and could be useful because it normalises spectra (it is mainly used in spectroscopy); for us it would be patterns.
import numpy as np

def snv(data):
    # Standard Normal Variate (SNV), vectorized: center and scale each row
    mean = np.mean(data, axis=1, keepdims=True)  # mean of each row
    std = np.std(data, axis=1, keepdims=True)    # std of each row
    return (data - mean) / std

# Usage: spectra is a 2D array where each row is a spectrum
normalized = snv(spectra)
Other methods are for background subtraction but this is not the case here.
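As a quick sanity check (my own toy numbers, not from any paper), each row of the SNV output should come out with mean 0 and standard deviation 1:

spectra = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 40.0]])
normalized = snv(spectra)
print(np.mean(normalized, axis=1))  # approximately [0. 0.]
print(np.std(normalized, axis=1))   # approximately [1. 1.]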
I think lift_value is the best name for the term.
> Was it not caught by a test though? that is not good.....it probably means that the tests were written by running the function, getting the result, then coding that into the test! ugh!
I am afraid that's the case, especially given that none of the test functions use real data - all of the test inputs are hard-coded dummy matrices.
Sorry, I am a bit confused. We don't want to include new functions/features, since the goal is to design our code in a way that reproduces the plots and figures from MATLAB. The question is, do we do data_input + np.abs(np.min(data_input) * lift) or data_input - np.abs(np.min(data_input) * lift)? Maybe the MATLAB code has it? @Fil158
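To make the two candidates concrete, here are toy numbers (mine, not from the repo). With a negative minimum, the + version lifts the pattern so its minimum becomes 0, while the - version pushes it further negative:

import numpy as np

data_input = np.array([-2.0, 0.0, 3.0])  # toy pattern with a negative dip
lift = 1.0
print(data_input + np.abs(np.min(data_input) * lift))  # [0. 2. 5.]
print(data_input - np.abs(np.min(data_input) * lift))  # [-4. -2.  1.]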
Yes, sorry for the confusion. This is not a research/literature issue; it is checking what Ran did and making sure that we coded it up correctly.
%% remove baseline (lift)
data_input = Data - min(Data(:));
This is the MATLAB code that removes the baseline by subtracting the minimum value of the entire data matrix from each element. It returns all values shifted so that the minimum is 0.
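A direct translation of that MATLAB line into NumPy (a sketch; Data here is a toy stand-in) would just use np.min over the whole array, which matches min(Data(:)):

import numpy as np

Data = np.array([[1.0, -0.5], [2.0, 3.0]])  # toy stand-in for the data matrix
data_input = Data - np.min(Data)  # MATLAB: data_input = Data - min(Data(:));
print(np.min(data_input))  # 0.0: the shifted matrix has minimum exactly 0

Note that this plain subtraction handles both negative and positive minima, whereas each of the np.abs() candidates above only agrees with the MATLAB behaviour for one sign of the minimum.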
So it seems to be more similar to data_input - np.abs(np.min(data_input) * lift), but I'm afraid that subtraction could potentially cause more problems, so maybe it is better to use data_input + np.abs(np.min(data_input) * lift).
> but I'm afraid that subtraction could potentially cause more problems
If we consider the MATLAB code as our source of truth, what other problems would it have?
I can think of a situation where you have 0s in the matrix and, for some reason, the code subtracts a minimum value calculated only on the strictly positive numbers, so you get negative values. It's probably my mindless paranoia...
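Here is that scenario in toy form (purely hypothetical; this is not what the current code does):

import numpy as np

data = np.array([0.0, 0.5, 2.0])
pos_min = np.min(data[data > 0])  # 0.5: minimum over strictly positive entries only
print(data - pos_min)             # [-0.5  0.   1.5] -> a negative value appears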
@Fil158 Probably not? Say, 1 - 1 * 0.1 is still positive? Assuming the data input is positive and the lift factor is less than 1 (you could double-check this for me).
As I understand it, NMF requires non-negative values, not necessarily strictly positive or >1, so I think it could work.
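A quick numeric check of that claim (my toy numbers, assuming strictly positive input and lift < 1):

import numpy as np

data_input = np.array([1.0, 2.0, 5.0])  # strictly positive toy input
lift = 0.1
lifted = data_input - np.abs(np.min(data_input) * lift)
print(np.min(lifted))  # 0.9, still non-negative, so valid NMF input
# In general the new minimum is min(data) * (1 - lift), which stays positive
# whenever the data are strictly positive and 0 <= lift < 1.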
Ok, could you correct the function? It will probably break tests, as expected.
Also, have you looked into the overall algorithm implemented in MATLAB and described in the paper? Wondering about the current status now.
Let's code to the MATLAB. I don't see a reason why we don't just subtract the minimum. Other considerations are (a) code readability: I think subtracting the minimum is clearest here; and (b) code performance: probably not an issue, but it is usually a bad idea to take absolute values, especially in regression problems, because they are not differentiable in general and introduce instabilities when using gradient methods. Ran is a professional mathematician who is an expert in regression problems, and he makes some choices that are obvious to him without telling us why... in general we should go with what he did unless we have a very good reason to change.
Problem
lift_data is applied to the input signals. While the function is expected to return data_input - (min(data_input) * lift), the actual code is written as return data_input + np.abs(np.min(data_input) * lift).
Proposed solution
Double-check the literature (if there is one).
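For reference, a sketch of what the corrected function might look like if we follow the expected behaviour above (the exact signature and default are my assumptions, not necessarily the repo's):

import numpy as np

def lift_data(data_input, lift=1):
    # Shift the data toward a zero minimum; with lift=1 this reproduces
    # the MATLAB line data_input = Data - min(Data(:)).
    # Sketch only: the real function's signature may differ.
    return data_input - np.min(data_input) * lift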