Closed TomHaoChang closed 7 years ago
To expand on what @TomHaoChang said:
If ndims > data.shape[0], then the reduced data will have data.shape[0] columns rather than the expected ndims columns.
To maintain the expected shape of the data, we could pad the returned matrix with zeros so that it has the expected shape.
Hmm, interesting... so pad the input data to the PCA function, or pad the PCA-reduced data?
In this rare case, we'd pad the output so that it has the correct number of dimensions.
To give some more background for this issue, @TomHaoChang and I are using hyp.tools.reduce for another project, where the data reduction API for hypertools is a really convenient way to apply PCA to either a list of numpy arrays (using the group-level reduction model that we use for plotting multiple arrays) or a single numpy array, using the same syntax.
But for that project, unlike with the typical plotting setup that hypertools was primarily designed for, we frequently encounter the situation where the number of observations is less than the number of desired PCA dimensions (since we need to get to a pre-specified number of dimensions for our math to work out nicely).
This issue doesn't often show up when we're using hypertools for plotting, since the number of observations is nearly always greater than 3. But in this somewhat off-the-beaten-path use case we're not getting the correct number of dimensions from reduce despite specifying ndims.
👍 I'll write a check to see if the number of columns returned by the PCA model is less than ndims, and if so, fill with zeros.
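For illustration, the check described above could be sketched roughly as follows. This is only a minimal sketch, not the actual hypertools implementation: the helper name pad_to_ndims is hypothetical, and it assumes the reduced data is a single 2-D numpy array.

```python
import numpy as np

def pad_to_ndims(reduced, ndims):
    """Right-pad `reduced` with zero columns until it has `ndims` columns.

    Hypothetical helper for illustration; not part of the hypertools API.
    """
    reduced = np.asarray(reduced)
    missing = ndims - reduced.shape[1]
    if missing <= 0:
        # Already has at least ndims columns; nothing to pad.
        return reduced
    zeros = np.zeros((reduced.shape[0], missing), dtype=reduced.dtype)
    return np.hstack([reduced, zeros])

# Stand-in for a 5x5 PCA result when 10 dimensions were requested.
reduced = np.ones((5, 5))
padded = pad_to_ndims(reduced, 10)
print(padded.shape)  # (5, 10)
```

The original columns are left untouched, so anything downstream that only reads the first few components should see the same values as before.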
implemented on 477a548
import hypertools as hyp
import numpy as np
print(hyp.tools.reduce(np.random.normal(0, 1, [5, 100]), ndims=10).shape)
This is the code I tried, which was supposed to give me a 5x10 matrix. However, because the rank of the matrix is at most 5, PCA is unable to generate 10 dimensions from the data, so the resulting matrix I got was 5x5.
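As a quick sanity check of the rank argument, the snippet below uses only numpy (no hypertools) to show that a 5x100 matrix cannot have rank above 5. The seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Same shape as the example above: 5 observations, 100 features.
rng = np.random.default_rng(0)
data = rng.normal(0, 1, size=(5, 100))

# The rank of a matrix is bounded by its smaller dimension.
rank = np.linalg.matrix_rank(data)
print(rank, min(data.shape))
```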
I talked about this issue with Professor Manning today, and he suggested a fix to this problem: if the number of dimensions to reduce to is greater than the rank of the matrix, then pad the matrix with rows of 0s to increase the rank, do PCA then eliminate the 0 rows.
Could you look into this issue and let me know what you think? Thanks!
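The row-padding suggestion above could be sketched roughly like this. It is a rough sketch under several assumptions: PCA is done with a plain numpy SVD rather than the model hypertools uses, the function name pca_with_row_padding is hypothetical, and ndims is assumed to be no larger than the number of features.

```python
import numpy as np

def pca_with_row_padding(data, ndims):
    """Pad with zero rows, run SVD-based PCA, then drop the padded rows.

    Hypothetical helper for illustration; assumes ndims <= data.shape[1].
    """
    data = np.asarray(data, dtype=float)
    n_obs = data.shape[0]
    n_pad = max(ndims - n_obs, 0)
    # Append zero rows so there are at least ndims observations.
    padded = np.vstack([data, np.zeros((n_pad, data.shape[1]))])
    centered = padded - padded.mean(axis=0)
    # Rows of vt are the principal axes, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:ndims].T
    # Keep only the rows that correspond to real observations.
    return projected[:n_obs]

rng = np.random.default_rng(0)
out = pca_with_row_padding(rng.normal(0, 1, size=(5, 100)), ndims=10)
print(out.shape)  # (5, 10)
```

One thing this sketch makes visible: zero rows add observations but no new directions of variance, so the components beyond the fifth come out numerically zero. In that sense the result is similar in shape to zero-padding the output, although the leading components can differ because the padded rows shift the mean.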