Hi, I'm having trouble using statsample to run PCA on a large dataset. Does anyone have any good ideas?
I want to run PCA on a very large dataset (3000 variables, 50 samples).
So I wrote this code:
require 'statsample'
require 'pry'

# Read the whitespace-separated file and drop the header line
data_raw = IO.readlines('data1.txt').map{|v| v.split }[1..-1]
hash_tmp = {}
# First column is the variable name, the rest are its values
data_raw[1..3000].each do |ary|
  hash_tmp[ary[0]] = ary[1..-1].map(&:to_i).to_scale
end
ds = hash_tmp.to_dataset
puts "Input data done!"
cor_matrix=Statsample::Bivariate.correlation_matrix(ds)
puts "cor_matrix was prepared."
pca=Statsample::Factor::PCA.new(cor_matrix)
binding.pry
But Ruby on my Mac never prints "cor_matrix was prepared.".
To investigate the cause, I wrote the following code:
# Reopening the module to investigate where the bottleneck is
module Statsample
  module Bivariate
    class << self
      def covariance_matrix_optimized(ds)
        x=ds.to_gsl
        n=x.row_size
        m=x.column_size
        puts "calculating means..."
        means=((1/n.to_f)*GSL::Matrix.ones(1,n)*x).row(0)
        puts "centering matrix..."
        centered=x-(GSL::Matrix.ones(n,m)*GSL::Matrix.diag(means))
        puts "calculating covariance matrix..."
        ss=centered.transpose*centered
        puts "calculating n..."
        s=((1/(n-1).to_f))*ss
        puts "done!" #<= This line is executed
        s
      end
      def correlation_matrix(ds)
        vars,cases=ds.fields.size,ds.cases
        if !ds.has_missing_data? and Statsample.has_gsl? and prediction_optimized(vars,cases) < prediction_pairwise(vars,cases)
          binding.pry
          cm=correlation_matrix_optimized(ds)
          binding.pry #<= This line is never reached. :(
        else
          cm=correlation_matrix_pairwise(ds)
        end
        binding.pry
        cm.extend(Statsample::CovariateMatrix)
        binding.pry
        cm.fields=ds.fields
        binding.pry
        cm
      end
    end
  end
end
With this, Ruby prints everything up to "done!" but then never returns from the Statsample::Bivariate.covariance_matrix_optimized method.
I have never seen a Ruby method that doesn't return.
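One way to narrow down where the time goes is to wrap each stage in a small timing helper with a time limit. The sketch below is only an illustration: `timed_step` is a hypothetical helper (not part of statsample), the 60-second default is an arbitrary assumption, and the workload is a stand-in so the snippet runs without statsample installed. Note that `Timeout` may not be able to interrupt code that is busy inside a C extension.

```ruby
require 'timeout'

# Hypothetical helper (not part of statsample): runs the block, prints how
# long it took, and raises Timeout::Error if it exceeds `limit` seconds.
def timed_step(label, limit: 60)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = Timeout.timeout(limit) { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format('%-28s %.3fs', label, elapsed)
  result
end

# Stand-in workload so the sketch runs on its own.
squares = timed_step('stand-in: build squares') { (1..1000).map { |i| i * i } }
total   = timed_step('stand-in: sum squares')   { squares.sum }
```

In the script above, the same wrapper could go around the real calls, for example `timed_step('correlation_matrix') { Statsample::Bivariate.correlation_matrix(ds) }`, to see which call is the last one to report a time.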
If anyone knows how to solve this problem, or how to investigate the cause more deeply, please let me know.
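For comparison, the arithmetic in covariance_matrix_optimized (column means, centering, then dividing the cross-product by n-1) can be reproduced on a tiny input with Ruby's standard `matrix` library, which helps separate the math from anything GSL-specific. This is a minimal sketch, not statsample's implementation: `covariance_matrix_plain` and `correlation_matrix_plain` are hypothetical names, and it assumes rows are samples and columns are variables.

```ruby
require 'matrix'

# Hypothetical helper: sample covariance of a Matrix whose rows are
# samples and whose columns are variables (an assumed layout).
def covariance_matrix_plain(x)
  n = x.row_count
  m = x.column_count
  # Mean of each column.
  means = (0...m).map { |j| x.column(j).sum / n.to_f }
  # Subtract each column's mean from its entries.
  centered = Matrix.build(n, m) { |i, j| x[i, j] - means[j] }
  # (centered^T * centered) / (n - 1)
  (centered.transpose * centered) / (n - 1).to_f
end

# Hypothetical helper: scales the covariance by the standard deviations.
def correlation_matrix_plain(x)
  cov = covariance_matrix_plain(x)
  sd = (0...cov.row_count).map { |j| Math.sqrt(cov[j, j]) }
  Matrix.build(cov.row_count, cov.column_count) { |i, j| cov[i, j] / (sd[i] * sd[j]) }
end

# Tiny example: 3 samples, 2 variables.
x = Matrix[[1, 2], [2, 4], [3, 7]]
cov = covariance_matrix_plain(x)
cor = correlation_matrix_plain(x)
puts cov
puts cor
```

The result is always variables x variables, so with 3000 variables the matrix has 3000 * 3000 = 9,000,000 entries regardless of the 50 samples.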