avulanov / scalable-deeplearning

Scalable implementation of artificial neural networks for Spark deep learning
Apache License 2.0
38 stars 18 forks

Encoded data all the same #7

Open exrezzo opened 4 years ago

exrezzo commented 4 years ago

Hi, I'm trying to use the autoencoder model for dimensionality reduction, but for some reason the encoded versions of my data all look the same: every encoded training vector has almost identical values, and the values are extremely small (sometimes on the order of 1e-200). This makes them useless for further computation. Can somebody help me figure out why? The code is pretty simple:

val stackedAutoencoder: StackedAutoencoder = {
  // The original code guarded this with `if (modelType == 0)`;
  // the other branches are omitted here.
  new StackedAutoencoder()
    .setLayers(Array(numCols, 5))  // compress numCols input features down to 5
    .setInputCol("_2")
    .setDataIn01Interval(true)     // inputs are assumed scaled to [0, 1]
    .setBuildDecoder(false)        // encoder only; no reconstruction needed
    .setMaxIter(1)
}

val saModel: StackedAutoencoderModel = stackedAutoencoder.fit(train)
val encodedData = saModel.setInputCol("_2").setOutputCol("encoded").encode(train.toDF)
(encodedData, saModel)
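One thing worth noting in the snippet: `setMaxIter(1)` limits training to a single optimization step, so the encoder weights may stay near their random initialization and map every input to nearly the same point. A minimal sketch of a first diagnostic, assuming `numCols` and `train` are defined as above, is to train for more iterations and inspect the spread of the encoded vectors:

```scala
// Sketch only: same pipeline as above, but trained for substantially
// more than one iteration. `numCols` and `train` are assumed to exist.
val sa = new StackedAutoencoder()
  .setLayers(Array(numCols, 5))
  .setInputCol("_2")
  .setDataIn01Interval(true)   // only valid if the data really is in [0, 1]
  .setBuildDecoder(false)
  .setMaxIter(100)             // give the optimizer room to move the weights

val model = sa.fit(train)
val encoded = model.setInputCol("_2").setOutputCol("encoded").encode(train.toDF)

// Eyeball a few encoded rows: if they still collapse to near-identical
// tiny values, the problem is likely in the input scaling rather than
// the iteration count.
encoded.select("encoded").show(5, truncate = false)
```

If the inputs are not actually in the [0, 1] interval, `setDataIn01Interval(true)` is also a suspect; Spark ML's `MinMaxScaler` can be applied beforehand to guarantee that precondition.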