ceteke / cae

Convolutional Auto Encoder implemented in Tensorflow
24 stars · 3 forks

Error when running train_classifier.py #1

Open zhao-haha opened 6 years ago

zhao-haha commented 6 years ago

I ran train_classifier.py with the following command on my PC, and it started successfully:

PS F:\PycharmProjects\cae> python .\train_classifier.py -b 128 -o "./out/" -l "(64)5c-2p-(64)3c-2p-(64)3c-2p" -fc 10 -ds "mnist"
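For context, the string passed to -l appears to encode the layer stack. As a rough illustration only, assuming a grammar where "(N)Kc" is a convolution with N filters and a KxK kernel and "Sp" is an SxS pooling layer (the repo's actual parser may differ), it could be read like this:

```python
import re

def parse_layer_spec(spec):
    """Parse a layer string such as "(64)5c-2p-(64)3c-2p" into a list of
    (layer_type, params) tuples. The grammar here is an assumption, not
    the repository's actual implementation."""
    layers = []
    for token in spec.split('-'):
        conv = re.fullmatch(r'\((\d+)\)(\d+)c', token)
        pool = re.fullmatch(r'(\d+)p', token)
        if conv:
            layers.append(('conv', {'filters': int(conv.group(1)),
                                    'kernel': int(conv.group(2))}))
        elif pool:
            layers.append(('pool', {'size': int(pool.group(1))}))
        else:
            raise ValueError('unrecognized token: %s' % token)
    return layers

# The spec from the command above: three conv(64) stages, each followed by 2x2 pooling.
print(parse_layer_spec('(64)5c-2p-(64)3c-2p-(64)3c-2p'))
```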

However, it crashed on the first training step: [screenshot of the error]

Could you have a look at it? Thanks in advance!

ceteke commented 6 years ago

I'm taking a look at it; it's probably due to some updates.

ceteke commented 6 years ago

Hi, I think I messed some things up in the master branch. I believe I've fixed them; if the problem continues, please check out the old branch. In addition, I see that the README was not clear enough, so I have added some more content; please read it. Since I've used this code for my own research, it is not exactly the same as the paper.

From README: train_classifier.py trains a Linear SVM and saves the embeddings of the previously trained Autoencoder. To train the Autoencoder you should use train_autoencoder.py. The output directory in train_classifier.py is the directory where your Autoencoder is saved (the naming could have been better). Therefore, this model does not include the softmax layer in the network as discussed in the reference, so the -fc parameter given while training is not connected to a classifier but is a bottleneck between the encoder and decoder.
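To illustrate the workflow the README describes, train_classifier.py fits a Linear SVM on the embeddings produced by the previously trained encoder. The following is only a sketch with random stand-in data (the embedding array and labels here are hypothetical, not the repo's actual code or data):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Stand-ins for the encoder's saved embeddings and their MNIST labels.
# With -fc 512, each embedding would be a 512-dimensional bottleneck vector.
train_embeddings = rng.randn(500, 512)
train_labels = rng.randint(0, 10, 500)

# Fit a Linear SVM on the embeddings, as the README describes.
clf = LinearSVC()
clf.fit(train_embeddings, train_labels)
acc = clf.score(train_embeddings, train_labels)
print('train accuracy: %.3f' % acc)
```

On real autoencoder embeddings the SVM accuracy reflects how linearly separable the learned features are; the random data here is only to make the sketch runnable.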

When I execute python train_autoencoder.py -e 100 -i 10 -b 64 -l "(64)5c-2p-(64)3c-2p-(64)3c-2p" -lr 0.001 -tb 0 -o out -fc 512 -s 1000 -ds mnist (the parameters here may not be correct :smile: ) I get:

Using TensorFlow backend.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
2018-02-27 18:00:08.661598: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-27 18:00:08.778751: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-27 18:00:08.779022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.46GiB
2018-02-27 18:00:08.779039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
Forming encoder
Forming decoder
Forming L2 optimizer with learning rate 0.001
Preprocessing
Started training.
Train steps: 3437
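As an aside, the "Forming encoder" / "Forming decoder" steps in the log above build the network around the -fc bottleneck. A minimal NumPy shape walkthrough of that idea (the 4x4 feature-map size assumes SAME padding over three 2x2 pools, 28 -> 14 -> 7 -> 4; all names and weights here are illustrative, not the repo's):

```python
import numpy as np

batch = 64
# Encoder output after three conv(64)/pool(2) stages on 28x28 MNIST input.
feat = np.zeros((batch, 4, 4, 64))
flat = feat.reshape(batch, -1)            # (64, 1024) flattened features

# Fully-connected bottleneck of 512 units (the -fc 512 parameter).
W_enc = np.zeros((flat.shape[1], 512))
code = flat @ W_enc                       # (64, 512): the embedding later fed to the SVM

# The decoder expands the code back toward the feature-map size.
W_dec = np.zeros((512, flat.shape[1]))
recon_flat = code @ W_dec                 # (64, 1024)

print(code.shape, recon_flat.shape)
```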

NVIDIA-SMI output (so we can see that it's working):

| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P2    83W / 230W |   7902MiB /  8112MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+

I hope that I was helpful and this suits you well. Have a nice day!

zhao-haha commented 6 years ago

Thanks for your reply! I will check it!