AJResearchGroup / nsphs_ml_qt

R package for nsphs_ml_qt
GNU General Public License v3.0

Do autoencode-only with latent layer of 3 neurons #55

Open richelbilderbeek opened 2 years ago

richelbilderbeek commented 2 years ago
richel@N141CU:~$ python3 ~/.local/share/GenoCAE/run_gcae.py train --datadir /home/richel/GitHubs/gcaer/inst/extdata/ --data gcae_input_files_1 --model_id M1_3n --resume_from 0 --epochs 1 --save_interval 1 --train_opts_id ex3 --data_opts_id b_0_4 --trainedmodeldir /home/richel/.cache/gcaer/file1dc156c34037/
2022-05-20 13:34:45.815151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-20 13:34:45.837919: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-05-20 13:34:45.837937: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (N141CU): /proc/driver/nvidia/version does not exist
2022-05-20 13:34:45.838170: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-05-20 13:34:45.843953: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2099940000 Hz
2022-05-20 13:34:45.844385: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc088000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-20 13:34:45.844425: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tensorflow version 2.2.0

______________________________ arguments ______________________________
train : True
datadir : /home/richel/GitHubs/gcaer/inst/extdata/
data : gcae_input_files_1
model_id : M1_3n
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : 1
epochs : 1
resume_from : 0
trainedmodeldir : /home/richel/.cache/gcaer/file1dc156c34037/
pheno_model_id : None
project : False
superpops : None
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/richel/GitHubs/gcaer/inst/extdata/gcae_input_files_1.fam
Reading ind pop list from /home/richel/GitHubs/gcaer/inst/extdata/gcae_input_files_1.fam
Mapping files: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 241.18it/s]
Using learning rate schedule tf.keras.optimizers.schedules.ExponentialDecay with {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}

______________________________ Data ______________________________
N unique train samples: 800
--- training on : 800
N valid samples: 200
N markers: 1

______________________________ Building model ______________________________
Traceback (most recent call last):
  File "/home/richel/.local/share/GenoCAE/run_gcae.py", line 1619, in <module>
    main()
  File "/home/richel/.local/share/GenoCAE/run_gcae.py", line 1004, in main
    autoencoder = Autoencoder(model_architecture, n_markers, noise_std, regularizer)
  File "/home/richel/.local/share/GenoCAE/run_gcae.py", line 87, in __init__
    layer_module = getattr(eval(first_layer_def["module"]), first_layer_def["class"])
TypeError: eval() arg 1 must be a string, bytes or code object
richelbilderbeek commented 2 years ago

The modified file also has a different layout:

Screenshot from 2022-05-20 13-38-22
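
The `TypeError` in the traceback above is what `eval()` raises when its argument is not a string. The screenshot is not reproduced here, so the exact layout difference is unknown. The sketch below assumes, purely for illustration, that the modified file stored the `module` value wrapped in a list instead of as a plain string. The standard-library module `collections` stands in for `tf.keras.layers`, and `resolve_layer_class` is a hypothetical helper that mirrors the `getattr(eval(...), ...)` line from the traceback.

```python
import collections  # stand-in for tf.keras.layers; keeps the sketch dependency-free


def resolve_layer_class(layer_def):
    """Look up a class the way the traceback line does: getattr(eval(module), class)."""
    return getattr(eval(layer_def["module"]), layer_def["class"])


# Layout where "module" is a plain string: the lookup succeeds.
good_def = {"module": "collections", "class": "OrderedDict"}
resolved = resolve_layer_class(good_def)

# Hypothetical modified layout where the same value is wrapped in a list.
bad_def = {"module": ["collections"], "class": "OrderedDict"}
try:
    resolve_layer_class(bad_def)
    error_message = None
except TypeError as err:
    # eval() rejects anything that is not a string, bytes or code object.
    error_message = str(err)

print(resolved)
print(error_message)
```

If this assumption holds, writing the file with the same layout as the original (plain string values) would avoid the error.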

richelbilderbeek commented 2 years ago

Use the same layout:

Screenshot from 2022-05-20 13-39-11

richelbilderbeek commented 2 years ago
richel@N141CU:~/.cache/gcaer/file1dc115a5b4c8/ae.M1_3n.ex3.b_0_4.gcae_input_files_1$ 'python3' ~/.local/share/GenoCAE/run_gcae.py project --datadir /home/richel/GitHubs/gcaer/inst/extdata/ --data gcae_input_files_1 --model_id M1_3n --train_opts_id ex3 --data_opts_id b_0_4 --trainedmodeldir /home/richel/.cache/gcaer/file1dc13a3ab996/
2022-05-20 14:27:50.120315: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-20 14:27:50.146672: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-05-20 14:27:50.146739: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (N141CU): /proc/driver/nvidia/version does not exist
2022-05-20 14:27:50.147310: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-05-20 14:27:50.173128: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2099940000 Hz
2022-05-20 14:27:50.174602: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa588000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-20 14:27:50.174665: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tensorflow version 2.2.0

______________________________ arguments ______________________________
train : False
datadir : /home/richel/GitHubs/gcaer/inst/extdata/
data : gcae_input_files_1
model_id : M1_3n
train_opts_id : ex3
data_opts_id : b_0_4
save_interval : None
epochs : None
resume_from : None
trainedmodeldir : /home/richel/.cache/gcaer/file1dc13a3ab996/
pheno_model_id : None
project : True
superpops : None
epoch : None
pdata : None
trainedmodelname : None
plot : False
animate : False
evaluate : False
metrics : None

______________________________ data opts ______________________________
sparsifies : [0.0, 0.1, 0.2, 0.3, 0.4]
norm_opts : {'flip': False, 'missing_val': -1.0}
norm_mode : genotypewise01
impute_missing : True
validation_split : 0.2

______________________________ train opts ______________________________
learning_rate : 0.00032
batch_size : 10
noise_std : 0.0032
n_samples : -1
loss : {'module': 'tf.keras.losses', 'class': 'CategoricalCrossentropy', 'args': {'from_logits': False}}
regularizer : {'reg_factor': 1e-07, 'module': 'tf.keras.regularizers', 'class': 'l2'}
lr_scheme : {'module': 'tf.keras.optimizers.schedules', 'class': 'ExponentialDecay', 'args': {'decay_rate': 0.96, 'decay_steps': 100, 'staircase': False}}
______________________________
Imputing originally missing genotypes to most common value.
Reading ind pop list from /home/richel/GitHubs/gcaer/inst/extdata/gcae_input_files_1.fam
Reading ind pop list from /home/richel/GitHubs/gcaer/inst/extdata/gcae_input_files_1.fam
Mapping files: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 219.27it/s]
Projecting epochs: [1, 2]
Already projected: []
In DG.get_train_set: number of -1.0 genotypes in train: 0
In DG.get_train_set: number of -9 genotypes in train: 0
In DG.get_train_set: number of 0 values in train mask: 0
Replacing dataset ind_pop_list_train in /home/richel/.cache/gcaer/file1dc13a3ab996/ae.M1_3n.ex3.b_0_4.gcae_input_files_1/gcae_input_files_1/encoded_data.h5

______________________________ Building model ______________________________
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'strides': 1}
Adding layer: BatchNormalization: {}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: MaxPooling1D: {'pool_size': 5, 'strides': 2, 'padding': 'same'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Flatten: {}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dense: {'units': 3, 'name': 'encoded'}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 75, 'activation': 'elu'}
Adding layer: Dropout: {'rate': 0.01}
Adding layer: Dense: {'units': 8}
Adding layer: Reshape: {'target_shape': (1, 8), 'name': 'i_msvar'}
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu'}
Adding layer: BatchNormalization: {}
Adding layer: Reshape: {'target_shape': (1, 1, 8)}
Adding layer: UpSampling2D: {'size': (2, 1)}
Adding layer: Reshape: {'target_shape': (2, 8)}
Adding layer: ResidualBlock2: {'filters': 8, 'kernel_size': 5}
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
--- conv1d  filters: 8 kernel_size: 5
--- batch normalization
Adding layer: Conv1D: {'filters': 8, 'kernel_size': 5, 'padding': 'same', 'activation': 'elu', 'name': 'nms'}
Adding layer: BatchNormalization: {}
Adding layer: Conv1D: {'filters': 1, 'kernel_size': 1, 'padding': 'same'}
Adding layer: Flatten: {'name': 'logits'}
########################### epoch 1 ###########################
Reading weights from /home/richel/.cache/gcaer/file1dc13a3ab996/ae.M1_3n.ex3.b_0_4.gcae_input_files_1/weights/1
Traceback (most recent call last):
  File "/home/richel/.local/share/GenoCAE/run_gcae.py", line 1619, in <module>
    main()
  File "/home/richel/.local/share/GenoCAE/run_gcae.py", line 1287, in main
    encoded_train = np.concatenate((encoded_train, encoded_train_batch), axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
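
The `ValueError` above can be reproduced in isolation. A minimal sketch, assuming NumPy is installed: the widths (3 for the array at index 0, 2 for the array at index 1) are taken from the error message, while the number of rows and the way the arrays are created are illustrative assumptions. The log does not show why the batch has only 2 columns.

```python
import numpy as np

# Accumulator with 3 columns, matching "the array at index 0 has size 3".
# Starting from an empty array is an assumption made for this sketch.
encoded_train = np.empty((0, 3))

# Batch with 2 columns, matching "the array at index 1 has size 2".
# The row count of 10 is arbitrary.
encoded_train_batch = np.zeros((10, 2))

try:
    np.concatenate((encoded_train, encoded_train_batch), axis=0)
    failed = False
except ValueError as err:
    # Stacking rows (axis=0) requires the same number of columns in every array.
    failed = True
    print(err)

# With matching column counts the same call succeeds.
ok = np.concatenate((np.empty((0, 3)), np.zeros((10, 3))), axis=0)
print(ok.shape)
```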
richelbilderbeek commented 2 years ago

This is the warning:

Warning messages:
1: In system2(command = run_args[1], args = run_args[-1], stdout = TRUE,  :
  running command ''python3' ~/.local/share/GenoCAE/run_gcae.py project --datadir /home/richel/GitHubs/gcaer/inst/extdata/ --data gcae_input_files_1 --model_id M1_3n --train_opts_id ex3 --data_opts_id b_0_4 --trainedmodeldir ~/.cache/gcaer/ae_out315275230761/ 2>&1' had status 1
2: In system2(command = run_args[1], args = run_args[-1], stdout = TRUE,  :
  running command ''python3' ~/.local/share/GenoCAE/run_gcae.py project --datadir /home/richel/GitHubs/gcaer/inst/extdata/ --data gcae_input_files_1 --model_id M1_3n --train_opts_id ex3 --data_opts_id b_0_4 --trainedmodeldir ~/.cache/gcaer/ae_out315275230761/ 2>&1' had status 1

Note that the problem occurs only in the project step.

richelbilderbeek commented 2 years ago

Fixed the warning by running project only when the latent layer has 2 neurons.
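
A guard of this kind could look like the sketch below. It is not the code used in the fix: the function names `latent_size` and `should_project` are hypothetical, and the list-of-dicts layout for the layers is an assumption modelled on the "Adding layer" lines in the log, where the latent layer is the `Dense` layer named `encoded`.

```python
def latent_size(layers):
    """Return the 'units' of the layer named 'encoded', or None if there is none."""
    for layer in layers:
        args = layer.get("args", {})
        if args.get("name") == "encoded":
            return args.get("units")
    return None


def should_project(layers, supported_size=2):
    """Project only when the latent layer has the supported number of neurons."""
    return latent_size(layers) == supported_size


# Excerpt of the M1_3n architecture as printed in the log (assumed layout).
m1_3n = [
    {"class": "Dense", "args": {"units": 75, "activation": "elu"}},
    {"class": "Dense", "args": {"units": 3, "name": "encoded"}},
    {"class": "Dense", "args": {"units": 75, "activation": "elu"}},
]

print(latent_size(m1_3n))
print(should_project(m1_3n))
```

For this 3-neuron model, `should_project` returns `False`, so the project step would be skipped.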