ayushkarnawat / profit

Exploring evolutionary protein fitness landscapes
MIT License

Unable to train using keras/tf models #39

Open ayushkarnawat opened 4 years ago

ayushkarnawat commented 4 years ago

Running the following script:

import numpy as np
import tensorflow as tf

from data import load_dataset
from profit.models.gcn import GCN
from profit.dataset.splitters import split_method_dict

# Preprocess + load the dataset
data = load_dataset('gcn', 'tertiary', labels='Fitness', num_data=10,
                    filetype='tfrecords', as_numpy=True)

# Shuffle, split and batch
train_idx, val_idx = split_method_dict['stratified']().train_valid_split(
    data[0], labels=data[-1].flatten(), return_idxs=True)
train_data = []
val_data = []
for arr in data:
    train_data.append(arr[train_idx])
    val_data.append(arr[val_idx])

train_X = train_data[:-1]
train_y = train_data[-1]
val_X = val_data[:-1]
val_y = val_data[-1]

# Initialize the GCN model. This is hacky: it assumes the data is already
# loaded in memory, which is the wrong approach. Instead, we should read the
# shapes from the TF tensors.
num_atoms, num_feats = train_data[0].shape[1], train_data[0].shape[2]
labels = train_data[-1]
num_outputs = labels.shape[1]
labels_std = np.std(labels, axis=0)
model = GCN(num_atoms, num_feats, num_outputs=num_outputs, std=labels_std).get_model()

# Fit model and report metrics
model.fit(train_X, train_y, batch_size=5, epochs=3, shuffle=True, 
          validation_data=(val_X, val_y), verbose=1)

gives the following error:

Train on 8 samples, validate on 2 samples
Epoch 1/3
2019-12-19 15:42:17.230535: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-19 15:42:17.230868: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
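
The OMP hint above points at more than one copy of the OpenMP runtime ending up in the process. One rough way to look for candidates is to list every `libiomp5` file under the active environment. The sketch below assumes the environment is rooted at `sys.prefix`; `find_openmp_runtimes` is a hypothetical helper name, and finding several files on disk only suggests, rather than proves, that more than one of them gets loaded.

```python
import sys
from pathlib import Path


def find_openmp_runtimes(root, stem="libiomp5"):
    """Return every file under `root` whose name starts with `stem`, sorted."""
    root = Path(root)
    return sorted(p for p in root.rglob(stem + "*") if p.is_file())


if __name__ == "__main__":
    # Assumption: the environment in use is rooted at sys.prefix.
    for path in find_openmp_runtimes(sys.prefix):
        print(path)
```

If this prints more than one path, the packages that own those files are a reasonable place to start looking.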

I tried two fixes: setting os.environ['KMP_DUPLICATE_LIB_OK']='True' in the script, and installing nomkl with conda install -c anaconda nomkl. Neither resolved the issue.
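
For reference, the environment-variable workaround presumably only has a chance of taking effect if it is set before any library that loads an OpenMP runtime is imported. A minimal sketch, assuming the assignment goes at the very top of the entry script; the NumPy and TensorFlow imports are left as comments so the snippet runs on its own, and `workaround_is_set` is a hypothetical helper. The OMP message itself calls this workaround unsafe and unsupported, so it is a stopgap at best.

```python
import os

# Set the flag first, before importing anything that may load libiomp5.
# The OMP hint spells the value as TRUE.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Heavy imports would follow here, for example:
# import numpy as np
# import tensorflow as tf


def workaround_is_set():
    """Report whether the flag is visible in this process's environment."""
    return os.environ.get("KMP_DUPLICATE_LIB_OK") == "TRUE"


print("KMP_DUPLICATE_LIB_OK set:", workaround_is_set())
```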

ayushkarnawat commented 4 years ago

Could this be caused by not having updated to the latest version of TF (i.e., TF 2.0)?
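
To narrow that down, it helps to record which versions are actually installed. A small sketch using only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper, and the package names in the loop are assumptions about what this environment contains.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the environment.
    for name in ("tensorflow", "numpy", "keras"):
        print(name, installed_version(name))
```

Posting this output alongside the traceback would make it easier to tell whether the TF version is a factor.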