Zahlii / colab-tf-utils

Automatically backup keras/tensorflow models from Google's Colab service to your GoogleDrive based on a keras callback!
GNU General Public License v3.0

NotFoundError: Container localhost does not exist. #6

Open DarkknightRHZ opened 6 years ago

DarkknightRHZ commented 6 years ago

Hi, I am trying to use the code snippet for creating the callback, but I am getting this error: NotFoundError: Container localhost does not exist. (Could not find resource: localhost/RMSprop_2/iterations). Can you please help me?

Zahlii commented 6 years ago

Can you provide any code that reproduces the problem? From what I can tell, this sounds more like a problem with the tensorflow model/library you are using.

DarkknightRHZ commented 6 years ago

Thank you for the reply. I have added !rm utils.* and now the warnings have changed.

Below are the new logs:

START

rm: cannot remove 'utils*': No such file or directory
--2018-07-12 01:15:28--  https://raw.githubusercontent.com/Zahlii/colab-tf-utils/master/utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6935 (6.8K) [text/plain]
Saving to: ‘utils.py’

utils.py 100%[===================>] 6.77K --.-KB/s in 0s

2018-07-12 01:15:28 (28.2 MB/s) - ‘utils.py’ saved [6935/6935]

Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (4.23.4)
Requirement already satisfied: keras in /usr/local/lib/python3.6/dist-packages (2.1.6)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from keras) (3.13)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from keras) (1.11.0)
Requirement already satisfied: numpy>=1.9.1 in /usr/local/lib/python3.6/dist-packages (from keras) (1.14.5)
Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.6/dist-packages (from keras) (0.19.1)
Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from keras) (2.8.0)
rm: cannot remove 'tboard.py': No such file or directory
--2018-07-12 01:15:35--  https://raw.githubusercontent.com/mixuala/colab_utils/master/tboard.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5214 (5.1K) [text/plain]
Saving to: ‘tboard.py’

tboard.py 100%[===================>] 5.09K --.-KB/s in 0s

2018-07-12 01:15:35 (52.8 MB/s) - ‘tboard.py’ saved [5214/5214]

ngrok installed
status: tensorboard=True, ngrok=True
WARNING:google.auth._default:No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
WARNING:googleapiclient.discovery_cache:file_cache is unavailable when using oauth2client >= 4.0.0
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/discovery_cache/__init__.py", line 36, in autodetect
    from google.appengine.api import memcache
ModuleNotFoundError: No module named 'google.appengine'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 33, in <module>
    from oauth2client.contrib.locked_file import LockedFile
ModuleNotFoundError: No module named 'oauth2client.contrib.locked_file'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 37, in <module>
    from oauth2client.locked_file import LockedFile
ModuleNotFoundError: No module named 'oauth2client.locked_file'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python3.6/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0
WARNING:google.auth._default:No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
tensorboard url= http://6f8fcc3c.ngrok.io

END
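As a side note, the file_cache and project-ID warnings above appear to come from googleapiclient's discovery cache and from google.auth rather than from utils.py itself, and as far as I can tell they are harmless here. Building the Drive service with cache_discovery=False is usually enough to silence the cache warning; the following is only a sketch of that idea, not the code utils.py actually runs:

from google.colab import auth
from googleapiclient.discovery import build

# Standard Colab OAuth flow; afterwards application-default credentials are available.
auth.authenticate_user()

# Disabling the discovery cache avoids the "file_cache is unavailable when
# using oauth2client >= 4.0.0" warning shown in the log above.
drive_service = build('drive', 'v3', cache_discovery=False)

The "No project ID could be determined" warning can be addressed by setting the GOOGLE_CLOUD_PROJECT environment variable, as the message itself suggests.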

I am adding the code that I used in the next comment.

DarkknightRHZ commented 6 years ago

This code is from the Hvass-Labs repo; I am using it to get a grasp on machine translation. Thanks to Hvass-Labs.

########################################################################
#
# Functions for downloading and extracting data-files from the internet.
#
# Implemented in Python 3.5
#
########################################################################
#
# This file is part of the TensorFlow Tutorials available at:
#
# https://github.com/Hvass-Labs/TensorFlow-Tutorials
#
# Published under the MIT License. See the file LICENSE for details.
#
# Copyright 2016 by Magnus Erik Hvass Pedersen
#
########################################################################

import sys
import urllib.request
import tarfile
import zipfile

########################################################################

def _print_download_progress(count, block_size, total_size):
    """
    Function used for printing the download progress.
    Used as a call-back function in maybe_download_and_extract().
    """

    # Percentage completion.
    pct_complete = float(count * block_size) / total_size

    # Status-message. Note the \r which means the line should overwrite itself.
    msg = "\r- Download progress: {0:.1%}".format(pct_complete)

    # Print it.
    sys.stdout.write(msg)
    sys.stdout.flush()

########################################################################

def maybe_download_and_extract_1(url, download_dir):
    """
    Download and extract the data if it doesn't already exist.
    Assumes the url is a tar-ball file.

    :param url:
        Internet URL for the tar-file to download.
        Example: "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

    :param download_dir:
        Directory where the downloaded file is saved.
        Example: "data/CIFAR-10/"

    :return:
        Nothing.
    """

    # Filename for saving the file downloaded from the internet.
    # Use the filename from the URL and add it to the download_dir.
    filename = url.split('/')[-1]
    file_path = os.path.join(download_dir, filename)

    # Check if the file already exists.
    # If it exists then we assume it has also been extracted,
    # otherwise we need to download and extract it now.
    if not os.path.exists(file_path):
        # Check if the download directory exists, otherwise create it.
        if not os.path.exists(download_dir):
            os.makedirs(download_dir)

        # Download the file from the internet.
        file_path, _ = urllib.request.urlretrieve(url=url,
                                                  filename=file_path,
                                                  reporthook=_print_download_progress)

        print()
        print("Download finished. Extracting files.")

        if file_path.endswith(".zip"):
            # Unpack the zip-file.
            zipfile.ZipFile(file=file_path, mode="r").extractall(download_dir)
        elif file_path.endswith((".tar.gz", ".tgz")):
            # Unpack the tar-ball.
            tarfile.open(name=file_path, mode="r:gz").extractall(download_dir)

        print("Done.")
    else:
        print("Data has apparently already been downloaded and unpacked.")

########################################################################

import os

########################################################################

# Directory where you want to download and save the data-set.
# Set this before you start calling any of the functions below.
data_dir = "/content/drive/ColabNotebooks/nmt2"

# Base-URL for the data-sets on the internet.
data_url = "http://www.statmt.org/europarl/v7/"

########################################################################

# Public functions that you may call to download the data-set from
# the internet and load the data into memory.

def maybe_download_and_extract(language_code="da"):
    """
    Download and extract the Europarl data-set if the data-file doesn't
    already exist in data_dir. The data-set is for translating between
    English and the given language-code (e.g. 'da' for Danish, see the
    list of available language-codes above).
    """

    # Create the full URL for the file with this data-set.
    url = data_url + language_code + "-en.tgz"

    maybe_download_and_extract_1(url=url, download_dir=data_dir)

def load_data(english=True, language_code="da", start="", end=""):
    """
    Load the data-file for either the English-language texts or
    for the other language (e.g. "da" for Danish).

    All lines of the data-file are returned as a list of strings.

    :param english:
        Boolean whether to load the data-file for
        English (True) or the other language (False).

    :param language_code:
        Two-char code for the other language e.g. "da" for Danish.
        See list of available codes above.

    :param start:
        Prepend each line with this text e.g. "ssss " to indicate start of line.

    :param end:
        Append each line with this text e.g. " eeee" to indicate end of line.

    :return:
        List of strings with all the lines of the data-file.
    """

    if english:
        # Load the English data.
        filename = "europarl-v7.{0}-en.en".format(language_code)
    else:
        # Load the other language.
        filename = "europarl-v7.{0}-en.{0}".format(language_code)

    # Full path for the data-file.
    path = os.path.join(data_dir, filename)

    # Open and read all the contents of the data-file.
    with open(path, encoding="utf-8") as file:
        # Read the line from file, strip leading and trailing whitespace,
        # prepend the start-text and append the end-text.
        texts = [start + line.strip() + end for line in file]

    return texts

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import keras
import math
import os
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Input, Dense, GRU, Embedding
from tensorflow.python.keras.optimizers import RMSprop
from tensorflow.python.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from tensorflow.python.keras.preprocessing.text import Tokenizer
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

language_code = 'bg'
mark_start = 'ssss '
mark_end = ' eeee'

maybe_download_and_extract(language_code=language_code)

data_src = load_data(english=False, language_code=language_code)
data_dest = load_data(english=True, language_code=language_code, start=mark_start, end=mark_end)

data_src = data_src[0:5000]
data_dest = data_dest[0:5000]

num_words = 10000

class TokenizerWrap(Tokenizer):
    """Wrap the Tokenizer-class from Keras with more functionality."""

    def __init__(self, texts, padding,
                 reverse=False, num_words=None):
        """
        :param texts: List of strings. This is the data-set.
        :param padding: Either 'post' or 'pre' padding.
        :param reverse: Boolean whether to reverse token-lists.
        :param num_words: Max number of words to use.
        """

        Tokenizer.__init__(self, num_words=num_words)

        # Create the vocabulary from the texts.
        self.fit_on_texts(texts)

        # Create inverse lookup from integer-tokens to words.
        self.index_to_word = dict(zip(self.word_index.values(),
                                      self.word_index.keys()))

        # Convert all texts to lists of integer-tokens.
        # Note that the sequences may have different lengths.
        self.tokens = self.texts_to_sequences(texts)

        if reverse:
            # Reverse the token-sequences.
            self.tokens = [list(reversed(x)) for x in self.tokens]

            # Sequences that are too long should now be truncated
            # at the beginning, which corresponds to the end of
            # the original sequences.
            truncating = 'pre'
        else:
            # Sequences that are too long should be truncated
            # at the end.
            truncating = 'post'

        # The number of integer-tokens in each sequence.
        self.num_tokens = [len(x) for x in self.tokens]

        # Max number of tokens to use in all sequences.
        # We will pad / truncate all sequences to this length.
        # This is a compromise so we save a lot of memory and
        # only have to truncate maybe 5% of all the sequences.
        self.max_tokens = np.mean(self.num_tokens) \
                          + 2 * np.std(self.num_tokens)
        self.max_tokens = int(self.max_tokens)

        # Pad / truncate all token-sequences to the given length.
        # This creates a 2-dim numpy matrix that is easier to use.
        self.tokens_padded = pad_sequences(self.tokens,
                                           maxlen=self.max_tokens,
                                           padding=padding,
                                           truncating=truncating)

    def token_to_word(self, token):
        """Lookup a single word from an integer-token."""

        word = " " if token == 0 else self.index_to_word[token]
        return word

    def tokens_to_string(self, tokens):
        """Convert a list of integer-tokens to a string."""

        # Create a list of the individual words.
        words = [self.index_to_word[token]
                 for token in tokens
                 if token != 0]

        # Concatenate the words to a single string
        # with space between all the words.
        text = " ".join(words)

        return text

    def text_to_tokens(self, text, reverse=False, padding=False):
        """
        Convert a single text-string to tokens with optional
        reversal and padding.
        """

        # Convert to tokens. Note that we assume there is only
        # a single text-string so we wrap it in a list.
        tokens = self.texts_to_sequences([text])
        tokens = np.array(tokens)

        if reverse:
            # Reverse the tokens.
            tokens = np.flip(tokens, axis=1)

            # Sequences that are too long should now be truncated
            # at the beginning, which corresponds to the end of
            # the original sequences.
            truncating = 'pre'
        else:
            # Sequences that are too long should be truncated
            # at the end.
            truncating = 'post'

        if padding:
            # Pad and truncate sequences to the given length.
            tokens = pad_sequences(tokens,
                                   maxlen=self.max_tokens,
                                   padding='pre',
                                   truncating=truncating)

        return tokens

tokenizer_src = TokenizerWrap(texts=data_src, padding='pre', reverse=True, num_words=num_words)

tokenizer_dest = TokenizerWrap(texts=data_dest, padding='post', reverse=False, num_words=num_words)

tokens_src = tokenizer_src.tokens_padded
tokens_dest = tokenizer_dest.tokens_padded

token_start = tokenizer_dest.word_index[mark_start.strip()]
token_end = tokenizer_dest.word_index[mark_end.strip()]

encoder_input_data = tokens_src
decoder_input_data = tokens_dest[:, :-1]
decoder_output_data = tokens_dest[:, 1:]

encoder_input = Input(shape=(None, ), name='encoder_input')
embedding_size = 32
encoder_embedding = Embedding(input_dim=num_words, output_dim=embedding_size, name='encoder_embedding')

state_size = 128
encoder_gru1 = GRU(state_size, name='encoder_gru1', return_sequences=True)
encoder_gru2 = GRU(state_size, name='encoder_gru2', return_sequences=True)
encoder_gru3 = GRU(state_size, name='encoder_gru3', return_sequences=False)

def connect_encoder():
    # Start the neural network with its input-layer.
    net = encoder_input

    # Connect the embedding-layer.
    net = encoder_embedding(net)

    # Connect all the GRU-layers.
    net = encoder_gru1(net)
    net = encoder_gru2(net)
    net = encoder_gru3(net)

    # This is the output of the encoder.
    encoder_output = net

    return encoder_output

encoder_output = connect_encoder()

decoder_initial_state = Input(shape=(state_size,), name='decoder_initial_state')
decoder_input = Input(shape=(None, ), name='decoder_input')
decoder_embedding = Embedding(input_dim=num_words, output_dim=embedding_size, name='decoder_embedding')

decoder_gru1 = GRU(state_size, name='decoder_gru1', return_sequences=True)
decoder_gru2 = GRU(state_size, name='decoder_gru2', return_sequences=True)
decoder_gru3 = GRU(state_size, name='decoder_gru3', return_sequences=True)

decoder_dense = Dense(num_words, activation='linear', name='decoder_output')

def connect_decoder(initial_state):
    # Start the decoder-network with its input-layer.
    net = decoder_input

    # Connect the embedding-layer.
    net = decoder_embedding(net)

    # Connect all the GRU-layers.
    net = decoder_gru1(net, initial_state=initial_state)
    net = decoder_gru2(net, initial_state=initial_state)
    net = decoder_gru3(net, initial_state=initial_state)

    # Connect the final dense layer that converts to
    # one-hot encoded arrays.
    decoder_output = decoder_dense(net)

    return decoder_output

decoder_output = connect_decoder(initial_state=encoder_output)

model_train = Model(inputs=[encoder_input, decoder_input], outputs=[decoder_output])

model_encoder = Model(inputs=[encoder_input], outputs=[encoder_output])

decoder_output = connect_decoder(initial_state=decoder_initial_state)

model_decoder = Model(inputs=[decoder_input, decoder_initial_state], outputs=[decoder_output])

def sparse_cross_entropy(y_true, y_pred):
    """
    Calculate the cross-entropy loss between y_true and y_pred.

    y_true is a 2-rank tensor with the desired output.
    The shape is [batch_size, sequence_length] and it
    contains sequences of integer-tokens.

    y_pred is the decoder's output which is a 3-rank tensor
    with shape [batch_size, sequence_length, num_words]
    so that for each sequence in the batch there is a one-hot
    encoded array of length num_words.
    """

    # Calculate the loss. This outputs a
    # 2-rank tensor of shape [batch_size, sequence_length]
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_true,
                                                          logits=y_pred)

    # Keras may reduce this across the first axis (the batch)
    # but the semantics are unclear, so to be sure we use
    # the loss across the entire 2-rank tensor, we reduce it
    # to a single scalar with the mean function.
    loss_mean = tf.reduce_mean(loss)

    return loss_mean

optimizer = RMSprop(lr=1e-3)
decoder_target = tf.placeholder(dtype='int32', shape=(None, None))

model_train.compile(optimizer=optimizer, loss=sparse_cross_entropy, target_tensors=[decoder_target])

HERE GOES YOUR CODE FOR THE CALLBACKS

!rm utils*
!wget https://raw.githubusercontent.com/Zahlii/colab-tf-utils/master/utils.py

import utils
import os
import keras

def compare(best, new):
    return best.losses['val_acc'] < new.losses['val_acc']

def path(new):
    if new.losses['val_acc'] > 0.8:
        return 'VGG16%s.h5' % new.losses['val_acc']

callbacks = cb = [
    utils.GDriveCheckpointer(compare, path),
    keras.callbacks.TensorBoard(log_dir=os.path.join(utils.LOG_DIR, 'VGG16'))
]

THE END

Zahlii commented 6 years ago

What are you running the code on? I don't think it is directly related to my code; it seems more likely to be caused by conflicting requirements.
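One mismatch that is visible in the snippet above: the model, layers, and the RMSprop optimizer come from tensorflow.python.keras, while import keras and keras.callbacks.TensorBoard pull in the standalone Keras package. The two implementations maintain separate backend sessions, so a callback from one may try to read variables (such as RMSprop_2/iterations) that only exist in the other's session, which could plausibly produce the "Container localhost does not exist" error. Below is only a minimal sketch of keeping the callbacks on the same implementation as the model; whether utils.GDriveCheckpointer works unchanged with tf.keras models is an assumption, not something confirmed here:

import os
import utils  # the colab-tf-utils helper downloaded above

# Use the TensorBoard callback from the same Keras implementation as the model
# (tensorflow.python.keras) instead of the standalone keras package.
from tensorflow.python.keras.callbacks import TensorBoard

# compare() and path() are the functions defined in the snippet above.
callbacks = [
    utils.GDriveCheckpointer(compare, path),
    TensorBoard(log_dir=os.path.join(utils.LOG_DIR, 'VGG16')),
]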

DarkknightRHZ commented 6 years ago

I am running it on Google Colab (GPU). Can you please explain what the requirements problem might be? I am a newbie. :)