flauted / tf-tut

A simple TensorFlow input pipeline tutorial. Uses TFRecords and the Dataset API.

Processing data after input pipeline #1

Open barbansonw opened 6 years ago

barbansonw commented 6 years ago

Hi,

We've talked about your input pipeline on Stack Overflow, and since you advised me to open an issue here, here I am. You've helped me a lot already, but I would like to know more about the actual handling of images and labels. With the code you provided me, we end the input pipeline with a dataset, just like your code here. Your code here is a bit too much for me to fully understand, so I'm wondering what's the best way to go about handling the data we get out of that input.

flauted commented 6 years ago

@barbansonw Thanks for moving over here. Just to make sure we're on the same page, let's go back to the beginning for a second...

In maketfr we read the images and labels off of the disk into numpy arrays in the Python workspace. I recommend doing resizing and cropping at this point. That way, the images we write into the TFR file are the same shape. Then we write each example into the TFR. We call the function and voila, we have a TFR file to use. I believe you're fine with this part.
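In case it helps to see that in one chunk, here's a condensed sketch of the write path (the output path and the image_paths/labels lists are illustrative, not the exact tutorial code):

import tensorflow as tf
from scipy.misc import imread
from cv2 import resize

writer = tf.python_io.TFRecordWriter("train.tfrecord")  # hypothetical output path
for path, label in zip(image_paths, labels):  # assumed: parallel lists read off disk
    resized_im = resize(imread(path), (128, 128))  # same shape for every example
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[resized_im.tostring()])),
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label]))}))
    writer.write(example.SerializeToString())
writer.close()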

In input_pipeline we set up an object - a dataset - that will read the TFR data straight into Tensors. That's what parse_protocol_buffer does. Then there's some technical code to convert the image from bytes into a uint8 Tensor and restore the original shape. Right after the reshape, we could use tf.image to do some simple preprocessing. For instance, convert RGB to HSV, adjust brightness, etc. I choose to normalize my image so that each element is in the interval [-1, 1]. Whatever you do, it's important to convert images to the float datatype so we can do real-valued math with them. In uint8 format, the image tensors can only do integer math. Not very useful! I recommend not changing the shape of the input image as it flows from the TFR file into the Dataset. Again, I think you're fine with this.
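For concreteness, here's roughly what that second map function looks like (a sketch with a hard-coded 128x128x3 shape; the real versions appear later in this thread):

def convert_parsed_proto_to_input(image_string, label):
    image = tf.decode_raw(image_string, tf.uint8)  # bytes -> flat uint8 Tensor
    image = tf.reshape(image, (128, 128, 3))       # restore the original shape
    image = tf.cast(image, tf.float32)             # float, so real-valued math works
    # optional tf.image preprocessing could go here, e.g.
    # image = tf.image.adjust_brightness(image, 0.1)
    return image * (2. / 255) - 1, label           # normalize into [-1, 1]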

An aside, why did I emphasize "will"? Remember maketfr was literally changing numpy arrays in the Python workspace. But, input_pipeline isn't really doing anything in Python other than creating that Dataset object. Tensors aren't being read off of the TFR until a later time. I like to think of defining the function as making the plumbing. We install the plumbing when we call the function. We turn on the water when we get to the Session. I'm going to keep with this metaphor because it makes a lot of sense to me.

I think it's best to explain now two steps I tuck away closer to the Session. When we call input_pipeline, the returned Dataset is still missing a little bit of plumbing. We actually need to turn it into an iterable. The Dataset instance has a method called make_initializable_iterator(). So we just call that. There's step 1. Next, we get the images and labels out. The instance now has a method called get_next(). So, we call that. That's step 2. I'm not sure if you noticed this in the example or not.

train_dataset = input_pipeline(
    batch_size, iterations, tfr_file=TFR_FILE)
with tf.name_scope("Input"):
    train_iterator = train_dataset.make_initializable_iterator()
    batch_image, batch_label = train_iterator.get_next()

At this point, the input pipeline is part of the graph. The Python batch_image, batch_label returned from get_next() still aren't full of data. They're our access points to hook up the inputs to the rest of the plumbing.

Let's pick back up at def model(image_tensor, ...), which is where the handling takes place. Sometimes this is referred to as the inference part of the graph. In the example, I defined a simple convolutional neural network. Nothing particularly fancy. Here's the key to understanding the handling of the data: We're going to pass batch_image straight into model when we call model to install the convolutional neural network. That is, we've finished writing the model function. When we call it, we simply say

logits = model(batch_image, ...)

Now model is part of our graph! We installed that plumbing. There's still no actual data in logits, but the graph can see how to turn batch_image into predictions. In other words, when we installed the CNN plumbing, we hooked it up to batch_image. What comes out of that chunk of pipe is a prediction.

The labels get into the cost/error/loss (all names for the same thing) function the exact same way. We write up a function - make some plumbing - called loss that compares the predictions - logits - with the labels. So,

loss_op = loss(logits=logits, labels=batch_label)

We installed a loss function into our graph! The plumbing here is hooked up to the input pipes, i.e. batch_label, and the end of the CNN pipes, i.e. logits.

Now on top of it all, we put the training. I chose an AdamOptimizer because it's common, but the API for all the training operations is more or less the same. We just install the training pipes right after the loss. That is,

training_op = train(loss_op, ...)

Easy enough. The training gets the error data and adjusts the variables in the model part of the graph to minimize error!

(You might be wondering how the AdamOptimizer accesses all the variables we defined in model to optimize them. After all, we didn't explicitly hook up that part of the plumbing. Don't worry too much about it - unless you're writing a GAN. If you're still curious: under the hood, when we define variables with tf.Variable or tf.get_variable, TensorFlow automatically adds them to a list called tf.GraphKeys.TRAINABLE_VARIABLES (and tf.GraphKeys.GLOBAL_VARIABLES ... 99% of the time; sometimes a stray gets into tf.GraphKeys.LOCAL_VARIABLES, but not in my example). Objects in tf.train, including AdamOptimizer, access the tf.GraphKeys.TRAINABLE_VARIABLES list that has all the variables in it. When we call minimize, it creates the plumbing from the loss_op to each variable in that list. Specifically, for each variable, it calculates the error gradients w.r.t. the variable and creates an update op that reassigns the variable a new value that should minimize loss according to the *prop algorithm (backprop, RMSprop, etc.). It puts all those update pipes into a list that it returns as training_op. But, that's more than we need to worry about.)
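If you do get curious, a quick way to peek under the hood: minimize is roughly compute_gradients followed by apply_gradients. This is a sketch of the equivalence, not something the script needs:

optimizer = tf.train.AdamOptimizer(1e-4)
# compute_gradients defaults to the TRAINABLE_VARIABLES collection:
grads_and_vars = optimizer.compute_gradients(loss_op)
train_op = optimizer.apply_gradients(grads_and_vars)  # one update pipe per variable
# Equivalent to: train_op = optimizer.minimize(loss_op)
print(tf.trainable_variables())  # the under-the-hood list of variables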

Now we have all the plumbing ready! We have something installed to transport water from the city's underground pipeline - Dataset.get_next() reads batch_image and batch_label from the TFR. We have plumbing installed to filter the water - logits = model(batch_image) makes predictions. And we have loss and training plumbing. Great! Our pipework is ready for us to turn on the water.

First things first, we have to put actual values into our variables. If you read the note about how training accesses variables, it should make sense why we call global_variables_initializer(). If you didn't, just know there's a list of all the variables you defined hiding under the hood. Then, we fill our input pipes with water by calling the iterator's initializer. And finally, we can run our training loop. Running train_op evaluates the relevant parts of the graph to train (which is everything...). So, it'll call up some new data out of the iterator, run the CNN operations, calculate the loss, and update all the variables to minimize error. The train_op doesn't really return anything (actually it returns None) to Python, so we just let it unpack into _. However, every time the model has trained on 100 examples, we want to get some insight into how it's doing. So, we tell sess we want it to run the graph up to loss_op and give us the value back in Python. We print it.

with tf.Session() as sess:
    # Initialize all the tf.get_variable()s
    sess.run(tf.global_variables_initializer())

    sess.run(train_iterator.initializer)

    for epoch in range(iterations + 1):
        check_in = epoch % 100 == 0
        feed_dict = {keep_prob: 0.5}  # In the example I'm using dropout.
        _ = sess.run(
            [train_op], feed_dict=feed_dict)
        train_writer.add_summary(summary, epoch)  # summary/train_writer come from the full notebook
        if check_in:
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)
            print(epoch, curr_loss)
    train_writer.close()

Summary/Tl;dr

To wrap it up in a sentence, the best way to handle the data that comes out of the Dataset is to construct the graph so that the data flows right into the neural network. No placeholders. No explicit feeding in the session. The back-end magically handles calling up new data into batch_image and batch_label.

I hope this answers your question. But, I realize explanations are not one-size-fits-all. If this made no sense to you and didn't help you in the slightest, I'll try again. Just let me know and ask the most specific question you can.

DISCLAIMER: I'm not an expert. I haven't contributed to TensorFlow nor do I have any formal education in machine learning. My only qualifications for answering your questions are 1) I was a novice to TensorFlow 6 months ago, and 2) I've made working input pipelines and models - the old queue way, the Placeholder way, and the Dataset way.

barbansonw commented 6 years ago

This is so amazing! You explained your reasoning behind everything really well!

So if I understand correctly, the returned dataset gets to train_iterator, and with the get_next() function we get image and label batches. I don't really understand how to design a convolutional network with those, but this shouldn't be too hard. I just really hate that the main examples and tutorials don't work with regular data, only MNIST or similar datasets.

So I have my dataset, I have my plumbing (nice analogy btw), and my next step would be to use the image batches for def model(image_tensor):? I get (now) how you get image batches from all the work that has been done to get to train_iterator.get_next(), but I can't really see how using a batch of images as an input would produce a prediction as an output.

I do see how a prediction and the labels could produce loss (the way I see it, you take your prediction, see how it matches the corresponding labels, and use that to calculate the loss). Then you get to train(some_loss) and I lose track again. I don't exactly see what train_op does in this situation, which makes it hard to understand how to use the input.

I do like how you divide this part (from model(image_tensor) to return train_op) into parts, because seeing the input and output kinda makes me see how the data flows through the code.

I really want to thank you again for your help. You've been a great source of information so far, and I'm really happy you took the time to explain things so clearly.

barbansonw commented 6 years ago

I've put together your code so far, but I do still seem to get the "has type <class 'str'>, but expected one of: (<class 'int'>,)" error. I've double checked that I implemented the fix you provided on Stack Overflow, but this does not fix the issue. In what way do you structure the paths file, if I may ask?

flauted commented 6 years ago

@barbansonw Yes, it is frustrating that the TensorFlow docs use MNIST examples... In some sense, it's nice because very basic examples don't need to talk about how to load MNIST data off of a local machine. On the other hand, I haven't seen a satisfactory tutorial on loading data in the official docs. There's the guide that I referred you to, but it's still relatively disjointed snippets. It's a big roadblock to learning TensorFlow for any practical purpose.

My file structure for the .ipynbs here

/home/dylan/
  TF/
    misc/
      these notebooks.ipynb
      tb/
/media/dylan/DATA/ (my SSD)
  tfr/
    celeba_tutorial
  list_attr_celeba.txt
  img_align_celeba/
    000001.jpg
        ....

I'm on Linux.

Do you need machine learning/CNN help?

What's your experience with deep learning? CNNs specifically? Are you okay with what's going on in the TensorFlow MNIST CNN guide?

Addressing how we get predictions... Inside model we've defined a convolutional neural network. That's a type of neural network that uses kernel convolution to extract features from an image, stacking them into the channels. Usually it's simultaneously downsizing the image in height/width. After a few convolutions, we flatten our image - now some shape like (batch_size, height 4, width 4, channels 128) - into (batch_size, 4*4*128). Then the fully connected layer right-multiplies by a weight variable of shape 4*4*128 x num_classes. Remember the matrix multiplication size rules... A x B * B x C -> A x C, so we get batch_size x num_classes. For a given image in the batch, you have the likelihood that it belongs to each class. (Technically not a likelihood... These are unscaled "logits", since we don't pass them into an activation function that squashes their values into a nice range yet. But the loss function does a softmax on them, which turns them into a probability distribution over the number of classes, which can then be compared with the labels to find error - cross-entropy error, since we're doing discrete predictions.) So anyways, we got a tensor of shape batch_size x num_classes out of model. That's the same shape as the labels, so that's our prediction.
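To make those shapes concrete, here's the flatten-and-matmul step in isolation (the 4/4/128 numbers are just the ones from the walkthrough above, and num_classes is whatever your problem needs):

# a4: output of the last conv layer, shape (batch_size, 4, 4, 128)
flat_a4 = tf.reshape(a4, [-1, 4 * 4 * 128])           # (batch_size, 2048)
W = tf.get_variable("W", [4 * 4 * 128, num_classes])  # (2048, num_classes)
logits = tf.matmul(flat_a4, W)                        # (batch_size, num_classes)
probs = tf.nn.softmax(logits)  # the loss does this internally; shown for clarity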

If you're not okay with this, or don't have any experience with CNNs... The guide up there is a good resource. This YouTube video by Computerphile is a wonderful explanation of kernel convolutions. Here is a video with the same guy talking about CNNs.

I'd love to try my hand at explaining CNNs. But your time is better spent watching that guy with a Ph.D explain it for your average internet user than reading what I can write about it.

Are you fine with CNNs, just need help understanding the TensorFlow?

So you watched those videos, or got frustrated that I was explaining stuff you already knew, and got to here. How are we going from image to prediction? When I say "prediction", I'm talking about something we can compare to the true values to return some error.

The function model defines a convolutional neural network. As you've noticed, I'm only calling this once. It's just in a function to separate that chunk of code from the rest of the script. I did NOT make it a function because we call it multiple times. Inside model, we define some weights and biases variables to use with the CNN. Trainable weights are a feature of all neural networks. CNNs use them in a clever way to extract information from the image and reshape it into something we can compare with labels. That something is the predictions. The def model defines the CNN chunk of pipe that we're putting into the graph. If you're lost on how to define a CNN in TensorFlow, visit the link at the very beginning of the last section, or check out my celeba_tutorial.ipynb! The convolutions and fully-connected matrix multiplication that we define inside def model(...) are added to the graph when we later call

logits = model(batch_images, ...)

On top of the input-pipeline plumbing (we access the end of that pipe with batch_images and batch_labels), we're screwing on some pipes that represent a CNN, and logits is our reference to the end of that pipework we just installed. The Python variable logits still isn't full of data, just like batch_images still isn't full of data. It's just a reference to the end of the CNN part of the graph! At tf.Session runtime, the graph gets filled with data.

Are we okay with everything up until this point? It's not so different from how loss works, as far as connecting it to the input pipeline. As far as how we get predictions, logits, out of the CNN, we're using trainable weights to extract information from the image and reshape into a Tensor of the same size as batch_labels. Based on how bad the prediction is, the training algorithm (we're about to discuss that) adjusts those weights to alter the predictions for the next batch of images!

TRAINING: What def train(loss) does is create the chunk of pipes that update all those weights/biases variables from the CNN based on how bad the predictions were. Let me copy the function definition:

def train(loss, learning_rate=1e-4):
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return train_op

Don't worry about how AdamOptimizer gets those weights/biases right now. When/if you get curious, I explained that in my first reply. Just understand that AdamOptimizer is some pipe that's screwed onto the end of the loss pipework that updates all the weights/biases in the graph to minimize error - to make the predictions better. So when we call

loss_op = loss(logits=logits, labels=batch_label)
train_op = train(loss_op)

we're screwing the training pipes onto the end of the loss pipework. REMEMBER: loss_op and train_op aren't functions, and they aren't values. They're still just references to parts of the graph!

SESS RUNTIME: Okay, we have already installed all the plumbing by saying

train_dataset = input_pipeline(
    batch_size, iterations, tfr_file=TFR_FILE)
with tf.name_scope("Input"):
    train_iterator = train_dataset.make_initializable_iterator()
    batch_image, batch_label = train_iterator.get_next()

logits = model(batch_image, keep_prob=keep_prob)

loss_op = loss(logits=logits, labels=batch_label)
train_op = train(loss_op)

We have five references to parts of the graph: batch_image where our images from TFR are set to flow, batch_label where our labels from TFR are set to flow, logits where the output of our CNN is set to flow, loss_op where the error is set to flow, and train_op... What's flowing to train_op? It's a bigger leap of abstraction. Let's say the new values for all the weights/biases variables that improve our predictions are flowing here. There's a lot happening under the hood in the train definition that we don't need to worry about!

So, our pipes are all made. Let's turn on the water. Read the comments! I'm trying to explain how sess.run works to you! I think it'll help you see what's going on:

with tf.Session() as sess:  # Turn on the water
    sess.run(tf.global_variables_initializer())  # Assign values to CNN weights/biases in the graph

    sess.run(train_iterator.initializer)  # Start reading stuff from TFR

    for epoch in range(iterations + 1):
        check_in = epoch % 100 == 0
        feed_dict = {keep_prob: 0.5}  # In the .ipynb I'm using dropout.

        # val = sess.run([reference_to_part_of_the_graph])
        #     val gets the actual data that's in reference_to_part_of_the_graph, which is
        #     found by running water through the pipes until it gets to reference_to_.
        #     All the running water happens under-the-hood in C++. You get values back
        #     into Python from C++ by sess.run-ing the reference to the value you want.

        # run everything up to training, **UPDATES THE VARIABLES**
        _ = sess.run(
            [train_op], feed_dict=feed_dict) 
        if check_in:
            # sometimes run the graph up to the error, return the error into python, and print it out.
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)  
            print(epoch, curr_loss)
    train_writer.close()

Summary/Tl;dr/Key Takeaways

What I'm asking you:

barbansonw commented 6 years ago

Hello,

Again, great great great answer. I indeed already got how CNNs work, and it's definitely the right way to approach my problem (but I watched the YouTube clips anyway, they're pretty interesting nonetheless).

As for my experience: virtually none. I'm a 21 y/o student and I'm not even studying IT; I'm just really interested, since I like to work with code on projects in my free time. I started on TensorFlow out of fascination without any prior knowledge and barely any experience in Python (I didn't really know what I was getting into a couple of months ago, don't judge haha). I've been reading a ton of papers on machine learning and everything related, practised my Python skills with some help from a co-worker, and now I'm here. This would probably explain to you why your explanations resonate with me way better than the TensorFlow tutorials, which seem kinda chaotic and scattered across pages to me.

In the meanwhile I fixed the error (I did some preprocessing on the images that were read from disk, but I made a mistake there which made making a TFRecord impossible).

And yes, this DEFINITELY helped. Even though I don't think we were 100% on the same page in this last post, you explained things I didn't know I needed/wanted to know. I really can't thank you enough for your help so far; this must've really cost you some time.

My goal here isn't to make the best image recognition code there is; I'm just fascinated, and your explanations really worked for me. I actually think I can get my own things working now, so I'll probably try to make it work before bothering you again, because you've helped more than enough. I'll definitely let you know how things work out from here (and maybe ask you some things I couldn't find online), and many many thanks again.

flauted commented 6 years ago

No problem, happy to help!

No judgement passed. I did the same thing in June. I knew Matlab and had taken a course in Java (I'm an EE undergrad). I'd made some simple neural networks in Matlab on my own time. I wanted to try deep learning, but didn't have a GPU then. I read that TensorFlow was a good option for CPU deep learning, so I ran through the Codecademy tutorial on Python... and quickly found out I was over my head with TensorFlow. It took a lot of time to get comfortable, but after a few projects things started clicking!

I built a few CNNs. Then a few GANs. Now I'm working on Differentiable Neural Computers. In the process, I got to know the docs really well, and I read a lot of other projects. Eventually I started reading some TF source code on Github.

With all the help I got from old SO questions and blog posts, I felt like it was time to return the favor so I've started answering some SO questions myself. After all, it was the community - not so much the docs - that helped me get my own inputs off disk, use TensorBoard, understand the RNN API, etc.

Point is, TensorFlow is difficult and intimidating at first. I commend you for choosing the hard way over some easier paths like Keras. In my opinion, it makes deep/machine learning less of a black box when you actually write tf.matmul and tf.nn.conv2d. I'm glad that I've helped you and I'm happy to help you in the future. If you need anything, just leave a line on this Issue. I have email notifications turned on.

barbansonw commented 6 years ago

Hi,

after looking over the code I have some more questions to ask. At StackOverflow I showed you the way I read my image paths and corresponding labels like this:

def read_labeled_image_list(image_list_file):
    f = open(image_list_file, 'r')
    filenames = []
    labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        filenames.append(filename)
        labels.append([int(label)])
    return filenames, labels

I was looking over the code again and I'm guessing that part isn't necessary anymore because of the input pipeline you made? Should I remove that part? And in what way should I structure the file from where I read my image paths and labels?

EDIT1:

I've put together the things we've talked about on Stack Overflow, here, and some stuff I've got from the celeba tutorial. I just put it together; it probably won't work, but I'd like to know if this combination seems right to you. We could also use this as an example to talk about other questions, if you don't mind.

import tensorflow as tf
import numpy as np
from scipy.misc import imread
import cv2
from cv2 import resize

image_list_file = 'C:/dir/dir/paths.txt'
TFR_DIR = 'C:/dir/dir/TFRecords/tfrecord.tfrecord'
MODEL_DIR = 'C:/dir/dir/TensorFlow/Models'

def read_labeled_image_list(image_list_file):
    f = open(image_list_file, 'r')
    all_image_paths = []
    all_labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        all_image_paths.append(filename)
        all_labels.append(label)
    return all_image_paths, all_labels

def make_tfr(tfr_dir=TFR_DIR):
    def _int64_list_feature(a_list):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=a_list))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(tfr_dir)
    all_image_paths, all_labels = read_labeled_image_list(image_list_file)
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        resized_im = cv2.resize(disk_im, (128, 128))
        raw_im = resized_im.tostring()
        print(all_labels)
        print(all_image_paths)
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    'label': _int64_list_feature(label)}))
        serialized = example.SerializeToString()
        writer.write(serialized)

make_tfr()  # comment this out if a record has already been made

def input_pipeline(batch_size, epochs, tfr_dir=TFR_DIR):
    with tf.name_scope("Input"):
        dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        features = {
            'image_raw': tf.FixedLenFeature((), tf.string), 
            'label': tf.FixedLenFeature((), tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(image_decoded, (128, 128, 3))
        image = tf.cast(image_resized, tf.float32)
        return image * (2. /255) -1, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(batch_size * epochs)
    return dataset

DESIRED_OUTPUT_SIZE = 2

def conv(imgs, filters_out, stride_size, kernel_size):
    filters_in = imgs.get_shape().as_list()[3]
    Kernel = tf.get_variable(
        "kernel",
        [kernel_size[0], kernel_size[1], filters_in, filters_out],
        initializer=tf.truncated_normal_initializer(stddev=0.1))
    Bias = tf.get_variable(
        "bias",
        [filters_out],
        initializer=tf.zeros_initializer())
    evidence = tf.nn.conv2d(
        imgs,
        Kernel,
        strides=[1, stride_size[0], stride_size[1], 1],
        padding="SAME")
    return evidence + Bias

def model(image_tensor, keep_prob=0.5):
    #tf.summary.image(image_tensor)
    with tf.variable_scope("layer1"):
        z1 = conv(image_tensor, 16, (2, 2), (5, 5))
        a1 = tf.nn.relu(z1)
    with tf.variable_scope("layer2"):
        z2 = conv(a1, 32, (2, 2), (5, 5))
        a2 = tf.nn.relu(z2)
    m2 = tf.nn.max_pool(a2, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    with tf.variable_scope("layer3"):
        z3 = conv(m2, 64, (2, 2), (5, 5))
        a3 = tf.nn.relu(z3)
    with tf.variable_scope("layer4"):
        z4 = conv(a3, 64, (2, 2), (5, 5))
        a4 = tf.nn.relu(z4)

    final_shape = a4.get_shape().as_list()
    n_elems = final_shape[1] * final_shape[2] * final_shape[3]
    flat_a4 = tf.reshape(a4, [-1, n_elems])
    d4 = tf.nn.dropout(flat_a4, keep_prob)
    with tf.variable_scope("weight_and_bias"):
        W = tf.get_variable(
            "weights",
            [n_elems, DESIRED_OUTPUT_SIZE],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable(
            "bias",
            [DESIRED_OUTPUT_SIZE],
            initializer=tf.zeros_initializer())
        logits = tf.matmul(d4, W) + b
    return logits

def loss(logits, labels):
    with tf.name_scope("Eval"):
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            logits=logits, labels=labels, dim=-1)
        some_loss = tf.reduce_mean(cross_entropy, axis=-1)
    return some_loss

def train(some_loss):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(some_loss)
    return train_op

batch_size = 50
iterations = 10000

train_dataset = input_pipeline(batch_size, iterations)

with tf.name_scope("Input"):
    train_iterator = train_dataset.make_initializable_iterator()
    batch_image, batch_label = train_iterator.get_next()

tf.summary.image("inputs", batch_image, 1)

keep_prob = tf.placeholder(tf.float32)
tf.summary.scalar("keep_prob", keep_prob)

logits = model(batch_image, keep_prob=keep_prob)

loss_op = loss(logits=logits, labels=batch_label)

tf.summary.scalar("loss_op", loss_op)

train_op = train(loss_op)

summary_op = tf.summary.merge_all()

predictions = model(image)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_iterator.initializer)
    for epoch in range(iterations + 1):
        feed_dict = {keep_prob: 0.5}
        _ = sess.run([train_op], feed_dict=feed_dict)
        train_writer.add_summary(summary, epoch)
        if epoch % 100 == 0:
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)
            print(epoch, curr_loss)
    saver = tf.train.Saver()
    save_path = saver.save(sess, MODEL_DIR)
    train_writer.close()
flauted commented 6 years ago

Thanks for the edit; concrete code to work with helps me. (I edited only to put the whole thing in a code block - no changes to what you wrote.)

First question: is read_labeled_image_list necessary? Sort of. Notice that read_labeled_image_list is called during make_tfr. It is necessary when writing TFRecords. However, it's only necessary when writing TFRecords. That's sort of a one-time payment. After you convert your images and labels to TFR, you don't need to do that again. Exceptions: re-making a train/test split, changing the actual images on disk, changing cv2 pre-TFR-processing. I recommend keeping read_labeled_image_list and make_tfr in the code in case one of those exceptions arises, and just commenting out the call to make_tfr() if you don't need to make a TFR.

Structuring the file... I assume you mean image_list_file = 'C:/Users/dir/dir/paths.txt'? I can better answer that with some more information. Did you make that file or did it come from a dataset you found?

If it came from a dataset, my strategy is usually to write read_labeled_image_list custom for my project rather than editing the dataset file. Meaning, the way I read the data from disk and put it in TFR when I'm working with the MNIST file saved on my disk is different than when I'm working with Celeba. The MNIST dataset doesn't actually come as images; it's a single file with the images packed inside. On the other hand, Celeba is a folder of .jpg images. Rather than make both datasets work with the same from-disk function, I just write a different from-disk function for each dataset.

If you made paths.txt, how should the file be structured? Trickier question... Actually, I'd say whatever you did, it's structured well. The function you wrote to read the data is very simple, which makes me think the file is well-organized.

Code looks right on the high-level at first glance. There's some details I'd like to check out - and ultimately use to update the SO answer for future readers. I do have a slight cause for concern:

Maybe you could post a few lines of C:/Users/dir/dir/paths.txt so I can see how it's organized myself? Without that info, I'm left guessing. But for the sake of a quick reply I'll take a stab: it looks to me like it's something like

4 C:/absolute/path/to/image001.jpg
2 C:/absolute/path/to/image002.jpg
...

In which case, I would advise that you convert your class label to an integer. You've done this in the original post, but not in the full code. Then there are some options available to you.

I can provide some example code, or verify that what you're doing works, if you give me an example of that .txt file.

Summary: Is the read image paths and labels function unnecessary? More or less, but keep it in case you need it again. Same story with make_tfr. Structure the .txt file? Don't fix what's not broken. If it's from a public dataset, fit your code to the .txt rather than the other way around. If it's yours, it looks good to me. Is the code the right merge of SO, Github and your own corrections? Yup, but I want to check on a few details. I need a little more information about your input first.

Could you tell me if this is a publicly available dataset or your own? Could you post the first couple lines of C:/Users/dir/dir/paths.txt? Are we on the same page about "the file from where I read my image paths and labels" referring to C:/Users/dir/dir/paths.txt?

I'm more than happy to answer more questions and ultimately get this code running. I'm on winter break so I have plenty of time now. Sorry if this isn't up to the same quality as my previous answers, but I really need an idea of how your disk data looks to give less abstract advice.

barbansonw commented 6 years ago

Ah, I can see why this is confusing. You were pretty close with your guess at my file structure; it's structured like:

training-images/1.jpg 1
training-images/2.jpg 2
training-images/3.jpg 3
training-images/4.jpg 4

I'll look at the integer part indeed, because I've been getting the "'1' has type <class 'str'>, but expected one of: (<class 'int'>,)" error again. Probably some change on my end that messed things up.

I'm going to work on it today, so I'm hoping to fix that error soon so we are able to look at the rest of the code. I'm not sure if this would work; I used some code from your example to see what would happen. Anyway, I was looking at your celeba tutorial and I came across this part: DESIRED_OUTPUT_SIZE = 2 # This is num labels per img. MALE and FEMALE. I can see how it works in your code, but how would this work with more classes, i.e. more labels? Looking at your code, I'm guessing just changing the number won't work.

For now I'll be taking your advice and make sure my label code actually works. I'm not sure how, because if my suspicion is right, when I change label to an int it still won't work, because you can't iterate over an integer, so I'll need a list of a list of integers. I'll keep you updated, and thanks for the help again! Even when I'm pretty vague, you're still able to actually know what I'm talking about.

barbansonw commented 6 years ago

Oh well, it didn't take me too long to figure it out. I kinda knew what I was supposed to do, I just hadn't connected the dots yet; I was working on it in the make_tfr part while the change should've been made in read_labeled_image_list. Alright, I think we have a fully functional input pipeline! And I actually know how it works! (Didn't really expect that last one to happen anytime soon.)

Now I'm running into errors with the code I got from your celeba tutorial. This isn't too much of a problem, because it's structured a little bit differently from what I can see. If I run into any other problems I can't solve, I'll give you an update!

EDIT-

I've made some pretty good progress I think. My current code consists of:

import tensorflow as tf
import numpy as np
from scipy.misc import imread
import cv2
from cv2 import resize

image_list_file = 'C:/dir/dir/TensorFlow/paths.txt'
TFR_DIR = 'C:/dir/dir/TensorFlow/Data/TFRecords/Loep_tfrecord.tfrecord'
MODEL_DIR = 'C:/dir/dir/TensorFlow/Models'

def read_labeled_image_list(image_list_file):
    f = open(image_list_file, 'r')
    all_image_paths = []
    all_labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        all_image_paths.append(filename)
        all_labels.append(int(label))
    return all_image_paths, all_labels

def make_tfr(tfr_dir=TFR_DIR):
    def _int64_list_feature(a_list):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=a_list))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(tfr_dir)
    all_image_paths, all_labels = read_labeled_image_list(image_list_file)
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        resized_im = cv2.resize(disk_im, (128, 128))
        raw_im = resized_im.tostring()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    'label': _int64_list_feature(all_labels)}))
        serialized = example.SerializeToString()
        writer.write(serialized)

make_tfr()

def input_pipeline(batch_size, epochs, tfr_dir=TFR_DIR):
    with tf.name_scope("Input"):
        dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        features = {
            'image_raw': tf.FixedLenFeature((), tf.string), 
            'label': tf.FixedLenFeature((), tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(image_decoded, (128, 128, 3, 512))
        image = tf.cast(image_resized, tf.float32)
        return image * (2. /255) -1, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(batch_size * epochs)
    return dataset

DESIRED_OUTPUT_SIZE = 2 #2?

def conv(imgs, filters_out, stride_size, kernel_size):
    print(imgs.shape)
    filters_in = imgs.get_shape().as_list()[3]
    Kernel = tf.get_variable(
        "kernel",
        [kernel_size[0], kernel_size[1], filters_in, filters_out],
        initializer=tf.truncated_normal_initializer(stddev=0.1))
    Bias = tf.get_variable(
        "bias",
        [filters_out],
        initializer=tf.zeros_initializer())
    evidence = tf.nn.conv2d(
        imgs,
        Kernel,
        strides=[1, stride_size[0], stride_size[1], 1],
        padding="SAME")
    return evidence + Bias

def model(image_tensor, keep_prob=0.5):
    #tf.summary.image(image_tensor)
    tf.summary.image("img_summary", batch_image, 1)
    with tf.variable_scope("layer1"):
        z1 = conv(image_tensor, 16, (2, 2), (5, 5))
        a1 = tf.nn.relu(z1)
    with tf.variable_scope("layer2"):
        z2 = conv(a1, 32, (2, 2), (5, 5))
        a2 = tf.nn.relu(z2)
    m2 = tf.nn.max_pool(a2, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    with tf.variable_scope("layer3"):
        z3 = conv(m2, 64, (2, 2), (5, 5))
        a3 = tf.nn.relu(z3)
    with tf.variable_scope("layer4"):
        z4 = conv(a3, 64, (2, 2), (5, 5))
        a4 = tf.nn.relu(z4)

    final_shape = a4.get_shape().as_list()
    n_elems = final_shape[1] * final_shape[2] * final_shape[3]
    flat_a4 = tf.reshape(a4, [-1, n_elems])
    d4 = tf.nn.dropout(flat_a4, keep_prob)
    with tf.variable_scope("weight_and_bias"):
        W = tf.get_variable(
            "weights",
            [n_elems, DESIRED_OUTPUT_SIZE],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable(
            "bias",
            [DESIRED_OUTPUT_SIZE],
            initializer=tf.zeros_initializer())
        logits = tf.matmul(d4, W) + b
    return logits

def loss(logits=None, labels=None):
    with tf.name_scope("Eval"):
        xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
        avg_xent = tf.reduce_mean(xent, axis=-1)
    return avg_xent

def train(some_loss):
    train_op = tf.train.AdadeltaOptimizer(1e-4).minimize(some_loss)
    return train_op

batch_size = 50
iterations = 1000

train_dataset = input_pipeline(batch_size, iterations)

with tf.name_scope("Input"):
    train_iterator = train_dataset.make_initializable_iterator()
    batch_image, batch_label = train_iterator.get_next()

tf.summary.image("inputs", batch_image, 1)

keep_prob = tf.placeholder(tf.float32)
tf.summary.scalar("keep_prob", keep_prob)

logits = model(batch_image, keep_prob=keep_prob)

loss_op = loss(logits=logits, labels=batch_label)

tf.summary.scalar("loss_op", loss_op)

train_op = train(loss_op)

summary_op = tf.summary.merge_all()

predictions = model(image)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_iterator.initializer)
    for epoch in range(iterations + 1):
        feed_dict = {keep_prob: 0.5} 
        _ = sess.run([train_op], feed_dict=feed_dict)
        train_writer.add_summary(summary, epoch)
        if epoch % 100 == 0:
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)
            print(epoch, curr_loss)
    saver = tf.train.Saver()
    save_path = saver.save(sess, MODEL_DIR)
    train_writer.close()

Things are starting to come together, but it's getting stuck at xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels). Here I get the error output_shape = [product, shape[-1]]. I'm guessing the -1 comes from the defaulted dim=-1 in tf.nn.softmax..., but the [product part doesn't really make sense to me. I'm trying hard to make things work, so I hope I'm not bothering you with questions that I don't have to ask you.

flauted commented 6 years ago

That is indeed a strange error message... Let me try to reproduce it, then I'll answer your questions. Very briefly: I think I see the problem and I'll try to show you how to fix it, but I really can't imagine how it threw that message. You're on version >=1.2, right? I can't say for sure if changing NUM_CLASSES is all you'd have to do, but that was the intent.

(EDIT 2: Disregard the question, I managed to replicate that error.)

EDIT: Oh, and you're definitely getting the hang of things. You're not bothering me, you're doing your part and learning. Very happy to be a part of it!

EDIT 2:

I suspect I did not communicate the bullet-list part of my last response very well. Let me try again, because I think that's at the root of your questions and your error.

Emulating your structure (for your reference)

Here's my working directory (on Linux, /home/username/ is just like your Windows C:\):

/home/dylan/dir/dir/TensorFlow/
    project.py
    paths.txt
    training-images/
       1.jpg
       2.jpg
       3.jpg
       4.jpg
    Data/
       TFRecords/
            Loep_tfrecord.tfrecord

I just copied the first four Celeba images for i.jpg. They're 178 x 218 RGB JPEGs for what it's worth.

Here's my paths.txt:

training-images/1.jpg 1 
training-images/2.jpg 2
training-images/3.jpg 3
training-images/4.jpg 4

How to get past the error: Prelims

You're close to understanding what's going on, but we've slightly erred in communication. Let me clarify a few things and then I'll give you a sample. These are two points I've skimmed over and deserve better attention.

Note: Forget DESIRED_OUTPUT_SIZE as a variable name. It's not a bad name, but let's use an alternative until you see why. Let's try using NUM_CLASSES instead - that should be more clear! (one explanation does NOT fit all)

Prelim 0: A Closer Look at TFR

TFR is a binary format, so most of what's in an actual record is unprintable. But, I went ahead and opened a file anyways so we can look at a training example:

...
�               # Probably an "example starts here" delimiter
image_raw�     # So we can parse
    <raw, unprintable characters I've omitted because it's long>
                # Probably an "end of that list" delimiter
target      # Again for the parser
               # Probably the target integer and then the "end of that list" delimiter.
...

Why am I showing you this? Well, the point I'm trying to make is that each training example gets its own complete entry in the TFR! That's a useful concept to have in mind!
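If you want to poke at a TFR file yourself, TF has a Python-side record iterator. Something like this (the path is illustrative) prints the feature keys of the first example:

import tensorflow as tf

for record in tf.python_io.tf_record_iterator("train.tfrecord"):
    example = tf.train.Example.FromString(record)  # parse one serialized example
    print(list(example.features.feature.keys()))   # e.g. ['image_raw', 'label']
    break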

Prelim 1: _*_feature()?

Essentially, TFR are written in a data format that's easy to load into the C++ graph (the backend). It happens (for some reason) that arrays, not a single value, are the best way to do that. Probably cuts down on the number of delimiters they have to introduce? I'm speculating.

Let's look at three _*_feature() functions you may need and try to understand them:

def _bytes_feature(arg):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[arg]))

def _int64_list_feature(a_list):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=a_list))

def _int64_feature(arg):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[arg]))

I think you're fine with what the first one does. The argument is a single image in bytes form! Since the bytes type isn't a list, we let value=[arg]. That is, the value keyword is getting a list with one entry: arg!

What's the difference between the second function and the third function? Say we have a class label for one training example. We can represent it two different ways:

label=2, NUM_CLASSES=4  <=>  one_hot_label=[0 1 0 0]

So, if we want to write label to our TFRecord, we use _int64_feature(label) since label isn't a list yet! On the other hand, if we want to write one_hot_label to our TFRecord, we use _int64_list_feature(one_hot_label) since one_hot_label is already a list!

Why am I showing you this? In your code, you sidestepped some sort of "not an iterable/list" error by passing all_labels as the label for every training example! :D From Prelim 0 and this, Prelim 1, you should (1) see why that's wrong, and (2) see that you have two correct options to proceed: write the integer class label or the list one-hot label. If you're confused, don't worry! I'm going to give you an example of each.

A word of advice: if you use the integer class label, you'll end up converting it to one-hot at some point anyways.
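For instance, with your 1-indexed labels and NUM_CLASSES=4, the conversion is one line either way:

import numpy as np

label = 2                              # integer class label, 1-indexed
one_hot = np.zeros(4, dtype=np.int64)  # NUM_CLASSES = 4
one_hot[label - 1] = 1                 # -> [0, 1, 0, 0]
# or on the TF side: tf.one_hot(label - 1, 4)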

Prelim 2: What's going on with the Dataset maps?

And finally, I think I skimped out on a pretty important point, which led to... a rather strange implementation in your code. I'll try to explain how dataset.map(...) works... or, at least a way to think about it and correct usage. My comments are trying to explain! Read them!:

dataset = tf.data.TFRecordDataset(tfr_dir)

def parse_protocol_buffer(example_proto):
    """Instructions for parsing a SINGLE!! training example out of TFR."""
    features = {
        'image_raw': tf.FixedLenFeature((), tf.string), 
        'label': tf.FixedLenFeature((), tf.int64)}  
        # (), if one element array; would be (NUM_CLASSES) if one-hot array
    parsed_features = tf.parse_single_example(example_proto, features)
    return parsed_features['image_raw'], parsed_features['label']

dataset = dataset.map(parse_protocol_buffer)

def convert_parsed_proto_to_input(image_string, label):
    """Instructions for making a SINGLE!! parsed training ex. into useful inputs!"""
    image_decoded = tf.decode_raw(image_string, tf.uint8)
    image_resized = tf.reshape(image_decoded, (128, 128, 3))
    #                                              ^ Shape of SINGLE!! img
    image = tf.cast(image_resized, tf.float32)  # cast, so the math below works
    label = tf.one_hot(label - 1, NUM_CLASSES)  # -1 because your labels are 1-indexed
    # Let's convert that single int label in TFR into an array of size (NUM_CLASSES,)
    return image * (2. /255) -1, label

dataset = dataset.map(convert_parsed_proto_to_input)
dataset = dataset.shuffle(buffer_size=1000)
# parse all the examples (maybe repeating some) and make them useful,
# and continually shuffle all the parsed examples as they're read.
dataset = dataset.repeat(batch_size*epochs) 

dataset = dataset.batch(batch_size)
# after parsing all the examples, making them useful, and shuffling,
# put them into batches. Make enough batches for the entire training loop.
dataset = dataset.repeat(epochs)
# images in dataset are now (batch_size, 128, 128, 3), 
# labels are (batch_size, NUM_CLASSES)
return dataset

So, how do I think about it? "Instructions"! The map function(s) is (are) instructions for how to fill the dataset when we initialize it. Is this correct usage? Check out randomly shuffling data and compare with Consuming TFRecord Data on the same page... As far as I can tell, this is the correct way to merge these two seemingly at-odds code snippets. You're welcome to experiment and prove that I'm wrong by omitting one of the repeats and convincing me it's shuffled and it actually runs, but that's beside the point right now.

Why am I showing you this? You seemingly tried to parse ALL your images, in order, into one giant tensor! Yes, I've seen this. I believe it's called one-shot training or something like that. If you meant to do that, let me know. But I think that was a mistake caused by (1) not understanding Prelim 0 completely, (2) trying to compensate for writing the labels for all the images for each training example image, i.e. not understanding Prelim 1 entirely, and (3) confusion about how to think about the map function.

Enough with this! To the examples!

How to get past the error: Examples

import tensorflow as tf
import numpy as np
from scipy.misc import imread
from cv2 import resize

# "Macros"
IMAGE_LIST_FILE = '/home/dylan/dir/dir/TensorFlow/paths.txt'
TFR_DIR='/home/dylan/dir/dir/TensorFlow/Data/TFRecords/Loep_tfrecord.tfrecord'
MODEL_DIR = '/home/dylan/dir/dir/TensorFlow/Models'
NUM_CLASSES = 4  # let's say there's precisely four classes
MODEL_INPUT_ROWS = MODEL_INPUT_COLS = 128
MODEL_INPUT_CHANNELS = 3

Option 1: tf.one_hot

def read_labeled_image_list(image_list_file):
    f = open(image_list_file, 'r')
    all_image_paths = []
    all_labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        all_image_paths.append(filename)
        all_labels.append(int(label))
    return all_image_paths, all_labels

def make_tfr(tfr_dir=TFR_DIR):
    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(tfr_dir)
    all_image_paths, all_labels = read_labeled_image_list(IMAGE_LIST_FILE)
    # all_labels = [1, 2, 3, 4]
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        # cv2.resize(image, (width, height)) == cv2.resize(image, (cols, rows))
        resized_im = resize(disk_im, (MODEL_INPUT_COLS, MODEL_INPUT_ROWS))
        raw_im = resized_im.tostring()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    'label': _int64_feature(label)
                    #  loop 1: write [1], loop 2: write [2], ...
        }))
        serialized = example.SerializeToString()
        writer.write(serialized)

make_tfr()

def input_pipeline(batch_size, epochs, tfr_dir=TFR_DIR):
    with tf.name_scope("Input"):
        dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        """Instructions to read ONE image/label! Repeated as much as needed!"""
        features = {
            'image_raw': tf.FixedLenFeature((), tf.string),
            'label': tf.FixedLenFeature((), tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        """Instructions to process ONE image/label! Repeated as needed!"""
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(
            image_decoded,
            (MODEL_INPUT_ROWS, MODEL_INPUT_COLS, MODEL_INPUT_CHANNELS))
        image = tf.cast(image_resized, tf.float32)

        # tf.one_hot is 0-indexed: index i -> a length-NUM_CLASSES array with a
        # 1 at position i. Your labels are 1-indexed (1..NUM_CLASSES), so shift
        # down by one first, or class NUM_CLASSES maps to all zeros:
        label = tf.one_hot(label - 1, NUM_CLASSES)
        return image * (2. /255) -1, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(batch_size * epochs)
    # Repeat the TFR-> usable tensors instructions

    dataset = dataset.batch(batch_size)  # AUTOMATICALLY LOAD A SHUFFLED BATCH
    dataset = dataset.repeat(epochs)  # Repeat the batch instruction
    return dataset

Option 2: Write one-hot to TFR

def read_labeled_image_list(image_list_file):
    f = open(image_list_file, 'r')
    all_image_paths = []
    all_labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        all_image_paths.append(filename)
        one_hot = np.zeros((NUM_CLASSES), dtype=np.int64)
        # Your labels are 1-indexed I assume (i.e. no class 0),
        # Python is 0-indexed
        one_hot[int(label)-1] = 1
        all_labels.append(one_hot)
    return all_image_paths, all_labels

def make_tfr(tfr_dir=TFR_DIR):
    def _int64_list_feature(a_list):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=a_list))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(tfr_dir)
    all_image_paths, all_labels = read_labeled_image_list(IMAGE_LIST_FILE)
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        resized_im = resize(disk_im, (MODEL_INPUT_COLS, MODEL_INPUT_ROWS))
        raw_im = resized_im.tostring()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    'label': _int64_list_feature(label)
        }))
        serialized = example.SerializeToString()
        writer.write(serialized)

make_tfr()

def input_pipeline(batch_size, epochs, tfr_dir=TFR_DIR):
    with tf.name_scope("Input"):
        dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        """Instructions to read ONE image/label! Repeated as much as needed!"""
        features = {
            'image_raw': tf.FixedLenFeature((), tf.string),
            'label': tf.FixedLenFeature((NUM_CLASSES), tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        """Instructions to process ONE image/label! Repeated as needed!"""
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(
            image_decoded,
            (MODEL_INPUT_ROWS, MODEL_INPUT_COLS, MODEL_INPUT_CHANNELS))
        image = tf.cast(image_resized, tf.float32)
        return image * (2. /255) -1, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(batch_size * epochs)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(epochs)
    return dataset

Great! You're past the shape error. These examples both ought to get you to the same error: Something about "layer1/kernel is already defined, this is not allowed! I'm tf.get_variable getting mad at you and making you rethink using me instead of tf.Variable. But, don't! My errors are actually really helpful because I'm telling you that, somehow, you defined the tf.variable_scope("layer1") twice!" Or at least, that's how the error message reads to me...

Indeed you did! Probably by mistake, you called model twice in the code above. The second one isn't necessary. That is,

tf.summary.image("inputs", batch_image, 1)
keep_prob = tf.placeholder(tf.float32)
tf.summary.scalar("keep_prob", keep_prob)
logits = model(batch_image, keep_prob=keep_prob)
loss_op = loss(logits=logits, labels=batch_label)
tf.summary.scalar("loss_op", loss_op)
train_op = train(loss_op)
summary_op = tf.summary.merge_all()
# predictions = model(batch_image)  # No! Just call model once!

Now you should be very close to getting the graph built error-free. Inevitably, even with the graph built, there will be some bug when tf.Session opens and there's actual water flowing through the pipes. But, an elephant is best eaten one bite at a time.

Oh, and don't forget to replace DESIRED_OUTPUT_SIZE with NUM_CLASSES everywhere (I think the only other references were in model).

I actually went back-and-forth on whether or not to tell you that you called model twice, but an error message from tf.get_variable is not very easy to grasp until you've seen it a few times. The error you should get after you incorporate my code will be about train_writer not being defined. I'll leave that one to you, since I don't think you'll need any help with it.

Answers to your questions/Responses to comments:

barbansonw commented 6 years ago

I got it working! This last post was really all I needed for everything to come together. I really missed that I was calling model twice; that wasn't smart haha. I can visualize my graph in TensorBoard now as well (no image though, even though tf.summary.image("img_summary", batch_image, 1) is at the first line of model). Thanks so much for your help so far!

My next step will be to expand my training data. Right now I have 42 pictures, which obviously isn't quite enough. I was hoping there would be a tool or code online which gives me edited pictures of my input or something like that. Do you have any advice on how I should approach this?

I also made a small script that generates the paths.txt file for me but I need to know how many images I'm ending up with to make sure the classes are correct. I'll add it when I'm sure it works correctly.

Apart from that I'm going to try to get some scalars so I can visualise the learning curve, and while I'm working on that I'll try to add images to TensorBoard as well. One of the last steps would be to save the model, and then I'll be able to play with... well, mostly the model part I guess. I'm guessing, but if I want to alter some things in the way it learns, that would be the place to start.

EDIT-

I see that curr_loss gets printed in the session. How would I get these numbers so I could use them in TensorBoard, and is there a way to not just print loss but get accuracy as well? There is a pretty good chance I don't really understand loss, but again the TensorFlow documentation really falls short on some basic things.

flauted commented 6 years ago

Could you post the code? I'll look at why tf.summary.image isn't working if you'd like.

I don't know if 42 pictures will ever be enough - you might want to look at retraining/fine-tuning a CNN like InceptionV3. If you want to do that, I can reference you to some good blog posts and give you examples from my own code. 2,000 heavily pre-processed images weren't enough for me. Melanoma vs. benign mole. If you're trying to learn cat vs carrot vs car vs can, 42 may be enough.

On expanding your data, use tf.image in input_pipeline! Let me pull an example from a project (no promises a copy-paste will work).

...
dataset = dataset.map(convert_parsed_proto_to_input)

def augment_map(image, label):
    image = tf.image.random_brightness(image, 0.4)
    image = tf.image.random_flip_left_right(image)
    return image, label

dataset = dataset.map(augment_map)
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.repeat(batch_size * epochs)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(epochs)
return dataset

Options are listed in the docs - ctrl-F for "random"! As I'm sure you know, only change what you know doesn't destroy useful info. E.g., when classifying a mole the color matters, so changing hue would be silly.

By pre-processing the Tensors randomly, you can make an infinite number of inputs without writing anything extra to TFR. If you want to do something tf.image doesn't support (random shadow comes to mind), we can talk about that specifically.
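To make that concrete, here's a minimal sketch of what a random-shadow map could look like. Everything in it (the function name, the alpha range, the vertical-band shape) is made up, and it assumes pixel values are still nonnegative, so you'd slot it in before the [-1, 1] normalization:

def random_shadow(image, min_alpha=0.4, max_alpha=0.8):
    """Darken a random vertical band of a (rows, cols, channels) image."""
    width = tf.shape(image)[1]
    # Pick two random column indices; the shadow falls between them.
    x1 = tf.random_uniform([], 0, width, dtype=tf.int32)
    x2 = tf.random_uniform([], 0, width, dtype=tf.int32)
    left, right = tf.minimum(x1, x2), tf.maximum(x1, x2)
    cols = tf.range(width)
    in_shadow = tf.logical_and(cols >= left, cols < right)
    # Broadcast the per-column mask up to (1, cols, 1) so it scales every row and channel.
    mask = tf.cast(in_shadow, tf.float32)[tf.newaxis, :, tf.newaxis]
    alpha = tf.random_uniform([], min_alpha, max_alpha)
    return image * (1.0 - mask * (1.0 - alpha))

You'd chain it in like any other map: dataset = dataset.map(lambda image, label: (random_shadow(image), label)).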

On model saving, that lets you save the trained variables (kernels, weights, biases) into a special type of file. You can use it to checkpoint during long training, or you can convert the variables to constants to "deploy" the model. Also, you can save and convert to constants to run validation/testing datasets. However, that's not a common way to accomplish that goal.
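Since you'll get there eventually, here's the bare save/restore mechanics as a sketch. The toy variable, paths, and max_to_keep are all made up, and this says nothing about doing checkpointing "right":

import os
import tensorflow as tf

# A tiny hypothetical graph, just to demonstrate the Saver mechanics.
w = tf.get_variable("w", shape=[2], initializer=tf.zeros_initializer())
saver = tf.train.Saver(max_to_keep=3)  # keep only the 3 newest checkpoints

os.makedirs("checkpoints", exist_ok=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # During training you'd call this periodically; global_step tags the filename.
    saver.save(sess, "checkpoints/model.ckpt", global_step=0)

# Later (to resume or deploy), restore the newest checkpoint into a fresh session.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("checkpoints"))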

By "alter ... the way it learns" I'm not sure what you mean. You could theoretically use it to insert an untrained layer into an otherwise trained model. Theoretically you could change how you're evaluating loss mid-training (i.e. switch from cross-entropy to mean-squared-error, but why?).

I have primarily used the saving-loading API to (1) keep trained models that I want to deploy. For instance, I was working on a TensorFlow implementation of a self-driving car in Udacity's Self Driving Car Simulation. I would save training data from the Sim, train the model, then save the model and convert to constant. I'd load the saved model in a different script, open the Sim in autonomous mode, feed the data stream into the model and return the steering/throttle command into the Sim. So my use-case was real-time testing. And (2) specifically the loading mechanism to retrain Inception V3 on the mole project. The most common use case AFAIK is saving checkpoints for long training times. If your computer turns off on day 3 of a week of training, you don't have to start over. This is useful for, e.g., training on ImageNet or the full bAbI set. Both of which may take several weeks.

On loss and TensorBoard, loss is cost is error. It's the differentiable mechanism you use to train the model. If you don't know how backprop works, don't sweat it. You already know loss/cost/error is NOT accuracy, so you're good. Mean-squared-error and cross-entropy are standard choices for regression and classification tasks, respectively. So, if you want to find the centroid (x,y) coordinates of a shape in an image, you may frame that as a regression problem and calculate loss = mean([mean_squared_error(x, x_true), mean_squared_error(y, y_true)]). If you want to determine if the shape is a square, circle, or triangle, you would use cross-entropy.

A word of tangential advice: Regression problems often call for custom loss. In the centroid problem, you might want a custom loss function, the squared radial distance from prediction to centroid, i.e. loss = (x - x_true)**2 + (y - y_true)**2 = mean_squared_error(x, x_true) + mean_squared_error(y, y_true), where the mean is over the batch.
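In TensorFlow that custom loss is a one-liner. A sketch, assuming the usual import tensorflow as tf, where pred and true are hypothetical (batch_size, 2) tensors of (x, y) coordinates:

def centroid_loss(pred, true):
    # Squared radial distance per example, then mean over the batch.
    return tf.reduce_mean(tf.reduce_sum(tf.square(pred - true), axis=1))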

I'm rambling. Classification accuracy and graphing loss:

def accuracy(logits=None, labels=None):
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy_op

...

# Sanity check on the pipeline
tf.summary.image("inputs", batch_image, 1)

# Sanity check on the keep prob
keep_prob = tf.placeholder(tf.float32)
tf.summary.scalar("keep_prob", keep_prob)

logits = model(batch_image, keep_prob=keep_prob)
loss_op = loss(logits=logits, labels=batch_label)

# TensorBoard loss
tf.summary.scalar("xent", loss_op)

# Call accuracy, add a summary.
accuracy_op = accuracy(logits=logits, labels=batch_label)
tf.summary.scalar("accuracy", accuracy_op)

train_op = train(loss_op)

# Magically merge all the TensorBoard summaries.
# We should have at least "inputs" img, "keep_prob", "xent" (loss), "accuracy" scalars.
summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Create a TensorBoard writer (this line adds the graph dashboard)
    train_writer = tf.summary.FileWriter(TFR_DIR, sess.graph)

    sess.run([train_iterator.initializer])

    for epoch in range(iterations + 1):
        check_in = epoch % 100 == 0
        feed_dict = {keep_prob: 0.5}

        curr_loss, curr_acc, _, summary = sess.run(
            [loss_op, accuracy_op, train_op, summary_op], feed_dict=feed_dict)
        # v  Should add the image dash & scalar dash, add the image & graph point.
        train_writer.add_summary(summary, epoch)
        if check_in:
            # v  Should print out classification accuracy every 100 epochs.
            print(epoch, curr_loss, curr_acc)
    train_writer.close()

Not writing anything more about saver (beyond the bare mechanics sketched earlier) since I can't feel confident I'm doing it correctly without spending a while writing code to load it successfully. But, that should be enough to check your classification accuracy, and get accuracy and loss on TensorBoard. There's a built-in accuracy you can use instead of the three-line function: check tf.metrics.accuracy.
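A toy demo of that built-in, with made-up numbers. Note tf.metrics.accuracy is a streaming metric: it returns an (accuracy, update_op) pair and accumulates its counts in local variables, which need their own initializer.

import tensorflow as tf

labels = tf.constant([0, 1, 2, 1])
predictions = tf.constant([0, 1, 1, 1])
acc, acc_update = tf.metrics.accuracy(labels=labels, predictions=predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(acc_update)  # fold this batch into the running counts
    print(sess.run(acc))  # 3 of 4 match -> 0.75

In your graph you'd feed it tf.argmax(batch_label, 1) and tf.argmax(logits, 1) instead of constants.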

Frankly, looking at your previous code post, I'm not sure why the summaries, both images and loss scalars, aren't showing up on TensorBoard. Sometimes refreshing TensorBoard helps... Obviously you initialized train_writer with the right directory if you're seeing the graph... Perhaps you forgot train_writer.add_summary? Like I said, post the current code and I'll have a look.

Hope this helps. Sorry for the lack of concrete answers, but this should get you going on accuracy and maybe data augmentation. As for understanding loss: it's referred to as cost and error interchangeably (and more commonly error) in CNN literature. Cost appears more often in RNN literature in my experience. I don't know if I've read "loss" anywhere besides TensorFlow and I'm not sure where the term comes from.

EDIT: I owe you an explanation of how the accuracy function works. labels and logits are of size (batch_size, NUM_CLASSES). Labels should be a Tensor of batch_size one-hot arrays, and logits batch_size raw model predictions. Take the index of the largest value for each batch example. That is, argmax over dimension 1, the NUM_CLASSES axis. Then use tf.equal to see if the predicted class matches with the label class integer for each example in the batch. Then take the average over the batch.

It's pretty clear the index of the highest element in each of those one-hot arrays in labels is the integer class label, just zero-indexed. Logits is the same way. For each example in the batch, there's an array of length NUM_CLASSES holding the raw model prediction. An aside: although logits isn't a probability distribution, applying softmax to each example's raw prediction array makes it one, and (I'm sure we could prove that) softmax will NOT change which entry is largest.
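A quick numeric sanity check of that claim, in plain NumPy with toy numbers:

import numpy as np

logits = np.array([2.0, -1.0, 0.5])
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
assert np.argmax(logits) == np.argmax(probs)  # both point at index 0

(The proof is one line, too: exp is strictly increasing and the normalizing denominator is the same for every entry, so the ordering is preserved.)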

barbansonw commented 6 years ago

Hi,

the code so far:

import tensorflow as tf
import numpy as np
from scipy.misc import imread
import cv2
from cv2 import resize

image_list_file = 'C:/Users/dir/dir/TensorFlow/paths.txt'
TFR_DIR = 'C:/Users/dir/dir/TensorFlow/Data/TFRecords/Test_tfrecord.tfrecord'
MODEL_DIR = 'C:/Users/dir/dir/TensorFlow/Data/Models/Test_IR'
SUMMARY_DIR = 'C:/Users/dir/dir/TensorFlow/Data/Summaries'
NUM_CLASSES = 42
MODEL_INPUT_ROWS = MODEL_INPUT_COLS = 128
MODEL_INPUT_CHANNELS = 3

def read_labeled_image_list(image_list_file):
    all_image_paths = []
    all_labels = []
    with open(image_list_file, 'r') as f:
        for line in f:
            filename, label = line.strip().split(' ')
            all_image_paths.append(filename)
            all_labels.append(int(label))
    return all_image_paths, all_labels

def make_tfr(tfr_dir=TFR_DIR):
    def _int64_feature(arg):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[arg]))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    writer = tf.python_io.TFRecordWriter(TFR_DIR)
    all_image_paths, all_labels = read_labeled_image_list(image_list_file)
    for path, label in zip(all_image_paths, all_labels):
        disk_im = imread(path)
        resized_im = resize(disk_im, (MODEL_INPUT_COLS, MODEL_INPUT_ROWS))
        raw_im = resized_im.tostring()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'image_raw': _bytes_feature(raw_im),
                    'label': _int64_feature(label)}))
        serialized = example.SerializeToString()
        writer.write(serialized)
    writer.close()

#make_tfr()

def input_pipeline(batch_size, epochs, tfr_dir=TFR_DIR):
    with tf.name_scope("input"):
        dataset = tf.data.TFRecordDataset(tfr_dir)

    def parse_protocol_buffer(example_proto):
        features = {
            'image_raw': tf.FixedLenFeature((), tf.string), 
            'label': tf.FixedLenFeature((), tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, features)
        return parsed_features['image_raw'], parsed_features['label']

    dataset = dataset.map(parse_protocol_buffer)

    def convert_parsed_proto_to_input(image_string, label):
        image_decoded = tf.decode_raw(image_string, tf.uint8)
        image_resized = tf.reshape(image_decoded, (MODEL_INPUT_ROWS, MODEL_INPUT_COLS, MODEL_INPUT_CHANNELS))
        image = tf.cast(image_resized, tf.float32)
        label = tf.one_hot(label, NUM_CLASSES)
        return image * (2. / 255) - 1, label

    def augmentation(image, label):
        image = tf.image.random_brightness(image, 0.6)
        image = tf.image.random_saturation(image, lower=0.6, upper=1.4)
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
        image = tf.image.random_contrast(image, lower=0.6, upper=1.4)
        return image, label

    dataset = dataset.map(convert_parsed_proto_to_input)
    #dataset = dataset.map(augmentation) #optional image transformation
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.repeat(batch_size * epochs)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(epochs)
    return dataset

#------------------------------------------------------------------------------------------------------------------------------

def conv(imgs, filters_out, stride_size, kernel_size):
    filters_in = imgs.get_shape().as_list()[3]
    Kernel = tf.get_variable(
        "kernel",
        [kernel_size[0], kernel_size[1], filters_in, filters_out],
        initializer=tf.truncated_normal_initializer(stddev=0.1))
    Bias = tf.get_variable("bias", [filters_out], initializer=tf.zeros_initializer())
    evidence = tf.nn.conv2d(
        imgs, 
        Kernel, 
        strides=[1, stride_size[0], stride_size[1], 1], 
        padding="SAME")
    return evidence + Bias

def model(image_tensor, keep_prob=0.5):
    tf.summary.image("img_summary", batch_image)
    with tf.variable_scope("layer1"):
        z1 = conv(image_tensor, 16, (2, 2), (5, 5))
        a1 = tf.nn.relu(z1)
    with tf.variable_scope("layer2"):
        z2 = conv(a1, 32, (2, 2), (5, 5))
        a2 = tf.nn.relu(z2)
    m2 = tf.nn.max_pool(a2, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    with tf.variable_scope("layer3"):
        z3 = conv(m2, 64, (2, 2), (5, 5))
        a3 = tf.nn.relu(z3)
    with tf.variable_scope("layer4"):
        z4 = conv(a3, 128, (2, 2), (5, 5))
        a4 = tf.nn.relu(z4)
    m4 = tf.nn.max_pool(a4, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    with tf.variable_scope("layer5"):
        z5 = conv(m4, 256, (2, 2), (5, 5))
        a5 = tf.nn.relu(z5)
    with tf.variable_scope("layer6"):
        z6 = conv(a5, 256, (2, 2), (5, 5))
        a6 = tf.nn.relu(z6)
    m6 = tf.nn.max_pool(a6, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    with tf.variable_scope("layer7"):
        z7 = conv(m6, 128, (2, 2), (5, 5))
        a7 = tf.nn.relu(z7)
    with tf.variable_scope("layer8"):
        z8 = conv(a7, 64, (2, 2), (5, 5))
        a8 = tf.nn.relu(z8)

    final_shape = a8.get_shape().as_list()
    n_elems = final_shape[1] * final_shape[2] * final_shape[3]
    flat_a8 = tf.reshape(a8, [-1, n_elems])
    d8 = tf.nn.dropout(flat_a8, keep_prob)
    with tf.variable_scope("weight_and_bias"):
        W = tf.get_variable("weights", [n_elems, NUM_CLASSES],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable("bias", [NUM_CLASSES],
            initializer=tf.zeros_initializer())
        logits = tf.matmul(d8, W) + b
    return logits

def loss(logits=None, labels=None):
    with tf.name_scope("Eval"):
        xent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels, dim=-1)
        avg_xent = tf.reduce_mean(xent, axis=-1)
    return avg_xent

def train(some_loss):
    train_op = tf.train.AdadeltaOptimizer(0.1).minimize(some_loss)
    return train_op

#------------------------------------------------------------------------------------------------------------------------------

batch_size = 50
iterations = 1000

train_dataset = input_pipeline(batch_size, iterations, tfr_dir=TFR_DIR)

with tf.name_scope("Input"):
    train_iterator = train_dataset.make_initializable_iterator()
    batch_image, batch_label = train_iterator.get_next()

tf.summary.image("inputs", batch_image, 1)

keep_prob = tf.placeholder(tf.float32)

tf.summary.scalar("keep_prob", keep_prob)

logits = model(batch_image, keep_prob=keep_prob)

loss_op = loss(logits=logits, labels=batch_label)

tf.summary.scalar("loss_op", loss_op)

train_op = train(loss_op)

summary_op = tf.summary.merge_all()

#------------------------------------------------------------------------------------------------------------------------------

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(TFR_DIR, sess.graph)
    sess.run(train_iterator.initializer)
    for epoch in range(iterations + 1):
        check_in = epoch % 100 == 0
        feed_dict = {keep_prob: 0.5}
        _ = sess.run([train_op], feed_dict=feed_dict) 
        if check_in:
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)
            print(epoch, curr_loss)
    saver = tf.train.Saver()
    save_path = saver.save(sess, MODEL_DIR)
    print("Model saved in file: %s" % save_path)
    train_writer.close()

I didn't include anything that didn't work for me, this is just the working code as is. I've played with the number of layers in model; judging by the loss this works pretty well for me without too many images (I've made extra pictures though). I'll be trying to get the number of images way up ASAP because now that I have working code I need something I can actually train it on.

On a note, train_writer = tf.summary.FileWriter(TFR_DIR, sess.graph) doesn't work for me, so I made a separate folder called summaries (SUMMARY_DIR) which makes TensorBoard display my graph. (I'm not entirely sure that TFR_DIR should actually work.)

There's probably still some things here I missed that don't work 100% like they should but so far I'm pretty happy with the results.

flauted commented 6 years ago

It's very late and I'm just now getting to replying to this, so sorry I don't have much in-depth to contribute today. I would be surprised if train_writer = tf.summary.FileWriter(TFR_DIR, sess.graph) works. TFR_DIR is where your dataset lives. It isn't where your TensorBoard data should be. I hope I didn't lead you to using TFR_DIR instead of something else! Sorry if I did!

Anyways, this works on my system:

# -----------------------------------------------------------------------------

TB_DIR = '/home/dylan/dir/dir/TensorFlow/TB/project'
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(TB_DIR, sess.graph)
    sess.run(train_iterator.initializer)
    for epoch in range(iterations + 1):
        check_in = epoch % 100 == 0
        feed_dict = {keep_prob: 0.5}
        _, summ = sess.run([train_op, summary_op], feed_dict=feed_dict)
        train_writer.add_summary(summ, epoch)
        if check_in:
            curr_loss = sess.run([loss_op], feed_dict=feed_dict)
            print(epoch, curr_loss)
    saver = tf.train.Saver()
    save_path = saver.save(sess, MODEL_DIR)
    print("Model saved in file: %s" % save_path)
    train_writer.close()

My TB_DIR is probably what you were getting at with SUMMARY_DIR. The code with this training hook should have the side-effect of making the TB directory and the project directory. Then (from the TensorFlow directory) you run TensorBoard with tensorboard --logdir=TB/project. Should work. Don't fret if keep_prob has a funny-looking graph. Turn smoothing off and it'll be a straight line. This is a recent bug - I don't remember it happening with v1.3 or v1.2.

At least I can confirm this worked on my system with no other changes. Hope this helps!

barbansonw commented 6 years ago

Yes, this works! Sorry for the slow reply, holidays and all... (happy New Year, by the way)

Alright, I'm going to try to build in a graph for accuracy as a next step, I think. I've upscaled my dataset to around 500 images, which works way better, so I'm glad to see everything's coming together!

flauted commented 6 years ago

Happy New Year! Glad to know all is going well and happy to hear that you're still working on your project. Let me know if you run into any bugs I can help you with.

barbansonw commented 6 years ago

Hi there!

Well, I've seen t-SNE graphs and I thought those looked really good and made it clear which images get clustered together. I'd really like to be able to visualise that. Do you know if this is hard to do?

flauted commented 6 years ago

Do I know if this is hard to do? The tutorials looked vague enough that I never did it myself, but that was many months ago. It doesn't look that hard.

That also answers if I know how to do it (no). But I know where to point you to get started: Embeddings and TensorBoard: Embedding Visualization. The latter is a link to r0.12 docs, so no promises it's up to date anymore. I link it because it has links that may be useful.
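From skimming those docs, the rough flow looks like this. An untested sketch - the directory, variable shape, and names are all made up, and the contrib import path is the r1.x location, so it may have moved:

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# One row per image, e.g. the flattened activations feeding the logits.
embedding_var = tf.get_variable("embedding", shape=[500, 256])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("TB/embeddings", sess.graph)
    config = projector.ProjectorConfig()
    emb = config.embeddings.add()
    emb.tensor_name = embedding_var.name
    projector.visualize_embeddings(writer, config)
    # The projector reads values from a checkpoint, so save the variable too.
    tf.train.Saver([embedding_var]).save(sess, "TB/embeddings/emb.ckpt")

Then the Projector tab in TensorBoard runs PCA/t-SNE on those rows in the browser.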

barbansonw commented 6 years ago

shiet

I've got most things working but I keep running into this. Eventually my loss becomes extremely high and accuracy goes (close) to zero. Is there a reason this keeps happening? I've read people say this is the fault of a calculation error in the loss. Do you have any idea what's going wrong?

EDIT - and it was working so well I've gotten some pretty good results so far :(

flauted commented 6 years ago

No, I don't know what's going on there. I guess it could be floating point error. Did you shuffle your data? Do the images still look right on the TensorBoard dash?