PythEsc / Research_project2

Prediction of Facebook-user reactions to supermarkets using neural networks

Model Accuracy #16

Closed brunolubascher closed 7 years ago

brunolubascher commented 7 years ago

How are we going to measure the accuracy of our models?

At the moment, our accuracy measure is a strict equality, so a predicted_label is either exactly right or counted as wrong. For example, if the true_label = (0.8, 0.2, 0.0, 0.0, 0.0, 0.0) and our model outputs predicted_label = (0.78, 0.18, 0.01, 0.01, 0.01, 0.01), it will be considered a mismatch.

However, I do not believe this is a good way of measuring accuracy, given that our labels are floating-point vectors.

Maybe we could then use some kind of error measure between the predicted_label and the true_label, so we can compare our models.
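
For illustration, with the example vectors above (a quick numpy sketch, not code from our repo):

import numpy as np

true_label = np.array([0.8, 0.2, 0.0, 0.0, 0.0, 0.0])
predicted_label = np.array([0.78, 0.18, 0.01, 0.01, 0.01, 0.01])

# Strict equality counts this near-perfect prediction as a plain miss
print(np.array_equal(true_label, predicted_label))    # False

# A distance-based measure reflects how close it actually is
print(np.mean(np.abs(true_label - predicted_label)))  # ~0.0133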

Any suggestions or ideas on this?

jerryspan commented 7 years ago

Initially, I thought of going with something like categorical cross-entropy, but given how you have formulated the problem now, I am not sure it is a good match: cross-entropy is meant for classification, whereas your output is a vector of continuous numbers.

Perhaps you can try a simple loss, like MSE (mean squared error):

https://www.tensorflow.org/versions/r0.11/api_docs/python/nn/losses
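
For example, a minimal sketch in the TF 1.x style, reusing the vectors from above (just an illustration, not the linked losses API):

import tensorflow as tf

y_true = tf.constant([[0.8, 0.2, 0.0, 0.0, 0.0, 0.0]])
y_pred = tf.constant([[0.78, 0.18, 0.01, 0.01, 0.01, 0.01]])
mse = tf.reduce_mean(tf.square(y_pred - y_true))

with tf.Session() as sess:
    print(sess.run(mse))  # ~0.0002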

Naxter commented 7 years ago

Yes, I guess this will give us a better measure of accuracy. I am going to incorporate that.

brunolubascher commented 7 years ago

Here are the results of a baseline that always outputs [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], i.e. predicts only likes. We can use it as a point of comparison for our networks.

I filtered the Sainsbury data using different thresholds for the minimum number of reactions per post. The total number of posts was 10,307, but many contain no reactions at all.

Naxter commented 7 years ago

What error is it? While training? Or when evaluating? And how long have you been training?

brunolubascher commented 7 years ago

This is not a model that needs to be trained. These results are just to be used as a benchmark since it would be really strange to get a larger error than this.

I only calculated the error of always predicting likes, over the whole filtered dataset.

You can re-run the experiment with the code below.

import numpy as np
from sklearn import metrics

from importer.data_importer import DataImporter

# Load the data using our DataImporter
importer_sainsbury = DataImporter("../../data/Filtered/Sainsbury.zip", 
                                  "../../../data/Unzipped/Sainsbury")
importer_sainsbury.load()
x, y = importer_sainsbury.get_data_and_labels()

def normalize_reactions(Y):
    ''' Transform absolute reactions into ratios. '''
    norm_Y = []
    for y in Y:
        t = np.sum(y)
        reac = []
        for r in y:
            reac.append(r/t)
        norm_Y.append(reac)
    return norm_Y

def get_minimum_reactions(X, Y, min_reactions):
    ''' Filter the data based on the minimum total reactions. '''
    ret_X = []
    ret_Y = []
    for i, y in enumerate(Y):
        if np.sum(y) >= min_reactions:
            ret_X.append(X[i])
            ret_Y.append(Y[i])
    return ret_X, normalize_reactions(ret_Y)

# Filter the data with different thresholds
X_1, Y_1 = get_minimum_reactions(x, y, 1)
X_2, Y_2 = get_minimum_reactions(x, y, 2)
X_5, Y_5 = get_minimum_reactions(x, y, 5)
X_10, Y_10 = get_minimum_reactions(x, y, 10)
X_15, Y_15 = get_minimum_reactions(x, y, 15)

# Report the baseline's mean error over each filtered dataset
for min_reactions, Y in zip([1, 2, 5, 10, 15], [Y_1, Y_2, Y_5, Y_10, Y_15]):
    pred = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in Y]
    mae = metrics.mean_absolute_error(Y, pred)
    mse = metrics.mean_squared_error(Y, pred)

    print('\nmin_reactions: {}'.format(min_reactions))
    print('Mean absolute error: {}'.format(mae))
    print('Mean squared error: {}'.format(mse))

brunolubascher commented 7 years ago

With the RNN, I used the filtered data as above, split 90/10 into training and test sets.
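
Roughly like this (illustrative sketch, not necessarily our exact code; X_1/Y_1 come from the baseline snippet above, and random_state is only there for reproducibility):

from sklearn.model_selection import train_test_split

# Hold out 10% of the filtered data for testing
X_train, X_test, Y_train, Y_test = train_test_split(
    X_1, Y_1, test_size=0.1, random_state=42)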

The setup of the RNN was the following:

parameter | value
--- | ---
lstm_size | 128
lstm_layers | 1
learning_rate | 0.01
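
For reference, a minimal Keras-style sketch of such a model (hypothetical: vocab_size, the embedding size, and the Adam optimizer are assumptions, not our actual implementation):

import tensorflow as tf

vocab_size = 10000  # assumed vocabulary size

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 50),       # embedding size assumed
    tf.keras.layers.LSTM(128),                       # lstm_size = 128, lstm_layers = 1
    tf.keras.layers.Dense(6, activation='softmax'),  # ratios over the 6 reaction types
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # learning_rate = 0.01
              loss='mse', metrics=['mae'])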

I trained for 20 epochs, and got the following results for the different filtering thresholds:

min_reactions | RNN-mae | Baseline-mae | RNN-mse | Baseline-mse
--- | --- | --- | --- | ---
1 | 0.0421 | 0.0428 | 0.0330 | 0.0337
2 | 0.0295 | 0.0423 | 0.0128 | 0.0226
5 | 0.0405 | 0.0389 | 0.0159 | 0.0142
10 | 0.0302 | 0.0320 | 0.0076 | 0.0096
15 | 0.0063 | 0.0269 | 0.0004 | 0.0057

The RNN achieves a lower error than the baseline, except on the data filtered to a minimum of 5 reactions.

Naxter commented 7 years ago

I tested the CNN with the following parameters for the Sainsbury dataset:

parameter | value
--- | ---
epochs | 20
filter_sizes | 3,4,5
filter_count per filter | 40
embedding_dimension | 50
dropout_keep_probability | 0.5

min_reactions | CNN-mae | Baseline-mae | CNN-mse | Baseline-mse | Cross-entropy loss
--- | --- | --- | --- | --- | ---
1 | 0.0311 | 0.0428 | 0.0205 | 0.0337 | 1.16739
2 | 0.0670 | 0.0423 | 0.0412 | 0.0226 | 1.1421
5 | 0.0461 | 0.0389 | 0.0227 | 0.0142 | 1.13975
10 | 0.0448 | 0.0320 | 0.0134 | 0.0096 | 1.11918
15 | 0.0248 | 0.0269 | 0.0052 | 0.0057 | 1.1359

The CNN is performing really badly at the moment. Only on the data filtered to at least 1 reaction does it outperform the RNN. I suspect there is too little training data for a CNN in the other cases. I still need to try combining the Tesco and Sainsbury data, and to include the newly collected data.
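
For reference, a Kim-style text-CNN sketch with the parameters above (hypothetical: vocab_size, sequence_length, and the optimizer are assumptions, not our actual code):

import tensorflow as tf

vocab_size, sequence_length = 10000, 100  # assumed values

inputs = tf.keras.Input(shape=(sequence_length,))
x = tf.keras.layers.Embedding(vocab_size, 50)(inputs)  # embedding_dimension = 50
pooled = []
for size in (3, 4, 5):                                 # filter_sizes = 3,4,5
    conv = tf.keras.layers.Conv1D(40, size, activation='relu')(x)  # 40 filters per size
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))
x = tf.keras.layers.Concatenate()(pooled)
x = tf.keras.layers.Dropout(0.5)(x)                    # keep probability 0.5 -> dropout rate 0.5
outputs = tf.keras.layers.Dense(6, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])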

Naxter commented 7 years ago

Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":

min_reactions | CNN-mae raw | CNN-mae pre-processed | CNN-mse raw | CNN-mse pre-processed
--- | --- | --- | --- | ---
1 | 0.0305 | 0.0513 | 0.0226 | 0.0409
2 | 0.0222 | 0.0348 | 0.0180 | 0.0108
5 | 0.0485 | 0.0111 | 0.0292 | 0.0016
10 | 0.0317 | 0.0568 | 0.0130 | 0.0255
15 | 0.0215 | 0.0202 | 0.0038 | 0.0046

Naxter commented 7 years ago

Sainsbury + Tesco dataset comparing "Additional data that has been crawled":

min_reactions | CNN-mae | CNN-mae additional data | CNN-mse | CNN-mse additional data
--- | --- | --- | --- | ---
5 | 0.0485 | 0.0244 | 0.0292 | 0.0034
10 | 0.0317 | 0.0192 | 0.0130 | 0.0039
15 | 0.0215 | 0.0232 | 0.0038 | 0.0049

Naxter commented 7 years ago

Sainsbury + Tesco dataset comparing "without likes":

min_reactions | CNN-mae | CNN-mae without likes | CNN-mse | CNN-mse without likes
--- | --- | --- | --- | ---
5 | 0.0485 | 0.1129 | 0.0292 | 0.0703
10 | 0.0317 | 0.0906 | 0.0130 | 0.0679
15 | 0.0215 | 0.1209 | 0.0038 | 0.0875

The without-likes approach is really bad. This might simply be because there is much less data with at least 5/10/15 reactions once likes are excluded, since most posts consist mostly of likes.
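
Concretely, "without likes" amounts to something like this sketch (likes at index 0, as in the baseline vector above; an illustration, not our exact filtering code):

import numpy as np

def drop_likes(Y, min_reactions):
    ''' Remove the like count and renormalize over the 5 remaining reactions. '''
    out = []
    for y in Y:
        rest = np.asarray(y[1:], dtype=float)
        if rest.sum() >= min_reactions:
            out.append(rest / rest.sum())
    return out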

brunolubascher commented 7 years ago

Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":

min_reactions | RNN-mae raw | RNN-mae pre-processed | RNN-mse raw | RNN-mse pre-processed
--- | --- | --- | --- | ---
1 | 0.0426 | 0.0360 | 0.0334 | 0.0262
2 | 0.0409 | 0.0417 | 0.0226 | 0.0220
5 | 0.0369 | 0.0407 | 0.0154 | 0.0150
10 | 0.0312 | 0.0274 | 0.0096 | 0.0085
15 | 0.0308 | 0.0259 | 0.0078 | 0.0052

PythEsc commented 7 years ago

Are we still working on this one, or is it done with the results presented above?