Initially, I thought of going with something like categorical cross-entropy, but with the way you have formulated the problem now, I am not sure it is a good match: cross-entropy applies to classification, while your output is now continuous numbers.
Perhaps you can try a simple loss instead, like MSE (mean squared error):
https://www.tensorflow.org/versions/r0.11/api_docs/python/nn/losses
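For example, with plain TensorFlow ops (a minimal sketch; the placeholder names and the softmax squashing of the network output are my assumptions, not our actual code):

```python
import tensorflow as tf

# Hypothetical placeholders: batches of 6-way reaction-ratio vectors.
labels = tf.placeholder(tf.float32, [None, 6], name='true_ratios')
logits = tf.placeholder(tf.float32, [None, 6], name='network_output')

# Squash the raw network output into ratios that sum to 1,
# then take the mean squared error against the true ratios.
predictions = tf.nn.softmax(logits)
loss = tf.reduce_mean(tf.square(predictions - labels))
```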
Yes, I guess this will give us a better measure of accuracy. I am going to incorporate that.
Here are the results for always outputting [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], i.e. predicting only likes. We can use this baseline to compare against the results of our networks.
I filtered the Sainsbury data using different thresholds for the minimum number of reactions per post. The total number of posts was 10,307, but many contain no reactions at all.
min_reactions | posts | Mean absolute error | Mean squared error |
---|---|---|---|
1 | 3,470 | 0.0428 | 0.0337 |
2 | 1,606 | 0.0423 | 0.0226 |
5 | 530 | 0.0389 | 0.0142 |
10 | 253 | 0.0320 | 0.0096 |
15 | 176 | 0.0269 | 0.0057 |
Which error is this? From training, or from evaluation? And how long have you been training?
This is not a model that needs to be trained. These results are just to be used as a benchmark since it would be really strange to get a larger error than this.
I only calculated the error of always predicting likes, over the whole filtered dataset.
You can re-run the experiment with the code below.
```python
import numpy as np
from sklearn import metrics

from importer.data_importer import DataImporter

# Load the data using our DataImporter
importer_sainsbury = DataImporter("../../data/Filtered/Sainsbury.zip",
                                  "../../../data/Unzipped/Sainsbury")
importer_sainsbury.load()
x, y = importer_sainsbury.get_data_and_labels()


def normalize_reactions(Y):
    ''' Transform absolute reaction counts into ratios. '''
    norm_Y = []
    for y in Y:
        t = np.sum(y)
        reac = []
        for r in y:
            reac.append(r / t)
        norm_Y.append(reac)
    return norm_Y


def get_minimum_reactions(X, Y, min_reactions):
    ''' Filter the data based on the minimum total number of reactions. '''
    ret_X = []
    ret_Y = []
    for i, y in enumerate(Y):
        if np.sum(y) >= min_reactions:
            ret_X.append(X[i])
            ret_Y.append(Y[i])
    return ret_X, normalize_reactions(ret_Y)


# Filter the data with different thresholds
X_1, Y_1 = get_minimum_reactions(x, y, 1)
X_2, Y_2 = get_minimum_reactions(x, y, 2)
X_5, Y_5 = get_minimum_reactions(x, y, 5)
X_10, Y_10 = get_minimum_reactions(x, y, 10)
X_15, Y_15 = get_minimum_reactions(x, y, 15)

# Check the mean error of the all-likes prediction over each filtered dataset
for Y in [Y_1, Y_2, Y_5, Y_10, Y_15]:
    pred = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in Y]
    mae = metrics.mean_absolute_error(Y, pred)
    mse = metrics.mean_squared_error(Y, pred)
    print('\nMean absolute error: {}'.format(mae))
    print('Mean squared error: {}'.format(mse))
```
For the RNN I used the filtered data as above, split 0.9/0.1 into training and test sets.
The setup of the RNN was the following:
parameter | value |
---|---|
lstm_size | 128 |
lstm_layers | 1 |
learning_rate | 0.01 |
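For reference, a minimal sketch of how such a setup could be wired in TensorFlow (the cell and optimizer choices, shapes, and variable names here are my assumptions, not necessarily our actual code):

```python
import tensorflow as tf

lstm_size = 128
lstm_layers = 1
learning_rate = 0.01

# Hypothetical inputs: embedded post texts and 6-way reaction ratios.
inputs = tf.placeholder(tf.float32, [None, None, 50])  # [batch, time, embedding]
labels = tf.placeholder(tf.float32, [None, 6])

cells = [tf.nn.rnn_cell.BasicLSTMCell(lstm_size) for _ in range(lstm_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Predict reaction ratios from the last time step's hidden output.
predictions = tf.layers.dense(outputs[:, -1, :], 6, activation=tf.nn.softmax)
loss = tf.reduce_mean(tf.square(predictions - labels))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```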
I trained for 20 epochs, and got the following results for the different filtering thresholds:
min_reactions | RNN-mae | Baseline-mae | RNN-mse | Baseline-mse |
---|---|---|---|---|
1 | 0.0421 | 0.0428 | 0.0330 | 0.0337 |
2 | 0.0295 | 0.0423 | 0.0128 | 0.0226 |
5 | 0.0405 | 0.0389 | 0.0159 | 0.0142 |
10 | 0.0302 | 0.0320 | 0.0076 | 0.0096 |
15 | 0.0063 | 0.0269 | 0.0004 | 0.0057 |
The RNN achieves a lower error than the baseline, except on the filtered data with at least 5 reactions.
I tested the CNN with the following parameters for the Sainsbury dataset:
parameter | value |
---|---|
epochs | 20 |
filter_sizes | 3,4,5 |
filter_count per filter | 40 |
embedding_dimension | 50 |
dropout_keep_probability | 0.5 |
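Roughly, the convolutional part could look like this (a sketch of a Kim-style text CNN under these parameters; the sequence length of 100 and the pooling details are my assumptions):

```python
import tensorflow as tf

embedding_dim = 50
filter_sizes = [3, 4, 5]
num_filters = 40
dropout_keep_prob = 0.5
seq_len = 100  # assumed maximum post length

# Hypothetical input: embedded posts with a channel axis, as conv2d expects.
inputs = tf.placeholder(tf.float32, [None, seq_len, embedding_dim, 1])

pooled = []
for size in filter_sizes:
    conv = tf.layers.conv2d(inputs, num_filters, [size, embedding_dim],
                            activation=tf.nn.relu)
    # Max-pool each filter's response over the whole sequence.
    pooled.append(tf.reduce_max(conv, axis=[1, 2]))

features = tf.concat(pooled, axis=1)  # [batch, 3 * 40]
features = tf.nn.dropout(features, keep_prob=dropout_keep_prob)
predictions = tf.layers.dense(features, 6, activation=tf.nn.softmax)
```

The results for the different filtering thresholds were: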
min_reactions | CNN-mae | Baseline-mae | CNN-mse | Baseline-mse | Cross entropy loss |
---|---|---|---|---|---|
1 | 0.0311 | 0.0428 | 0.0205 | 0.0337 | 1.16739 |
2 | 0.0670 | 0.0423 | 0.0412 | 0.0226 | 1.1421 |
5 | 0.0461 | 0.0389 | 0.0227 | 0.0142 | 1.13975 |
10 | 0.0448 | 0.0320 | 0.0134 | 0.0096 | 1.11918 |
15 | 0.0248 | 0.0269 | 0.0052 | 0.0057 | 1.1359 |
The CNN is performing quite badly at the moment. Only on the data filtered for at least 1 reaction does the network outperform the RNN; I suspect there is too little training data for a CNN in the other cases. I still need to try combining the Tesco and Sainsbury data, and also to include the newly collected data.
Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":
min_reactions | CNN-mae raw | CNN-mae pre-processed | CNN-mse raw | CNN-mse pre-processed |
---|---|---|---|---|
1 | 0.0305 | 0.0513 | 0.0226 | 0.0409 |
2 | 0.0222 | 0.0348 | 0.0180 | 0.0108 |
5 | 0.0485 | 0.0111 | 0.0292 | 0.0016 |
10 | 0.0317 | 0.0568 | 0.0130 | 0.0255 |
15 | 0.0215 | 0.0202 | 0.0038 | 0.0046 |
Sainsbury + Tesco dataset comparing "Additional data that has been crawled":
min_reactions | CNN-mae | CNN-mae additional data | CNN-mse | CNN-mse additional data |
---|---|---|---|---|
5 | 0.0485 | 0.0244 | 0.0292 | 0.0034 |
10 | 0.0317 | 0.0192 | 0.0130 | 0.0039 |
15 | 0.0215 | 0.0232 | 0.0038 | 0.0049 |
Sainsbury + Tesco dataset comparing "without likes":
min_reactions | CNN-mae | CNN-mae without likes | CNN-mse | CNN-mse without likes |
---|---|---|---|---|
5 | 0.0485 | 0.1129 | 0.0292 | 0.0703 |
10 | 0.0317 | 0.0906 | 0.0130 | 0.0679 |
15 | 0.0215 | 0.1209 | 0.0038 | 0.0875 |
The without-likes approach is really bad. This might be because there is, of course, much less data with at least 5/10/15 reactions once likes are excluded, since most posts consist mostly of likes.
Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":
min_reactions | RNN-mae raw | RNN-mae pre-processed | RNN-mse raw | RNN-mse pre-processed |
---|---|---|---|---|
1 | 0.0426 | 0.0360 | 0.0334 | 0.0262 |
2 | 0.0409 | 0.0417 | 0.0226 | 0.0220 |
5 | 0.0369 | 0.0407 | 0.0154 | 0.0150 |
10 | 0.0312 | 0.0274 | 0.0096 | 0.0085 |
15 | 0.0308 | 0.0259 | 0.0078 | 0.0052 |
Are we still working on this one or is it done with the results presented above?
How are we going to measure the accuracy of our models?
At the moment, our accuracy measure is a strict equality. This means that the `predicted_label` can only be either correct or wrong. In the case where the `true_label = (0.8, 0.2, 0.0, 0.0, 0.0, 0.0)` and our model gives `predicted_label = (0.78, 0.18, 0.01, 0.01, 0.01, 0.01)`, it will be considered a mismatch. However, I do not believe this is a good way of measuring accuracy, given that our labels are floating-point vectors.
Maybe we could instead use some kind of error measure between the `predicted_label` and the `true_label`, so we can compare our models. Any suggestions or ideas on this?
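For instance, the near-miss example above could be scored with a per-component error instead of strict equality (a sketch using sklearn):

```python
import numpy as np
from sklearn import metrics

true_label = [0.8, 0.2, 0.0, 0.0, 0.0, 0.0]
predicted_label = [0.78, 0.18, 0.01, 0.01, 0.01, 0.01]

# Strict equality calls this near-miss plainly wrong.
print(np.array_equal(true_label, predicted_label))  # False

# A per-component error rewards it for being close.
print(metrics.mean_absolute_error([true_label], [predicted_label]))  # ~0.0133
```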