Initially, I thought of going with something like categorical cross-entropy, but with the way you have formulated the problem now, I am not sure it is a good match: cross-entropy applies to classification, while your output is now continuous numbers.
Perhaps you can try a simple loss instead, like MSE (mean squared error):
https://www.tensorflow.org/versions/r0.11/api_docs/python/nn/losses
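For example, with plain TensorFlow ops (a minimal sketch; the placeholder names and the softmax squashing of the network output are my assumptions, not our actual code):

```python
import tensorflow as tf

# Hypothetical placeholders: batches of 6-way reaction-ratio vectors.
labels = tf.placeholder(tf.float32, [None, 6], name='true_ratios')
logits = tf.placeholder(tf.float32, [None, 6], name='network_output')

# Squash the raw network output into ratios that sum to 1,
# then take the mean squared error against the true ratios.
predictions = tf.nn.softmax(logits)
loss = tf.reduce_mean(tf.square(predictions - labels))
```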
Yes, I guess this will give us a better measure of accuracy. I am going to incorporate that.
Here are the results for always outputting [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], i.e. predicting only likes. We can use this baseline to compare against the results of our networks.
I filtered the Sainsbury data using different thresholds for the minimum number of reactions per post. The total number of posts was 10,307, but many contain no reactions at all.
min_reactions | posts | Mean absolute error | Mean squared error |
---|---|---|---|
1 | 3,470 | 0.0428 | 0.0337 |
2 | 1,606 | 0.0423 | 0.0226 |
5 | 530 | 0.0389 | 0.0142 |
10 | 253 | 0.0320 | 0.0096 |
15 | 176 | 0.0269 | 0.0057 |
Which error is this? From training, or from evaluation? And how long have you been training?
This is not a model that needs to be trained. These results are just to be used as a benchmark since it would be really strange to get a larger error than this.
I only calculated the error of always predicting likes, over the whole filtered dataset.
You can re-run the experiment with the code below.
```python
import numpy as np
from sklearn import metrics

from importer.data_importer import DataImporter

# Load the data using our DataImporter
importer_sainsbury = DataImporter("../../data/Filtered/Sainsbury.zip",
                                  "../../../data/Unzipped/Sainsbury")
importer_sainsbury.load()
x, y = importer_sainsbury.get_data_and_labels()


def normalize_reactions(Y):
    ''' Transform absolute reaction counts into ratios. '''
    norm_Y = []
    for y in Y:
        t = np.sum(y)
        reac = []
        for r in y:
            reac.append(r / t)
        norm_Y.append(reac)
    return norm_Y


def get_minimum_reactions(X, Y, min_reactions):
    ''' Filter the data based on the minimum total number of reactions. '''
    ret_X = []
    ret_Y = []
    for i, y in enumerate(Y):
        if np.sum(y) >= min_reactions:
            ret_X.append(X[i])
            ret_Y.append(Y[i])
    return ret_X, normalize_reactions(ret_Y)


# Filter the data with different thresholds
X_1, Y_1 = get_minimum_reactions(x, y, 1)
X_2, Y_2 = get_minimum_reactions(x, y, 2)
X_5, Y_5 = get_minimum_reactions(x, y, 5)
X_10, Y_10 = get_minimum_reactions(x, y, 10)
X_15, Y_15 = get_minimum_reactions(x, y, 15)

# Check the mean error of the all-likes prediction over each filtered dataset
for Y in [Y_1, Y_2, Y_5, Y_10, Y_15]:
    pred = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in Y]
    mae = metrics.mean_absolute_error(Y, pred)
    mse = metrics.mean_squared_error(Y, pred)
    print('\nMean absolute error: {}'.format(mae))
    print('Mean squared error: {}'.format(mse))
```
For the RNN I used the filtered data as above, split 0.9/0.1 into training and test sets.
The setup of the RNN was the following:
parameter | value |
---|---|
lstm_size | 128 |
lstm_layers | 1 |
learning_rate | 0.01 |
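For reference, a minimal sketch of how such a setup could be wired in TensorFlow (the cell and optimizer choices, shapes, and variable names here are my assumptions, not necessarily our actual code):

```python
import tensorflow as tf

lstm_size = 128
lstm_layers = 1
learning_rate = 0.01

# Hypothetical inputs: embedded post texts and 6-way reaction ratios.
inputs = tf.placeholder(tf.float32, [None, None, 50])  # [batch, time, embedding]
labels = tf.placeholder(tf.float32, [None, 6])

cells = [tf.nn.rnn_cell.BasicLSTMCell(lstm_size) for _ in range(lstm_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Predict reaction ratios from the last time step's hidden output.
predictions = tf.layers.dense(outputs[:, -1, :], 6, activation=tf.nn.softmax)
loss = tf.reduce_mean(tf.square(predictions - labels))
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```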
I trained for 20 epochs, and got the following results for the different filtering thresholds:
min_reactions | RNN-mae | Baseline-mae | RNN-mse | Baseline-mse |
---|---|---|---|---|
1 | 0.0421 | 0.0428 | 0.0330 | 0.0337 |
2 | 0.0295 | 0.0423 | 0.0128 | 0.0226 |
5 | 0.0405 | 0.0389 | 0.0159 | 0.0142 |
10 | 0.0302 | 0.0320 | 0.0076 | 0.0096 |
15 | 0.0063 | 0.0269 | 0.0004 | 0.0057 |
The RNN achieves a lower error than the baseline, except on the filtered data with at least 5 reactions.
I tested the CNN with the following parameters for the Sainsbury dataset:
parameter | value |
---|---|
epochs | 20 |
filter_sizes | 3,4,5 |
filter_count per filter | 40 |
embedding_dimension | 50 |
dropout_keep_probability | 0.5 |
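Roughly, the convolutional part could look like this (a sketch of a Kim-style text CNN under these parameters; the sequence length of 100 and the pooling details are my assumptions):

```python
import tensorflow as tf

embedding_dim = 50
filter_sizes = [3, 4, 5]
num_filters = 40
dropout_keep_prob = 0.5
seq_len = 100  # assumed maximum post length

# Hypothetical input: embedded posts with a channel axis, as conv2d expects.
inputs = tf.placeholder(tf.float32, [None, seq_len, embedding_dim, 1])

pooled = []
for size in filter_sizes:
    conv = tf.layers.conv2d(inputs, num_filters, [size, embedding_dim],
                            activation=tf.nn.relu)
    # Max-pool each filter's response over the whole sequence.
    pooled.append(tf.reduce_max(conv, axis=[1, 2]))

features = tf.concat(pooled, axis=1)  # [batch, 3 * 40]
features = tf.nn.dropout(features, keep_prob=dropout_keep_prob)
predictions = tf.layers.dense(features, 6, activation=tf.nn.softmax)
```

The results for the different filtering thresholds were: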
min_reactions | CNN-mae | Baseline-mae | CNN-mse | Baseline-mse | Cross entropy loss |
---|---|---|---|---|---|
1 | 0.0311 | 0.0428 | 0.0205 | 0.0337 | 1.16739 |
2 | 0.0670 | 0.0423 | 0.0412 | 0.0226 | 1.1421 |
5 | 0.0461 | 0.0389 | 0.0227 | 0.0142 | 1.13975 |
10 | 0.0448 | 0.0320 | 0.0134 | 0.0096 | 1.11918 |
15 | 0.0248 | 0.0269 | 0.0052 | 0.0057 | 1.1359 |
The CNN is performing quite badly at the moment. Only on the data filtered for at least 1 reaction does the network outperform the RNN; I suspect there is too little training data for a CNN in the other cases. I still need to try combining the Tesco and Sainsbury data, and also to include the newly collected data.
Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":
min_reactions | CNN-mae raw | CNN-mae pre-processed | CNN-mse raw | CNN-mse pre-processed |
---|---|---|---|---|
1 | 0.0305 | 0.0513 | 0.0226 | 0.0409 |
2 | 0.0222 | 0.0348 | 0.0180 | 0.0108 |
5 | 0.0485 | 0.0111 | 0.0292 | 0.0016 |
10 | 0.0317 | 0.0568 | 0.0130 | 0.0255 |
15 | 0.0215 | 0.0202 | 0.0038 | 0.0046 |
Sainsbury + Tesco dataset comparing "Additional data that has been crawled":
min_reactions | CNN-mae | CNN-mae additional data | CNN-mse | CNN-mse additional data |
---|---|---|---|---|
5 | 0.0485 | 0.0244 | 0.0292 | 0.0034 |
10 | 0.0317 | 0.0192 | 0.0130 | 0.0039 |
15 | 0.0215 | 0.0232 | 0.0038 | 0.0049 |
Sainsbury + Tesco dataset comparing "without likes":
min_reactions | CNN-mae | CNN-mae without likes | CNN-mse | CNN-mse without likes |
---|---|---|---|---|
5 | 0.0485 | 0.1129 | 0.0292 | 0.0703 |
10 | 0.0317 | 0.0906 | 0.0130 | 0.0679 |
15 | 0.0215 | 0.1209 | 0.0038 | 0.0875 |
The without-likes approach is really bad. This might be because there is, of course, much less data with at least 5/10/15 reactions once likes are excluded, since most posts consist mostly of likes.
Sainsbury + Tesco dataset comparing "Raw input vs pre-processed input":
min_reactions | RNN-mae raw | RNN-mae pre-processed | RNN-mse raw | RNN-mse pre-processed |
---|---|---|---|---|
1 | 0.0426 | 0.0360 | 0.0334 | 0.0262 |
2 | 0.0409 | 0.0417 | 0.0226 | 0.0220 |
5 | 0.0369 | 0.0407 | 0.0154 | 0.0150 |
10 | 0.0312 | 0.0274 | 0.0096 | 0.0085 |
15 | 0.0308 | 0.0259 | 0.0078 | 0.0052 |
Are we still working on this one or is it done with the results presented above?
How are we going to measure the accuracy of our models?
At the moment, our accuracy measure is a strict equality. This means that the `predicted_label` can only be either correct or wrong. In the case where the `true_label = (0.8, 0.2, 0.0, 0.0, 0.0, 0.0)` and our model gives `predicted_label = (0.78, 0.18, 0.01, 0.01, 0.01, 0.01)`, it will be considered a mismatch. However, I do not believe this is a good way of measuring accuracy, given that our labels are floating-point vectors.
Maybe we could instead use some kind of error measure between the `predicted_label` and the `true_label`, so we can compare our models. Any suggestions or ideas on this?
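For instance, the near-miss example above could be scored with a per-component error instead of strict equality (a sketch using sklearn):

```python
import numpy as np
from sklearn import metrics

true_label = [0.8, 0.2, 0.0, 0.0, 0.0, 0.0]
predicted_label = [0.78, 0.18, 0.01, 0.01, 0.01, 0.01]

# Strict equality calls this near-miss plainly wrong.
print(np.array_equal(true_label, predicted_label))  # False

# A per-component error rewards it for being close.
print(metrics.mean_absolute_error([true_label], [predicted_label]))  # ~0.0133
```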