Closed nbosc closed 5 years ago
Hi, train.shape and test.shape should be equal. Can you check?
Hi, train and test had different shapes because I was doing an 80/20 split. Changing to 50/50 solves the problem, thanks. I don't get why the shapes have to be identical, though. Could you explain or tell me where to look?
In most cases train and test are sparse matrices (scipy.sparse), they have different non-zero elements, but their number of rows and columns should be equal.
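To illustrate the point about shapes, here is a minimal sketch of an 80/20 split that keeps both matrices at the full shape of the original. It uses only scipy.sparse and NumPy; the helper name `split_train_test` and the toy matrix are made up for this example and are not part of SMURFF.

```python
import numpy as np
import scipy.sparse as sp


def split_train_test(Y, test_fraction=0.2, seed=0):
    """Split the stored entries of Y into train and test matrices.

    Hypothetical helper, not a SMURFF function. Both outputs keep the
    full shape of Y; only the set of non-zero cells differs.
    """
    Y = sp.coo_matrix(Y)
    rng = np.random.default_rng(seed)
    # Assign each stored cell to the test set with probability test_fraction.
    is_test = rng.random(Y.nnz) < test_fraction
    train = sp.coo_matrix(
        (Y.data[~is_test], (Y.row[~is_test], Y.col[~is_test])), shape=Y.shape
    )
    test = sp.coo_matrix(
        (Y.data[is_test], (Y.row[is_test], Y.col[is_test])), shape=Y.shape
    )
    return train, test


# Toy data: a 100 x 20 sparse matrix with about 30% of the cells observed.
Y = sp.random(100, 20, density=0.3, format="coo", random_state=0)
train, test = split_train_test(Y, test_fraction=0.2)
print(train.shape, test.shape)
```

The split is over the observed cells, not over rows, so an 80/20 split does not change the number of rows or columns of either matrix.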
You could have a dense train matrix, but then your test set is always going to overlap with your train set.
It's OK to binarize the matrix upfront, but do check what threshold SMURFF is using. This is printed at the beginning of sampling.
Ok, thanks for the clarification. It seems to work. My pre-computed binary values are -1. and 1., so I use a 0. threshold. I get this:
Result: {
Test data: 168002 [92557 x 572] (0.32%)
Binary classification threshold: 0.00
40.68% positives in test data
Looks ok.
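For reference, the kind of upfront binarization discussed above (values -1 and 1, classification threshold 0) could look roughly like the sketch below. The helper names, the per-column thresholds, and the toy data are assumptions for illustration only. Treating a value strictly greater than the threshold as positive is also an assumption; how SMURFF itself computes the percentage it prints is not confirmed here.

```python
import numpy as np
import scipy.sparse as sp


def binarize_per_column(Y, thresholds):
    """Map each stored value to +1 or -1 using a per-column threshold.

    Hypothetical helper, not a SMURFF function. Unobserved cells stay
    unobserved, so the sparsity pattern is unchanged.
    """
    Y = sp.coo_matrix(Y)
    thresholds = np.asarray(thresholds, dtype=float)
    data = np.where(Y.data > thresholds[Y.col], 1.0, -1.0)
    return sp.coo_matrix((data, (Y.row, Y.col)), shape=Y.shape)


def percent_positives(Y, threshold=0.0):
    """Percentage of stored values above the threshold (assumed: strict >)."""
    Y = sp.coo_matrix(Y)
    return 100.0 * np.mean(Y.data > threshold)


# Toy data: zeros in the dense array are treated as unobserved cells.
raw = sp.coo_matrix(
    np.array(
        [
            [5.0, 0.0, 7.0],
            [6.5, 2.0, 0.0],
            [0.0, 3.5, 9.0],
        ]
    )
)
binary = binarize_per_column(raw, thresholds=[6.0, 3.0, 8.0])
print(f"{percent_positives(binary, threshold=0.0):.2f}% positives")
```

With values of -1 and 1, any threshold strictly between them gives the same split, which is why 0 is a natural choice.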
In your example you show how to factorise a binary matrix, but you actually binarise the matrix during the factorisation. Because I apply different thresholds depending on the data, as far as I know I cannot use smurff.ProbitNoise. Therefore I have already precomputed my binary matrix and would now like to run the factorisation, but I get an error doing: Which is: