Hi, instead of using the CG solver (Method: CG Solver with tolerance: 1.00e-06), try the direct inversion method. Set direct=True in addSideInfo or in MacauSession.
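To make the difference between the two solver families concrete, here is a minimal NumPy sketch. It is not smurff's internal code. For illustration only, it assumes that the side-info step reduces to a symmetric positive-definite system of the form (FᵀF + λI)x = b. The matrix sizes, the value of lam, and the helper names solve_cg and solve_cholesky are all hypothetical. CG is iterative and stops once the relative residual drops below a tolerance (1e-6 here, matching the log). The Cholesky route factors the matrix once and solves it exactly, up to rounding.

```python
import numpy as np


def solve_cholesky(A, b):
    """Direct solve of A x = b for symmetric positive-definite A via A = L L^T."""
    L = np.linalg.cholesky(A)       # lower-triangular factor
    y = np.linalg.solve(L, b)       # solve the lower-triangular system L y = b
    return np.linalg.solve(L.T, y)  # solve the upper-triangular system L^T x = y


def solve_cg(A, b, tol=1e-6, max_iter=None):
    """Conjugate gradient for SPD A; stops when ||r|| <= tol * ||b||."""
    n = b.shape[0]
    max_iter = max_iter or 10 * n
    x = np.zeros_like(b)
    r = b - A @ x                   # initial residual
    p = r.copy()                    # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break                   # converged to the requested tolerance
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x


# Hypothetical stand-in for side info: 500 rows x 50 features
# (the run in this thread uses a 500 x 1030 dense side-info matrix).
rng = np.random.default_rng(0)
F = rng.standard_normal((500, 50))
lam = 5.0                                  # hypothetical regularization value
A = F.T @ F + lam * np.eye(F.shape[1])     # symmetric positive-definite system matrix
b = rng.standard_normal(F.shape[1])

x_direct = solve_cholesky(A, b)
x_cg = solve_cg(A, b, tol=1e-6)
print("max |x_direct - x_cg| =", np.max(np.abs(x_direct - x_cg)))
```

On a well-conditioned system like this one, both solvers agree closely. The practical difference is that CG's accuracy depends on the tolerance and on the conditioning of the matrix, whereas the direct solve does not.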
It still works on macOS but fails at the same stage on Linux:
Using OpenMP with up to 6 threads.
PythonSession {
Data: {
Type: ScarceMatrixData [with NAs]
Component-wise mean: -0.950016
Component-wise variance: 0.58482
Noise: Probit Noise with threshold 0
Size: 12644 [500 x 100] (25.29%)
Warning: 11 empty cols
}
Model: {
Num-latents: 32
}
Priors: {
0: MacauPrior
SideInfo: DenseDouble [500, 1030]
Method: Cholesky Decomposition
BetaPrecision: fixed at 5.00
1: NormalPrior
}
Result: {
Test data: 12645 [500 x 100] (25.29%)
Binary classification threshold: 0.00
2.39% positives in test data
}
Config: {
Iterations: 40 burnin + 100 samples
Save model: every 5 iteration
Save prefix: /scratch/tmp6q0d7stj/
Save extension: .ddm
}
}
====== Initial phase ======
Initial 0/ 0: RMSE: nan (1samp: nan) U:[0.00e+00, 0.00e+00, ] [took: 0.0s, total: 0.0s]
====== Sampling (burning phase) ======
Burnin 1/ 40: RMSE: nan (1samp: 2.6843) AUC:nan (1samp: 0.4812) U:[4.00e+01, 1.65e+01, ] [took: 0.2s, total: 0.2s]
Burnin 2/ 40: RMSE: nan (1samp: 3.0714) AUC:nan (1samp: 0.5856) U:[4.10e+01, 1.86e+02, ] [took: 0.1s, total: 0.2s]
Burnin 3/ 40: RMSE: nan (1samp: 128.8635) AUC:nan (1samp: 0.6219) U:[4.36e+01, 1.33e+04, ] [took: 0.1s, total: 0.3s]
Burnin 4/ 40: RMSE: nan (1samp: 139.8078) AUC:nan (1samp: 0.4775) U:[4.51e+01, 1.16e+04, ] [took: 0.1s, total: 0.4s]
Burnin 5/ 40: RMSE: nan (1samp: 182799.3654) AUC:nan (1samp: 0.5110) U:[4.42e+01, 2.20e+07, ] [took: 0.1s, total: 0.5s]
Burnin 6/ 40: RMSE: nan (1samp: 1846531.0205) AUC:nan (1samp: 0.4930) U:[4.46e+01, 2.95e+08, ] [took: 0.1s, total: 0.5s]
Burnin 7/ 40: RMSE: nan (1samp: 1363690456.2558) AUC:nan (1samp: 0.5217) U:[4.50e+01, 3.16e+11, ] [took: 0.1s, total: 0.6s]
terminate called recursively
terminate called recursively
terminate called recursively
/lsf/01/1616425828.3919693: line 8: 124386 Aborted python 02_macau_model.py --input_file training_sample_data.pkl
Okay, I think we need more info here on what you are trying to do.
Cheers, Tom
Right. First, I'd like to reiterate that this works fine with the same data set and the same version of smurff on a different OS. To answer your questions:
Hi, indeed, if it works on macOS, it should also work on Linux. Maybe the easiest way would be for me to reproduce the problem?
That would be very helpful, thanks. Can you share an email so I can send you the dataset and the script?
Thanks @tvandera.
Apparently something was wrong with my conda environment; creating a new one solved the problem.
Hi,
I am trying to run a training session with a binary matrix and side info using smurff 0.15.3. I started with a sample of my data on macOS, and it runs smoothly. Because I expect the job to take hours on my whole data set, I would like to use a Linux cluster. The same version of smurff is installed there, but the job ends with a strange error.
On top of that, there are warnings that I don't get on macOS.
Maybe the error is linked to the warnings, but it is hard to identify where the issue is, considering that the same data...