angeloskath / supervised-lda

A flexible variational inference LDA library.
MIT License

Installation: 1/49 test fails; RAM issues with large dataset #19

Closed · richardknudsen closed this issue 6 years ago

richardknudsen commented 6 years ago

Hi, thanks for the nice library and the documentation.

I have two issues that may or may not be related. Sorry in advance for any stupid questions; my understanding of C++ is zero.

1) When running "make check" on my system (Ubuntu 16.04.4 LTS on an AWS c5.xlarge instance, 4GB RAM, 4 cores), 1 of the 49 tests fails:

[  FAILED  ] TestSecondOrderMultinomialLogisticRegression/1.MinimizerOverfitSmall, where TypeParam = double (51 ms)
[----------] 2 tests from TestSecondOrderMultinomialLogisticRegression/1 (52 ms total)

[----------] Global test environment tear-down
[==========] 49 tests from 25 test cases ran. (629 ms total)
[  PASSED  ] 48 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestSecondOrderMultinomialLogisticRegression/1.MinimizerOverfitSmall, where TypeParam = double

I have tried reinstalling everything multiple times, but the error persists.

2) Running "fslda online_train" with default parameters through the console application on a small dataset (n_documents=5,000) fails with the following message:

E-M Iteration 1
100
log p(y | \bar{z}, eta): nan
Segmentation fault (core dumped)

Running "fslda train" on the same data runs fine. However, when I increase the input to a larger chunk of n_documents=100,000 (full dataset is 1,400,000), this also fails (dat.npy has size 3.5GB in this case):

Segmentation fault (core dumped)

Any help is greatly appreciated. Thanks a lot, Richard

richardknudsen commented 6 years ago

Found the issue: my inputs for "y" contained nonsensical values (pandas intervals) instead of categorical values 0, 1, ...
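For anyone hitting the same errors: the supervised models expect "y" to hold integer class labels 0, 1, ..., C-1, not pandas Interval objects. Below is a minimal sketch of the fix, assuming the labels came from binning a continuous target with pd.cut; the DataFrame and column names are hypothetical, not from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a continuous target that was binned with pd.cut.
df = pd.DataFrame({"target": np.random.rand(1000)})

# pd.cut returns a categorical Series whose values are pandas Interval
# objects -- these are not valid class labels for the LDA models.
binned = pd.cut(df["target"], bins=5)

# Map the intervals to integer category codes 0, 1, ..., C-1 instead.
y = binned.cat.codes.to_numpy().astype(np.int64)

# Sanity check before saving: labels must be non-negative integers.
assert y.min() >= 0 and y.max() < 5
```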