flaviovdf / tribeflow

TribeFlow's source code
http://flaviovdf.github.io/tribeflow
BSD 3-Clause "New" or "Revised" License

AssertionError: File "tribeflow/kernels/eccdf.pyx", line 71, #3

Closed: haoransh closed this issue 6 years ago

haoransh commented 7 years ago

I followed all the procedures but encountered an error:

Split
Merge
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    main()
  File "main.py", line 123, in main
    args.num_batches, True, from_=from_, to=to)
  File "/home/shore/Documents/tribeflow/tribeflow/dynamic.py", line 404, in fit
    kernel.update_state(P)
  File "tribeflow/kernels/eccdf.pyx", line 71, in tribeflow.kernels.eccdf.ECCDFKernel.update_state (tribeflow/kernels/eccdf.c:2733)
    assert P.shape[0] == self.P.shape[0]
AssertionError

But it runs smoothly if I remove the --dynamic True option when running main.py.

By the way, it would not run until I installed PyTables manually with conda install pytables. I'm using Python 2.7.

How can I solve this correctly? I would appreciate any help. Thanks!
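
For readers tracing this error: the failing line compares only the length of the first axis of the incoming `P` with that of the `P` the kernel already stores. The class below is a hypothetical stand-in, not TribeFlow's `ECCDFKernel`, that copies just that one check, to show that any change in the first dimension of `P` between calls is enough to raise the `AssertionError`. What that axis counts inside TribeFlow is not stated in this thread.

```python
import numpy as np


class ToyKernel(object):
    """Hypothetical stand-in: mimics only the shape check reported at
    tribeflow/kernels/eccdf.pyx line 71. Not TribeFlow's ECCDFKernel."""

    def __init__(self, P):
        self.P = np.asarray(P)

    def update_state(self, P):
        P = np.asarray(P)
        # Same condition as the failing line in the traceback
        assert P.shape[0] == self.P.shape[0]
        self.P = P


def shapes_compatible(kernel, P):
    """True if kernel.update_state(P) would pass the shape assertion."""
    return np.asarray(P).shape[0] == kernel.P.shape[0]


kernel = ToyKernel(np.zeros((100, 3)))
kernel.update_state(np.ones((100, 3)))                # same first axis: passes
print(shapes_compatible(kernel, np.zeros((87, 3))))   # False: would raise AssertionError
```

Only the first axis matters here; the second dimension can differ without tripping this particular check.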

flaviovdf commented 7 years ago

Strange. How are you executing the code? Could you share a small trace so that I can reproduce the problem?

This may be a bug I introduced with recent changes.

Please check the WWW branch https://github.com/flaviovdf/tribeflow/releases/tag/v-paper-www if you want to run things quickly. I'll get back to you ASAP.

haoransh commented 7 years ago

I find it very strange too; I just followed all the procedures described in the README. I have tried the WWW branch but encountered a similar problem. I already use conda as my Python environment manager, and my pipeline after getting the source code (WWW branch) is as follows:

conda create -n tribeflow python=2.7
source activate tribeflow
pip install numpy
pip install scipy
pip install cython
pip install pandas
pip install mpi4py
pip install plac
pip install enum34
conda install pytables #if pytables is not installed, pd.HDFStore cannot run correctly.
make
python setup.py install
python scripts/trace_converter.py scripts/test_parser.dat 1 0 2 -d$'\t' -f'%Y-%m-%dT%H:%M:%SZ' > trace.dat
mpiexec -np 20 python main.py trace.dat 100 output.h5 --kernel eccdf --residency_priors 1 99 --dynamic True --leaveout 0.3 --num_iter 2000 --num_batches 20

The error output is:

/home/shr/RS/tribeflow-v-paper-www/tribeflow/dynamic.py:164: RuntimeWarning: invalid value encountered in true_divide
  Theta_hz = Theta_hz / Theta_hz.sum(axis=0)
/home/shr/RS/tribeflow-v-paper-www/tribeflow/dynamic.py:168: RuntimeWarning: Degrees of freedom <= 0 for slice
  C = np.cov(Theta_hz.T) + np.cov(Psi_sz.T)
/home/shr/anaconda2/envs/tribeflow/lib/python2.7/site-packages/numpy/lib/function_base.py:2929: RuntimeWarning: divide by zero encountered in double_scalars
  c *= 1. / np.float64(fact)
/home/shr/anaconda2/envs/tribeflow/lib/python2.7/site-packages/numpy/lib/function_base.py:2929: RuntimeWarning: invalid value encountered in multiply
  c *= 1. / np.float64(fact)
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    main()
  File "main.py", line 123, in main
    args.num_batches, True, from_=from_, to=to)
  File "/home/shr/RS/tribeflow-v-paper-www/tribeflow/dynamic.py", line 400, in fit
    kernel.update_state(P)
  File "tribeflow/kernels/eccdf.pyx", line 71, in tribeflow.kernels.eccdf.ECCDFKernel.update_state (tribeflow/kernels/eccdf.c:2733)
    assert P.shape[0] == self.P.shape[0]
AssertionError

I wonder what's wrong with my procedure; maybe it's due to a version mismatch between packages. Here are all the packages installed in the tribeflow conda environment:

(tribeflow) shr@dlibgpu:~/RS/tribeflow-v-paper-www$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
Cython (0.25.2)
enum34 (1.1.6)
jsonpickle (0.9.4)
mpi4py (2.0.0)
numexpr (2.6.2)
numpy (1.12.1)
pandas (0.19.2)
pip (9.0.1)
plac (0.9.6)
python-dateutil (2.6.0)
pytz (2017.2)
scipy (0.19.0)
setuptools (27.2.0)
six (1.10.0)
tables (3.4.2)
tqdm (4.11.2)
tribeflow (0.0.0)
wheel (0.29.0)

I hope you can reproduce the problem. Many thanks for your time!
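
The comment next to conda install pytables reflects that pandas.HDFStore depends on PyTables, which is imported as `tables`. A small pre-flight check like the hypothetical script below, run inside the activated environment, can confirm that every module the commands above install actually imports before running main.py. The module list is an assumption drawn from those commands, not an official TribeFlow requirements list.

```python
import importlib

# Import names expected from the install commands above (an assumption,
# not an official TribeFlow requirements list). "enum" is provided by the
# enum34 backport on Python 2.7; "tables" is PyTables, used by pandas.HDFStore.
REQUIRED = ["numpy", "scipy", "Cython", "pandas", "mpi4py", "plac", "enum", "tables"]


def missing_modules(names):
    """Return the names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    if gaps:
        print("Missing modules: " + ", ".join(gaps))
    else:
        print("All listed modules import correctly.")
```

Checking import names rather than pip package names catches cases like PyTables, where the package (tables) and the feature that needs it (pd.HDFStore) have different names.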

flaviovdf commented 7 years ago

The problem is with the trace.dat you are generating. When I created the example, I did not notice that the file would contain only 1 user. Please try one of the files in the example folder instead.
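
This also fits the Degrees of freedom <= 0 and divide-by-zero warnings in the earlier output: np.cov treats each row as a variable and each column as an observation, and with its default ddof=1 it divides by N - 1, which is zero when there is only one observation. Assuming the columns of `Theta_hz.T` correspond to users (an assumption based only on the variable name), a single-user trace gives a one-column matrix and an all-NaN covariance. The matrices below are made up for illustration.

```python
import warnings

import numpy as np

# Made-up input: 3 variables (rows), 1 observation (column),
# i.e. what a single-user trace would give under the assumption above.
one_user = np.array([[0.2], [0.5], [0.3]])

with warnings.catch_warnings():
    # Silence the same RuntimeWarnings seen in the traceback
    warnings.simplefilter("ignore", RuntimeWarning)
    C_one = np.cov(one_user)

# Two observations: N - 1 = 1, so the covariance is finite.
two_users = np.array([[0.2, 0.4], [0.5, 0.1], [0.3, 0.5]])
C_two = np.cov(two_users)

print(np.isnan(C_one).all())     # True
print(np.isfinite(C_two).all())  # True
```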

haoransh commented 7 years ago

I have tried several of the example .dat files but encountered similar problems. Here is one example:

mpiexec -np 20 python main.py example/lastfm_our.dat 100 lastfm_ocelma.output.h5 --kernel eccdf --residency_priors 1 99 --dynamic True --leaveout 0.3 --num_iter 2000 --num_batches 20

The output is as follows:

Worker 14 has finished it's iterations!
Worker 13 has finished it's iterations!
Split
Merge
/home/shr/RS/tribeflow-v-paper-www/tribeflow/dynamic.py:164: RuntimeWarning: invalid value encountered in true_divide
  Theta_hz = Theta_hz / Theta_hz.sum(axis=0)
Traceback (most recent call last):
  File "main.py", line 135, in <module>
    main()
  File "main.py", line 123, in main
    args.num_batches, True, from_=from_, to=to)
  File "/home/shr/RS/tribeflow-v-paper-www/tribeflow/dynamic.py", line 400, in fit
    kernel.update_state(P)
  File "tribeflow/kernels/eccdf.pyx", line 71, in tribeflow.kernels.eccdf.ECCDFKernel.update_state (tribeflow/kernels/eccdf.c:2733)
    assert P.shape[0] == self.P.shape[0]
AssertionError
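
The invalid value encountered in true_divide warning from dynamic.py line 164, which shows up again here with the lastfm example, is what column normalization does when a column sums to zero: every entry in that column becomes 0/0 = NaN. The sketch below reproduces this on a made-up matrix and adds one possible guard that leaves all-zero columns at zero. Treating columns as environments, and the guarded function itself, are assumptions for illustration, not TribeFlow's code.

```python
import warnings

import numpy as np


def normalize_columns(M):
    """Plain column normalization, as on dynamic.py line 164."""
    return M / M.sum(axis=0)


def normalize_columns_guarded(M):
    """Hypothetical variant: columns that sum to zero stay all zeros."""
    sums = M.sum(axis=0)
    out = np.zeros_like(M, dtype=float)
    nonzero = sums > 0
    out[:, nonzero] = M[:, nonzero] / sums[nonzero]
    return out


# Made-up counts: the third column (e.g. an empty environment) is all zeros.
Theta_hz = np.array([[2.0, 1.0, 0.0],
                     [2.0, 3.0, 0.0]])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    plain = normalize_columns(Theta_hz)

print(plain)                                 # third column is nan
print(normalize_columns_guarded(Theta_hz))   # third column is 0
```

Whether zero-filling is the right treatment depends on how the downstream code uses these columns; dropping empty columns is another option.
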

flaviovdf commented 7 years ago

I just ran

mpiexec -np 3 python main.py ~/example/lastfm_our.dat 10 output.h5 --kernel eccdf --residency_priors 1 99 --dynamic True --leaveout 0.3 --num_iter 2000 --num_batches 20

on both the master and WWW branches. I'm not sure what your issue may be.

flaviovdf commented 7 years ago

With 20 workers, as in your example:

Worker 14 is working!
Worker 15 is working!
Worker 17 is working!
Worker 18 is working!
Worker 19 is working!
Worker 1 is working!
Worker 2 is working!
Worker 3 is working!
Worker 5 is working!
Worker 6 is working!
Worker 8 is working!
Worker 10 is working!
Worker 11 is working!
Worker 12 is working!
Worker 13 is working!
Worker 4 is working!
Worker 7 is working!
Worker 9 is working!
Worker 16 is working!
Worker 17 has finished it's iterations!
Worker 18 has finished it's iterations!
Worker 11 has finished it's iterations!
Worker 10 has finished it's iterations!
Worker 8 has finished it's iterations!
Worker 12 has finished it's iterations!
Worker 19 has finished it's iterations!
Worker 9 has finished it's iterations!
Worker 5 has finished it's iterations!
Worker 6 has finished it's iterations!
Worker 13 has finished it's iterations!
Worker 7 has finished it's iterations!
Worker 1 has finished it's iterations!
Worker 15 has finished it's iterations!
Worker 16 has finished it's iterations!
Worker 3 has finished it's iterations!
Worker 14 has finished it's iterations!
Worker 2 has finished it's iterations!
Worker 4 has finished it's iterations!
Split
Merge
Computing probs
New nz 10
Learning took 13.0 seconds

haoransh commented 7 years ago

The difference lies in the num_topics parameter. In your command it is set to 10, but in the README and in my command it is 100. Could you please set num_topics to 10 on the WWW branch?

Strangely, after I reset the whole environment, it now works on the master branch but still raises the AssertionError on the WWW branch. You can give it a try. In any case, I can now run the whole pipeline on the master branch. Thank you all the same.

flaviovdf commented 7 years ago

The problem is likely due to environments becoming empty during sampling; I do not guard against this. With 100 topics on a very small trace, many environments will end up empty. I'll check the code so that the exception does not happen and sampling finishes.
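
To make the empty-environment argument concrete: if, purely as a simplifying assumption, N observations were assigned independently and uniformly to K environments, the expected number of empty environments would be K(1 - 1/K)^N. The sketch below computes that quantity and also shows one way to find which environments are empty from a vector of assignments using np.bincount. The names `assignments` and `num_envs` are hypothetical; TribeFlow's sampler is not uniform and its internal data layout may differ.

```python
import numpy as np


def expected_empty(num_envs, num_obs):
    """Expected empty environments if num_obs items fall uniformly into num_envs."""
    return num_envs * (1.0 - 1.0 / num_envs) ** num_obs


def empty_envs(assignments, num_envs):
    """Indices of environments with no assigned observation.

    `assignments` is a hypothetical integer array holding one environment
    id per observation; minlength keeps trailing empty ids in the count.
    """
    counts = np.bincount(assignments, minlength=num_envs)
    return np.flatnonzero(counts == 0)


# Same made-up trace size (200 observations), 10 vs 100 environments
for K in (10, 100):
    print("K=%d: %.2f expected empty" % (K, expected_empty(K, 200)))

# A toy assignment vector: environments 1 and 4 received nothing
assign = np.array([0, 0, 2, 3, 2, 0])
print(empty_envs(assign, 5))  # [1 4]
```

Under this uniform toy model, going from 10 to 100 environments on the same small trace turns empty environments from a rare event into a routine one, which is consistent with the explanation above; the real sampler's behavior may differ.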

haoransh commented 7 years ago

Many thanks for your time!