flaviovdf / tribeflow

TribeFlow's source code
http://flaviovdf.github.io/tribeflow
BSD 3-Clause "New" or "Revised" License

MemoryError in mean reciprocal rank computation for Brightkite data #5

Closed dionman closed 6 years ago

dionman commented 6 years ago

When I run the mean reciprocal rank script on the model's predictions for the Brightkite dataset, I get a MemoryError when the predictions array is created:

PYTHONPATH=. python scripts/mrr.py data/output_brightkite.h5 rss.dat data/predictions_brightkite.dat &

matrix shape :  (10000, 3) (10000, 1) (525267, 10)
Traceback (most recent call last):
  File "scripts/mrr.py", line 86, in <module>
    plac.call(main)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python2.7/dist-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "scripts/mrr.py", line 77, in main
    HSDs, previous_stamps, Theta_zh, Psi_sz, count_z, kernel, True)
  File "tribeflow/_eval.pyx", line 97, in tribeflow._eval.reciprocal_rank (tribeflow/_eval.c:2525)
    np.zeros(shape=(HOs.shape[0], ns), dtype='d')
MemoryError

Is there a scipy.sparse-based workaround for large datasets?
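
For a sense of scale, the size of the failing allocation can be estimated from the `np.zeros(shape=(HOs.shape[0], ns), dtype='d')` call in the traceback. The sketch below is only an estimate: it assumes the printed shapes mean 10000 evaluated rows and `ns = 525267` candidate columns, which the log does not confirm.

```python
import numpy as np

# Assumption: one row per evaluated trajectory, one column per candidate.
# Reading 10000 and 525267 off the printed shapes is a guess, not a fact
# established by the log.
n_rows = 10000
ns = 525267

bytes_per_item = np.dtype('d').itemsize  # 'd' is float64
total_bytes = n_rows * ns * bytes_per_item

print("dense array: %d bytes (%.1f GiB)" % (total_bytes, total_bytes / 2.0 ** 30))
```

If those numbers are right, the array needs tens of gigabytes in a single allocation, which would explain a MemoryError on a typical workstation.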

flaviovdf commented 6 years ago

No, there is not. Much of the code is written in Cython, and there is no sparse matrix support for that.

-- Flavio
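
A generic alternative to sparse matrices is to never hold the full score matrix at once: score a block of rows, take the rank of the true item in each row, and discard the block. The sketch below shows that idea in plain NumPy. It is not part of TribeFlow and is not a drop-in fix, since the allocation happens inside the Cython routine; `score_rows` is a hypothetical callback, and tie handling here (rank = 1 + number of strictly higher scores) may differ from `tribeflow._eval.reciprocal_rank`.

```python
import numpy as np

def reciprocal_ranks_chunked(score_rows, true_idx, chunk=256):
    """Reciprocal rank of the true column in each row, computed block by block.

    score_rows(start, stop) is a hypothetical callback returning a dense
    (stop - start, ns) array of scores for rows start..stop-1.
    """
    true_idx = np.asarray(true_idx)
    n_rows = true_idx.shape[0]
    rr = np.empty(n_rows, dtype='d')
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        block = np.asarray(score_rows(start, stop))
        # Score assigned to the true item in each row of this block.
        truth = block[np.arange(stop - start), true_idx[start:stop]]
        # Rank = 1 + number of candidates scoring strictly higher.
        ranks = 1 + (block > truth[:, None]).sum(axis=1)
        rr[start:stop] = 1.0 / ranks
    return rr

# Toy data standing in for model scores (not Brightkite output).
rng = np.random.RandomState(0)
dense = rng.rand(1000, 50)
truth = rng.randint(0, 50, size=1000)

rr = reciprocal_ranks_chunked(lambda a, b: dense[a:b], truth, chunk=128)
print("MRR on toy data:", rr.mean())
```

Peak memory is then set by `chunk * ns` rather than by the number of rows, at the cost of recomputing nothing but looping in Python over the blocks.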
