bigartm / bigartm

Fast topic modeling platform
http://bigartm.org/

'Error parsing message' when accessing TopicKernelScore score tracker values. #972

Open Chemoday opened 5 years ago

Chemoday commented 5 years ago

@MichaelSolotky @MelLain Hello. I'm getting this error when trying to access TopicKernelScore stats. I think the problem is related to the size of the document corpus: I have 13k documents in my collection, and the error occurs when the corpus is larger than 1500 documents (i.e., lines in vw.txt).

Project with 1500 documents: bugged_model.zip. All other metrics work fine with a corpus of any size.
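Because each line of vw.txt is one document, one way to pin down the document count at which the error starts is to build smaller copies of the file and rerun the script on each. The helpers below are a hypothetical stdlib sketch (`first_documents` and `write_vw_subset` are made-up names, and skipping blank lines is an assumption of this sketch, not documented BigARTM behavior):

```python
from itertools import islice
from typing import Iterable, Iterator


def first_documents(lines: Iterable[str], num_docs: int) -> Iterator[str]:
    """Yield the first num_docs non-blank lines (one VW document per line)."""
    # Assumption: blank lines are not documents, so they are skipped here.
    non_blank = (line.rstrip("\n") for line in lines if line.strip())
    return islice(non_blank, num_docs)


def write_vw_subset(src_path: str, dst_path: str, num_docs: int) -> int:
    """Copy the first num_docs documents of src_path into dst_path.

    Returns how many documents were actually written (fewer if the
    source file is shorter than num_docs).
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for doc in first_documents(src, num_docs):
            dst.write(doc + "\n")
            written += 1
    return written
```

For example, `write_vw_subset('vw.txt', 'vw_1500.txt', 1500)` followed by running the script with `data_path='vw_1500.txt'`. To be safe, give each subset its own `target_folder` (e.g. a hypothetical `'batches_1500/'`) so batches from different runs don't mix.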

code:

import artm
batches_folder = 'batches/'  # BigARTM batches are written here
data_path = 'vw.txt'  # Vowpal Wabbit file, one document per line

batch_vectorizer = artm.BatchVectorizer(data_path=data_path,
                                        data_format='vowpal_wabbit',
                                        target_folder=batches_folder)

dictionary = artm.Dictionary()
dictionary.gather(data_path=batches_folder)  # build the dictionary from the batches
topic_names = ["Topic_" + str(i) for i in range(30)]
model = artm.ARTM(topic_names=topic_names,
                  num_topics=30,
                  dictionary=dictionary)
model.scores.add(artm.PerplexityScore(name='PerplexityScore', dictionary=dictionary))
model.scores.add(artm.TopicKernelScore(name='TopicKernelScore',
                                       probability_mass_threshold=0.07))

model.fit_offline(batch_vectorizer=batch_vectorizer,
                  num_collection_passes=40)
print(model.score_tracker['PerplexityScore'].value)  # works
print(model.score_tracker['TopicKernelScore'].average_contrast)  # raises DecodeError
print(model.score_tracker['TopicKernelScore'].average_purity)
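The two values requested at the end, average_contrast and average_purity, summarize topic kernels. As usually defined for ARTM, the kernel of topic t is the set of tokens w with p(t|w) above the threshold (here probability_mass_threshold=0.07); purity is the sum of p(w|t) over the kernel, and contrast is the mean of p(t|w) over it. The sketch below computes these on a toy Φ matrix. It assumes a uniform topic prior p(t), so p(t|w) is just φ_wt normalized over topics, and it counts empty kernels as 0; BigARTM's own computation may weight topics and handle empty kernels differently. `kernel_scores` is a hypothetical name.

```python
from typing import Dict, List


def kernel_scores(phi: List[List[float]], threshold: float) -> Dict[str, float]:
    """Average kernel purity and contrast for a toy Phi matrix.

    phi[w][t] is p(w|t); each column sums to 1. Assumes a uniform prior
    p(t), so p(t|w) = phi[w][t] / sum_s phi[w][s]. Empty kernels count as 0.
    """
    num_topics = len(phi[0])
    purities: List[float] = []
    contrasts: List[float] = []
    for t in range(num_topics):
        # Collect (p(w|t), p(t|w)) for every token in the kernel of topic t
        kernel = []
        for row in phi:
            row_total = sum(row)
            p_t_given_w = row[t] / row_total if row_total > 0 else 0.0
            if p_t_given_w > threshold:
                kernel.append((row[t], p_t_given_w))
        purities.append(sum(p_wt for p_wt, _ in kernel))
        contrasts.append(sum(p_tw for _, p_tw in kernel) / len(kernel) if kernel else 0.0)
    return {
        "average_purity": sum(purities) / num_topics,
        "average_contrast": sum(contrasts) / num_topics,
    }


# Toy example: 3 tokens, 2 topics
phi = [
    [0.6, 0.0],  # token 0
    [0.4, 0.2],  # token 1
    [0.0, 0.8],  # token 2
]
print(kernel_scores(phi, threshold=0.5))
```

With threshold 0.5, topic 0's kernel is tokens 0 and 1 and topic 1's kernel is token 2 only, so average purity is (1.0 + 0.8) / 2 and average contrast is (5/6 + 1) / 2.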

stack trace:

---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-2-2cc99a1e0396> in <module>()
     21                               num_collection_passes=40)
     22 print(model.score_tracker['PerplexityScore'].value)
---> 23 print(model.score_tracker['TopicKernelScore'].average_contrast)
     24 print(model.score_tracker['TopicKernelScore'].average_purity)

~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in <lambda>(self, p)
     86         setattr(class_ref,
     87                 name,
---> 88                 property(lambda self, p=_p: _get_score(self._name, self._master, p)))
     89         setattr(class_ref,
     90                 'last_{}'.format(name),

~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in _get_score(score_name, master, field_attrs, last)
     41         return result_dict
     42 
---> 43     data_array = master.get_score_array(score_name)
     44 
     45     if field_attrs[1] == 'optional' and field_attrs[2] == 'scalar':

~/anaconda3/lib/python3.6/site-packages/artm/master_component.py in get_score_array(self, score_name)
    715         """
    716         args = messages.GetScoreArrayArgs(score_name=score_name)
--> 717         score_array = self._lib.ArtmRequestScoreArray(self.master_id, args)
    718 
    719         scores = []

~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in artm_api_call(*args)
    163             # return result value
    164             if spec.request_type is not None:
--> 165                 return self._get_requested_message(length=result, func=spec.request_type)
    166             if spec.result_type is not None:
    167                 return result

~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in _get_requested_message(self, length, func)
    104         self._check_error(error_code)
    105         message = func()
--> 106         message.ParseFromString(message_blob.raw)
    107         return message
    108 

DecodeError: Error parsing message
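In the traceback, the Python wrapper ends by calling `message.ParseFromString(message_blob.raw)` on a raw byte buffer whose size comes from the preceding native call. Protobuf raises `DecodeError` when those bytes are not a complete, well-formed encoding, for example when a length prefix promises more bytes than the buffer holds. The stdlib-only sketch below illustrates that failure mode using the protobuf wire format (varint keys, length-delimited fields). It is not protobuf's or BigARTM's implementation: it only handles wire type 2, and its `DecodeError` class is a stand-in for `google.protobuf.message.DecodeError`.

```python
from typing import List, Tuple


class DecodeError(Exception):
    """Stand-in for google.protobuf.message.DecodeError (illustration only)."""


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise DecodeError("Truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:
            raise DecodeError("Varint too long")


def read_length_delimited_fields(buf: bytes) -> List[Tuple[int, bytes]]:
    """Return (field_number, payload) pairs; only wire type 2 is supported."""
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise DecodeError(f"Wire type {wire_type} not handled in this sketch")
        length, pos = read_varint(buf, pos)
        if pos + length > len(buf):  # prefix claims more bytes than remain
            raise DecodeError("Truncated message")
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields


print(read_length_delimited_fields(b"\x0a\x03abc"))  # [(1, b'abc')]
try:
    read_length_delimited_fields(b"\x0a\x05abc")  # prefix says 5 bytes, only 3 follow
except DecodeError as exc:
    print("DecodeError:", exc)
```

The point of the sketch is only that a length or content mismatch anywhere in the buffer makes the whole parse fail; it does not identify which mismatch happens in this report.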

versions:

absl-py==0.6.1
alabaster==0.7.10
anaconda-client==1.6.14
anaconda-navigator==1.8.7
anaconda-project==0.8.2
appdirs==1.4.0
asn1crypto==0.24.0
astor==0.7.1
astroid==1.6.3
astropy==3.0.2
attrs==18.1.0
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.6.0
bigartm==0.10.0
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
bokeh==0.12.16
boto==2.48.0
boto3==1.9.67
botocore==1.12.67
Bottleneck==1.2.1
bz2file==0.98
cached-property==1.3.0
certifi==2019.3.9
cffi==1.11.5
chardet==3.0.4
click==6.7
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
conda==4.6.14
conda-build==3.10.5
conda-verify==2.0.0
contextlib2==0.5.5
cryptography==2.7
cycler==0.10.0
Cython==0.28.2
cytoolz==0.9.0.1
dask==0.17.5
datashape==0.5.4
DAWG-Python==0.7.2
decorator==4.3.0
distributed==1.21.8
Django==1.10.5
django-test-without-migrations==0.6
docopt==0.6.2
docutils==0.14
dominate==2.3.1
entrypoints==0.2.3
et-xmlfile==1.0.1
fastcache==1.0.2
filelock==3.0.4
Flask==0.12.2
Flask-Bootstrap==3.3.7.1
Flask-Cors==3.0.4
Flask-Moment==0.5.1
flask-peewee==0.6.7
Flask-Script==2.0.5
Flask-WTF==0.14.2
future==0.17.1
gast==0.2.0
gensim==3.6.0
gevent==1.3.0
glob2==0.6
gmpy2==2.0.8
greenlet==0.4.13
grpcio==1.16.0
gym==0.10.9
h5py==2.7.1
heapdict==1.0.0
html5lib==1.0.1
hyperopt==0.1.2
idna==2.6
imageio==2.3.0
imagesize==1.0.0
ipykernel==4.8.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
isort==4.3.4
itsdangerous==0.24
jdcal==1.4
jedi==0.10.2
Jinja2==2.9.6
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
jupyterlab==0.32.1
jupyterlab-launcher==0.10.5
jupyterthemes==0.20.0
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
lesscpy==0.13.0
llvmlite==0.23.1
locket==0.2.0
lxml==4.2.1
Markdown==3.0.1
MarkupSafe==1.0
matplotlib==3.1.0
mccabe==0.6.1
mistune==0.8.3
mkl-fft==1.0.0
mkl-random==1.0.1
mock==2.0.0
more-itertools==4.1.0
mpmath==1.0.0
msgpack==0.5.6
msgpack-python==0.5.6
multipledispatch==0.5.0
navigator-updater==0.2.1
nbconvert==5.3.1
nbformat==4.4.0
networkx==2.1
nltk==3.4
nose==1.3.7
notebook==5.7.8
numba==0.38.0
numexpr==2.6.5
numpy==1.16.0
numpydoc==0.8.0
odo==0.5.1
olefile==0.45.1
openpyxl==2.5.3
packaging==16.8
pandas==0.19.2
pandocfilters==1.4.2
parso==0.2.0
partd==0.3.8
path.py==11.0.1
pathlib2==2.3.2
patsy==0.5.0
pbr==3.1.1
peewee==2.10.1
pep8==1.7.1
pexpect==4.5.0
pickleshare==0.7.4
Pillow==5.4.1
pkginfo==1.4.2
pluggy==0.6.0
ply==3.11
prometheus-client==0.6.0
prompt-toolkit==1.0.15
protobuf==3.0.0
psutil==5.4.5
ptyprocess==0.5.2
PuLP==1.6.5
py==1.5.3
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.18
pycrypto==2.6.1
pycurl==7.43.0.2
pyflakes==1.6.0
pyglet==1.3.2
Pygments==2.2.0
pylint==1.8.4
pymongo==3.8.0
pymorphy2==0.8
pymorphy2-dicts==2.4.393442.3710985
pymystem3==0.2.0
pyodbc==4.0.23
pyOpenSSL==18.0.0
pyparsing==2.1.10
PySocks==1.6.8
pytest==3.5.1
pytest-arraydiff==0.2
pytest-astropy==0.3.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-remotedata==0.2.1
python-dateutil==2.6.0
pytz==2016.10
PyWavelets==0.5.2
PyYAML==3.12
pyzmq==17.0.0
QtAwesome==0.4.4
qtconsole==4.3.1
QtPy==1.4.1
requests==2.18.4
rope==0.10.7
ruamel-yaml==0.15.35
s3transfer==0.1.13
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.1.0
seaborn==0.9.0
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
smart-open==1.7.1
snowballstemmer==1.2.1
sortedcollections==0.6.1
sortedcontainers==1.5.10
Sphinx==1.7.4
sphinxcontrib-websupport==1.0.1
spyder==3.2.8
SQLAlchemy==1.2.7
statsmodels==0.9.0
sympy==1.1.1
tables==3.4.3
tblib==1.3.2
tensorboard==1.12.0
tensorflow==1.12.0
termcolor==1.1.0
terminado==0.8.1
testpath==0.3.1
tflearn==0.3.2
toolz==0.9.0
tornado==5.0.2
tqdm==4.30.0
traitlets==4.3.2
typing==3.6.4
ufal.udpipe==1.2.0.1
unicodecsv==0.14.1
urllib3==1.22
visitor==0.1.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.12.2
wget==3.2
widgetsnbextension==3.2.1
wrapt==1.10.11
wtf-peewee==0.2.6
WTForms==2.1
xlrd==1.1.0
XlsxWriter==1.0.4
xlwt==1.3.0
zict==0.1.3

bugged_model.zip

MichaelSolotky commented 5 years ago

I'm trying to find the cause. I suspect something is wrong with Anaconda's Python interpreter, because everything works fine on my laptop.
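To compare the failing Anaconda setup with a working one, both sides could print the same interpreter details. Below is a stdlib sketch (`environment_report` is a hypothetical helper). If protobuf is importable, it also reports the protobuf version and which protobuf backend is active ('python', 'cpp', or 'upb'), since the error above is raised inside protobuf's `ParseFromString`.

```python
import platform
import sys
from typing import Dict


def environment_report() -> Dict[str, str]:
    """Collect interpreter details worth comparing between two setups."""
    report = {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
        "architecture": platform.architecture()[0],
        "platform": platform.platform(),
    }
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
        report["protobuf_version"] = google.protobuf.__version__
        # Which backend parses messages: pure Python or a native one
        report["protobuf_backend"] = api_implementation.Type()
    except ImportError:
        report["protobuf_version"] = "not installed"
        report["protobuf_backend"] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this in both environments and diffing the output shows whether the interpreter, architecture, or protobuf backend differs.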