@MichaelSolotky @MelLain
Hello. I'm getting this error when trying to access TopicKernelScore stats. I think the problem is related to the size of the document corpus: I have 13k documents in my collection, and the bug occurs whenever the corpus is larger than 1500 documents (i.e. lines in vw.txt).
Project with 1500 documents: bugged_model.zip
All other metrics work fine regardless of corpus size.
code:

```python
import artm

batches_folder = 'batches/'
data_path = 'vw.txt'

batch_vectorizer = artm.BatchVectorizer(data_path=data_path,
                                        data_format='vowpal_wabbit',
                                        target_folder=batches_folder)
dictionary = artm.Dictionary()
dictionary.gather(data_path=batches_folder)

topic_names = ['Topic_' + str(i) for i in range(30)]
model = artm.ARTM(topic_names=topic_names,
                  num_topics=30,
                  dictionary=dictionary)
model.scores.add(artm.PerplexityScore(name='PerplexityScore',
                                      dictionary=dictionary))
model.scores.add(artm.TopicKernelScore(name='TopicKernelScore',
                                       probability_mass_threshold=0.07))

model.fit_offline(batch_vectorizer=batch_vectorizer,
                  num_collection_passes=40)

print(model.score_tracker['PerplexityScore'].value)
print(model.score_tracker['TopicKernelScore'].average_contrast)
print(model.score_tracker['TopicKernelScore'].average_purity)
```
stack trace:

```
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-2-2cc99a1e0396> in <module>()
     21                   num_collection_passes=40)
     22 print(model.score_tracker['PerplexityScore'].value)
---> 23 print(model.score_tracker['TopicKernelScore'].average_contrast)
     24 print(model.score_tracker['TopicKernelScore'].average_purity)

~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in <lambda>(self, p)
     86         setattr(class_ref,
     87                 name,
---> 88                 property(lambda self, p=_p: _get_score(self._name, self._master, p)))
     89         setattr(class_ref,
     90                 'last_{}'.format(name),

~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in _get_score(score_name, master, field_attrs, last)
     41         return result_dict
     42
---> 43     data_array = master.get_score_array(score_name)
     44
     45     if field_attrs[1] == 'optional' and field_attrs[2] == 'scalar':

~/anaconda3/lib/python3.6/site-packages/artm/master_component.py in get_score_array(self, score_name)
    715         """
    716         args = messages.GetScoreArrayArgs(score_name=score_name)
--> 717         score_array = self._lib.ArtmRequestScoreArray(self.master_id, args)
    718
    719         scores = []

~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in artm_api_call(*args)
    163             # return result value
    164             if spec.request_type is not None:
--> 165                 return self._get_requested_message(length=result, func=spec.request_type)
    166             if spec.result_type is not None:
    167                 return result

~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in _get_requested_message(self, length, func)
    104         self._check_error(error_code)
    105         message = func()
--> 106         message.ParseFromString(message_blob.raw)
    107         return message

DecodeError: Error parsing message
```
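For now I'm working around it like this. The traceback shows the failure happens inside `get_score_array` (the full per-pass array of score messages), so this is only a sketch under the assumption that fetching a single attribute can still raise the same `DecodeError`; `safe_last` is a helper name I made up, not part of the artm API:

```python
def safe_last(tracker, attr):
    """Return tracker.<attr>, or None if reading it raises (e.g. a
    protobuf DecodeError while decoding the score blob)."""
    try:
        return getattr(tracker, attr)
    except Exception:
        return None

# usage with a fitted model, as in the script above:
# contrast = safe_last(model.score_tracker['TopicKernelScore'],
#                      'last_average_contrast')
```

This at least lets the rest of the script run instead of crashing on the score access.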