I am trying to use the code from the tutorial on topic modeling and I am facing a problem I cannot solve on my own:
Traceback (most recent call last):
  File "/.../mallet_python.py", line 47, in <module>
    doctopic[row_num, topic] = share
IndexError: index 14 is out of bounds for axis 1 with size 6
I have copied the code, stored it in a .py file and adjusted the path to the doc-topic-file:
import os
import numpy as np
import itertools
import operator

def grouper(n, iterable, fillvalue=None):
    # Collect data into fixed-length chunks or blocks
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

doctopic_triples = []
mallet_docnames = []

with open("doc-topics.txt") as f:
    f.readline()  # read one line in order to skip the header
    for line in f:
        docnum, docname, *values = line.rstrip().split('\t')
        mallet_docnames.append(docname)
        for topic, share in grouper(2, values):
            triple = (docname, int(topic), float(share))
            doctopic_triples.append(triple)

# sort the triples
# triple is (docname, topicnum, share) so sort(key=operator.itemgetter(0,1))
# sorts on (docname, topicnum) which is what we want
doctopic_triples = sorted(doctopic_triples, key=operator.itemgetter(0,1))

# sort the document names rather than relying on MALLET's ordering
mallet_docnames = sorted(mallet_docnames)

# collect into a document-topic matrix
num_docs = len(mallet_docnames)
num_topics = len(doctopic_triples) // len(mallet_docnames)

# the following works because we know that the triples are in sequential order
doctopic = np.zeros((num_docs, num_topics))
for triple in doctopic_triples:
    docname, topic, share = triple
    row_num = mallet_docnames.index(docname)
    doctopic[row_num, topic] = share  # error
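To make the parsing step concrete, here is a small sketch of what the loop does with a single line. The sample line is made up for illustration (it is not taken from the actual file) and assumes tab-separated topic/share pairs after the document number and filename.

```python
import itertools

def grouper(n, iterable, fillvalue=None):
    # Collect data into fixed-length chunks or blocks
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

# Hypothetical line in the assumed format:
# doc_number, filename, then alternating topic and share values
sample_line = "0\tdoc_a.txt\t14\t0.5\t3\t0.3\t0\t0.2\n"

docnum, docname, *values = sample_line.rstrip().split('\t')
# Pair up the remaining values as (topic, share)
pairs = [(int(topic), float(share)) for topic, share in grouper(2, values)]
print(docname, pairs)
```

Note that the topic number in each pair comes straight from the file and is later used as a column index, so it can be larger than the number of pairs on the line.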
My doc-topic file has the following structure:
doc_number filename topic share ....
Altogether there are 6 topic columns and 6 share columns, which makes 12; adding doc_number and filename makes 14. I guess that is what the error is about, but I don't know what I am doing wrong.
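One thing that may be worth checking: the 14 in the error message is the value of `topic` used as a column index, not a count of columns in the file. `num_topics` is derived from the number of topic/share pairs per document, so if each line lists only 6 pairs but the topic numbers themselves go above 5, the matrix ends up too narrow. This is an assumption about the cause, not something confirmed from the file. A minimal sketch under that assumption sizes the matrix from the largest topic number instead; the triples below are made up for illustration.

```python
import numpy as np

# Hypothetical triples (docname, topic, share); topic numbers exceed
# the number of pairs listed per document
doctopic_triples = [
    ("doc_a.txt", 14, 0.6), ("doc_a.txt", 3, 0.4),
    ("doc_b.txt", 0, 0.7), ("doc_b.txt", 14, 0.3),
]
mallet_docnames = sorted({docname for docname, _, _ in doctopic_triples})

num_docs = len(mallet_docnames)
# Size axis 1 from the largest topic number seen, not from the pairs per line
num_topics = max(topic for _, topic, _ in doctopic_triples) + 1

# Map each document name to its row once instead of calling list.index repeatedly
row_of = {docname: i for i, docname in enumerate(mallet_docnames)}

doctopic = np.zeros((num_docs, num_topics))
for docname, topic, share in doctopic_triples:
    doctopic[row_of[docname], topic] = share

print(doctopic.shape)
```

Topics that are not listed for a document simply stay at 0 in its row, so the rows will not sum to 1 if the file only contains the top few topics per document.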
Thanks in advance!