DongqingSun96 opened this issue 3 years ago.
@DongqingSun96,
Oops, sorry for the late answer. I totally forgot about this issue.
Currently, tomotopy doesn't provide such a function, because its internal implementation cannot accept a matrix in bag-of-words format. To insert a sparse matrix into tomotopy's corpus in the current version, you have to restore each document's word list from the matrix and call add_doc() repeatedly.
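The workaround above (rebuild each document's word list from its matrix row, then call add_doc() once per document) can be sketched as follows. This is a minimal sketch, assuming a dense document-by-term count matrix given as nested lists and a vocab list that maps column index to word. The helper names are hypothetical and not part of tomotopy; the only tomotopy behavior assumed is that add_doc() accepts a list of words as its first argument.

```python
from typing import Iterable, List, Sequence


def dense_row_to_words(row: Sequence[int], vocab: Sequence[str]) -> List[str]:
    """Expand one bag-of-words row into a token list.

    row[j] is the count of term vocab[j] in the document. Word order is not
    recoverable from a count matrix, so tokens come out grouped by term index.
    """
    words: List[str] = []
    for term_id, count in enumerate(row):
        # Repeat each term as many times as it was counted.
        words.extend([vocab[term_id]] * int(count))
    return words


def add_dense_matrix(target, matrix: Iterable[Sequence[int]],
                     vocab: Sequence[str]) -> int:
    """Call target.add_doc(words) once per non-empty document row.

    Assumption: `target` is something like tomotopy.utils.Corpus or a
    tomotopy model, whose add_doc() takes a list of words first.
    Rows with no counts are skipped here so no empty documents are added.
    Returns the number of documents actually added.
    """
    added = 0
    for row in matrix:
        words = dense_row_to_words(row, vocab)
        if not words:
            continue
        target.add_doc(words)
        added += 1
    return added
```

With tomotopy installed, the target could be tp.utils.Corpus() or a model such as tp.LDAModel(k=10) (after import tomotopy as tp). Skipping all-zero rows is a design choice of this sketch, so document positions in the corpus may not match row indices in the matrix.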
If utility functions like Dense2Corpus or Sparse2Corpus are needed often, I can extend tomotopy.corpus to accept matrix input by modifying its internal implementation, but it will take some time.
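A sparse matrix can be handled the same way without densifying it first. This is a minimal sketch, assuming the matrix has documents as rows and is stored in CSR form, i.e. the indptr/indices/data arrays that scipy.sparse.csr_matrix exposes. csr_to_word_lists is a hypothetical helper, not a tomotopy or SciPy function.

```python
from typing import List, Sequence


def csr_to_word_lists(indptr: Sequence[int], indices: Sequence[int],
                      data: Sequence[float],
                      vocab: Sequence[str]) -> List[List[str]]:
    """Turn a document-by-term matrix in CSR layout into token lists.

    Row i's nonzeros are stored in indices[indptr[i]:indptr[i+1]] (term ids)
    and data[indptr[i]:indptr[i+1]] (counts). Non-integer values such as
    TF-IDF weights are truncated by int(), so this only makes sense for counts.
    """
    docs: List[List[str]] = []
    for i in range(len(indptr) - 1):
        start, end = indptr[i], indptr[i + 1]
        words: List[str] = []
        for term_id, count in zip(indices[start:end], data[start:end]):
            # Repeat the term once per occurrence.
            words.extend([vocab[term_id]] * int(count))
        docs.append(words)
    return docs
```

With SciPy, m = m.tocsr() followed by csr_to_word_lists(m.indptr, m.indices, m.data, vocab) gives one word list per document, each of which can then be passed to add_doc(). If your matrix has documents as columns instead, convert it with m.T.tocsr() first.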
- Adding new features to tomotopy.utils.Corpus for constructing it from a matrix:
  - Corpus.from_dense_matrix(matrix, vocab_dict) -> Corpus
  - Corpus.from_sparse_matrix(matrix, vocab_dict) -> Corpus
- New features for constructing a matrix from a Corpus:
  - Corpus.to_dense_matrix(self) -> numpy.ndarray
  - Corpus.to_sparse_matrix(self) -> scipy.sparse.csc_matrix
- Exposing the vocab dict as a property of Corpus:
  - Corpus.vocab_dict -> tomotopy.utils.VocabDict
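The planned to_dense_matrix() and vocab_dict features describe the reverse conversion. The sketch below illustrates that conversion on plain token lists only. word_lists_to_dense is a hypothetical helper and does not reflect how tomotopy implements or orders its vocabulary; here columns simply follow the order in which words first appear.

```python
from typing import Dict, List, Sequence, Tuple


def word_lists_to_dense(
    docs: Sequence[Sequence[str]],
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Build a document-by-term count matrix and a word -> column mapping.

    Hypothetical illustration only: column order follows first appearance
    of each word, which is an assumption of this sketch.
    """
    vocab: Dict[str, int] = {}
    for words in docs:
        for w in words:
            # Assign the next free column index to unseen words.
            vocab.setdefault(w, len(vocab))
    matrix = [[0] * len(vocab) for _ in docs]
    for row, words in zip(matrix, docs):
        for w in words:
            row[vocab[w]] += 1
    return matrix, vocab
```

Inverting the vocab mapping (column index -> word) gives the vocab list needed to go back from the matrix to word lists.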
Hi,
As far as I know, gensim provides a function, Sparse2Corpus, to convert a sparse matrix to the Gensim corpus format. Is there a similar function in tomotopy that can convert a document-by-term matrix to tomotopy's corpus class?
Thanks.