Hello!
Yes, I think this is an issue with the batch norm, which cannot be computed from a single sample. I am not sure if there is an easy workaround, but I am happy to work on a fix if you find something that can bypass this issue!
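For context, here is a minimal PyTorch sketch (not from this thread) of the underlying failure: batch norm needs more than one value per channel to estimate the batch variance, so a training-mode forward pass on a single-sample batch raises a ValueError.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(10)  # 10 features; training mode by default

bn(torch.randn(2, 10))       # batch of 2: works
try:
    bn(torch.randn(1, 10))   # batch of 1: variance is undefined
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training, ..."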
Hello, I managed to get around this by defining the batch_size and dropping any documents that fall outside a full batch. The default batch_size is set to 64 (see here), so you can also pass it as a parameter to the model, e.g. batch_size=batch_size, as shown below:
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.preprocessing import WhiteSpacePreprocessing

documents = [line.strip() for line in open("unpreprocessed_documents.txt").readlines()]

# Drop the trailing remainder so the corpus length is an exact multiple of
# the batch size; otherwise the last batch can end up with a single sample.
batch_size = 64
documents = documents[: len(documents) // batch_size * batch_size]

sp = WhiteSpacePreprocessing(documents, "english")
preprocessed_documents, unpreprocessed_documents, vocab = sp.preprocess()

....  # dataset preparation elided; tp is built here

ctm = CombinedTM(
    bow_size=len(tp.vocab),
    contextual_size=768,
    n_components=100,
    num_epochs=20,
    batch_size=batch_size,  # pass the same batch size to the model
)
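For what it's worth, the trimming can be wrapped in a small helper (hypothetical, not part of the library) to make the arithmetic explicit:

def trim_to_full_batches(items, batch_size=64):
    # Keep only as many items as fit into complete batches.
    return items[: len(items) // batch_size * batch_size]

# 129 % 64 == 1, the exact failure case from this issue:
print(len(trim_to_full_batches(list(range(129)))))  # 128; the lone sample is dropped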
Thanks a lot @DerekChia!
Description
I was trying to train a CTM on my dataset, but I got a ValueError. I tried a similar dataset with a different number of samples, and it worked. I think the problem is that the number of samples is not divisible by the batch size and the remainder is 1, leaving a final batch with a single document.
What I Did