Hello!
How is the computation time computed here?
Hi, here is the code:
import time
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

# Prepare CTM data (contextual embeddings + bag of words)
qt_ctm = TopicModelDataPreparation("all-mpnet-base-v2")
training_dataset_ctm = qt_ctm.fit(
    text_for_contextual=docs, text_for_bow=preprocessed_documents
)

# Training
params = {
    "n_components": k_topics,
    "contextual_size": 768,
}
params["bow_size"] = len(qt_ctm.vocab)
ctm = CombinedTM(**params)

# Time only the model training; the data preparation above is not included
start = time.time()
ctm.fit(training_dataset_ctm)
end = time.time()
computation_time = float(end - start)
I am not able to reproduce this on one of our datasets: https://colab.research.google.com/drive/1-lv5aUWpW4ToJoU2AoODrftmOccWhjFS?usp=sharing
My data consists of more than a million tweets; could the number of records have anything to do with it?
It shouldn't, since we are still training in batches and that's what matters, unless something else is slowing down the process.
How long does it take to go through the entire dataset? (I see 1 in your plots, but I am not sure if it's hours or days.)
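As a rough sanity check, training time should scale with the number of mini-batches per epoch rather than with the topic count, so a larger dataset mainly means more batches. Here is a minimal back-of-the-envelope sketch; the batch size, number of epochs, and per-batch time below are assumptions to illustrate the arithmetic, not the defaults used in the run above:
import math

n_documents = 1_000_000    # roughly the size of the tweet dataset
batch_size = 64            # assumed mini-batch size
num_epochs = 100           # assumed number of training epochs
seconds_per_batch = 0.01   # assumed time per mini-batch; measure this on your hardware

batches_per_epoch = math.ceil(n_documents / batch_size)
estimated_seconds = num_epochs * batches_per_epoch * seconds_per_batch
print(f"{batches_per_epoch} batches per epoch, ~{estimated_seconds / 3600:.1f} hours estimated in total")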
Description
I've been keeping track of the computation time while running the Combined Topic Model approach on my dataset (about 1 million tweets). I observe that the computation time decreases as the number of topics increases, and I'm curious about the reason. This is a preview of the models I've run so far:
I will be running the zero-shot approach for monolingual datasets as well, but for now this is what I'm seeing.
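For reference, here is a minimal sketch of how such a benchmark could be set up, timing ctm.fit for several topic counts on the same prepared dataset (the list of k values is an assumption for illustration; docs and preprocessed_documents are the same variables as in the snippet above):
import time
from contextualized_topic_models.models.ctm import CombinedTM
from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

# Prepare the data once, then reuse it for every topic count
qt = TopicModelDataPreparation("all-mpnet-base-v2")
training_dataset = qt.fit(text_for_contextual=docs, text_for_bow=preprocessed_documents)

timings = {}
for k in [5, 10, 25, 50, 100]:  # assumed topic counts
    ctm = CombinedTM(bow_size=len(qt.vocab), contextual_size=768, n_components=k)
    start = time.time()
    ctm.fit(training_dataset)
    timings[k] = time.time() - start
    print(f"k={k}: {timings[k] / 3600:.2f} hours")
The same loop should also work for the zero-shot model by swapping CombinedTM for ZeroShotTM (also in contextualized_topic_models.models.ctm).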