marianafdz465 opened this issue 3 years ago
Hi Mariana! Thanks for reporting this issue. I tried to reproduce the error using your code and some other data, but the error didn't occur. Could you please share your data (by email if you prefer)? Could you also tell me the version of the library, your Python version and your operating system?
Thank you,
Silvia
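To collect the details requested above in one go, a small script like this may help. It is a sketch that assumes the PyPI distribution names `octis` and `torch`, and it relies on `importlib.metadata`, which needs Python 3.8 or later. On older interpreters such as the Python 3.7 that Colab shipped at the time, `pip show octis` reports the same version information.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("octis", "torch")):
    """Collect the details asked for in bug reports: library versions, Python, OS."""
    report = {
        "python": sys.version.split()[0],  # e.g. "3.10.12"
        "os": platform.platform(),         # OS name, release and architecture
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```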
Same problem. How did you solve it?
Hi A11en0, could you please share your code, the version of the library, your Python version and your operating system?
I'd be happy to help solve the issue.
Hello, I have the same problem. I am using Colab and received this error: "ValueError: num_samples should be a positive integer value, but got num_samples=0". OCTIS version: 1.10.3
My code is below (data_sample is a pandas DataFrame with a text column containing articles in Arabic, not English):
data_sample['partition'] = 'train'
# Use .iloc instead of chained indexing so the assignment modifies data_sample itself
partition_col = data_sample.columns.get_loc('partition')
data_sample.iloc[0:100, partition_col] = 'validation'
data_sample.iloc[100:200, partition_col] = 'test'
columns_titles = ['text', 'partition', 'targe']
data_sample = data_sample.reindex(columns=columns_titles)
data_sample.to_csv('/content/drive/MyDrive/Dataset/OCTIS/corpus.tsv', sep='\t', index=False, header=False)
# Vocabulary: every distinct whitespace-separated token in the corpus
vocabulary = set()
for text in data_sample['text']:
    vocabulary.update(text.split())
with open('/content/drive/MyDrive/Dataset/OCTIS/vocabulary.txt', 'w', encoding='utf-8') as output_file:
    for token in vocabulary:
        output_file.write(token + '\n')
from octis.dataset.dataset import Dataset
dataset = Dataset()
dataset.load_custom_dataset_from_folder("/content/drive/MyDrive/Dataset/OCTIS/")
from octis.models.CTM import CTM
model = CTM(num_topics=10)
model_output = model.train_model(dataset)  # Train the model
Thanks for your help.
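One thing worth checking in a setup like this is which partition labels actually end up in `corpus.tsv`. The sketch below counts them using only the standard library. It assumes, from my reading of the OCTIS loader rather than from documented behaviour, that `load_custom_dataset_from_folder` keeps only rows labelled exactly `train`, `val` or `test`. If that holds, rows labelled `validation` would be dropped silently and the validation split would be empty, which would fit the `validation_loader` frame in the traceback posted later in this thread. `count_partitions` and `check_corpus_file` are hypothetical helper names.

```python
import csv
from collections import Counter

# Partition labels that OCTIS's custom-dataset loader is ASSUMED to recognise.
EXPECTED_PARTITIONS = {"train", "val", "test"}


def count_partitions(lines):
    """Count the second-column labels in the lines of an OCTIS-style corpus.tsv."""
    counts = Counter()
    # Default quoting matches how pandas' to_csv quotes fields containing tabs or quotes.
    for row in csv.reader(lines, delimiter="\t"):
        counts[row[1] if len(row) > 1 else "<missing>"] += 1
    return counts


def check_corpus_file(path):
    """Print each partition label with its count, flagging unexpected labels."""
    with open(path, newline="", encoding="utf-8") as f:
        counts = count_partitions(f)
    for label, n in sorted(counts.items()):
        flag = "" if label in EXPECTED_PARTITIONS else "  <- not train/val/test"
        print(f"{label}: {n}{flag}")
    return counts
```

For example, `check_corpus_file('/content/drive/MyDrive/Dataset/OCTIS/corpus.tsv')` prints one line per label. A flagged `validation` line with no `val` line would point at the partition labels rather than the text itself; please confirm the expected labels against the OCTIS README for your installed version before relabelling.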
Hello @alyrazik, could you send me the dataset (if possible) by email? I would really like to reproduce this error, but it has never happened with my data, so I wonder whether it's related to the data. Can you check whether some documents are empty? Can you also share the full stack trace?
Thanks a lot,
Silvia
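Following the suggestion to look for empty documents, here is a standard-library sketch that counts, per partition, the rows of `corpus.tsv` whose text has no whitespace-separated tokens. It can optionally also count rows with no token from `vocabulary.txt`. Treating a document with no in-vocabulary tokens as problematic is my assumption (such a document would have an all-zero bag-of-words), not something confirmed in this thread; the function names are hypothetical.

```python
import csv
from collections import Counter


def load_vocabulary(path):
    """Read vocabulary.txt (one token per line) into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def empty_documents_by_partition(lines, vocabulary=None):
    """Count documents per partition label that have no tokens
    (or, if a vocabulary is given, no tokens that appear in it)."""
    empty = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        text = row[0] if row else ""
        label = row[1] if len(row) > 1 else "<missing>"
        tokens = text.split()
        if vocabulary is not None:
            tokens = [t for t in tokens if t in vocabulary]
        if not tokens:  # empty, whitespace-only, or fully out-of-vocabulary
            empty[label] += 1
    return empty
```

With the paths from the code above, open `corpus.tsv` with `newline=""` and `encoding="utf-8"`, then pass the file object together with `load_vocabulary('/content/drive/MyDrive/Dataset/OCTIS/vocabulary.txt')` to `empty_documents_by_partition`.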
Hello @silviatti ,
Thank you. The full error is below. I sent you the dataset and a link to my Colab notebook by email.
ValueError Traceback (most recent call last)
<ipython-input-37-f0307d819d49> in <module>()
5 # bert_model="distiluse-base-multilingual-cased")
6 model = CTM(num_topics=10)
----> 7 model_output = model.train_model(dataset) # Train the model
8 cv = Coherence(texts=dataset.get_corpus(),topk=10, measure='c_npmi')
9 topic_diversity = TopicDiversity(topk=10)
/usr/local/lib/python3.7/dist-packages/octis/models/CTM.py in train_model(self, dataset, hyperparameters, top_words)
113 reduce_on_plateau=self.hyperparameters['reduce_on_plateau'],
114 topic_prior_variance=self.hyperparameters["prior_variance"])
--> 115 self.model.fit(x_train, x_valid, verbose=False)
116 result = self.inference(x_test)
117 return result
/usr/local/lib/python3.7/dist-packages/octis/models/contextualized_topic_models/models/ctm.py in fit(self, train_dataset, validation_dataset, save_dir, verbose)
277 validation_loader = DataLoader(
278 self.validation_data, batch_size=self.batch_size, shuffle=True,
--> 279 num_workers=self.num_data_loader_workers)
280 # train epoch
281 s = datetime.datetime.now()
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers)
266 else: # map-style
267 if shuffle:
--> 268 sampler = RandomSampler(dataset, generator=generator)
269 else:
270 sampler = SequentialSampler(dataset)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples, generator)
101 if not isinstance(self.num_samples, int) or self.num_samples <= 0:
102 raise ValueError("num_samples should be a positive integer "
--> 103 "value, but got num_samples={}".format(self.num_samples))
104
105 @property
ValueError: num_samples should be a positive integer value, but got num_samples=0
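Reading the frames from the bottom up: the exception is raised by torch's `RandomSampler`, which `DataLoader` builds because `shuffle=True`. The loader being built is `validation_loader` inside `fit` in OCTIS's `ctm.py`. `RandomSampler` is created there without a `num_samples` argument, so `num_samples` falls back to the length of the dataset; `num_samples=0` therefore suggests the validation data passed to `fit` contains no documents. The class below is a simplified, hypothetical mirror of that check in plain Python, written only to illustrate where the value comes from; it is not torch's actual implementation.

```python
class SimplifiedRandomSampler:
    """Hypothetical, stripped-down mirror of the check in torch's RandomSampler."""

    def __init__(self, data_source, num_samples=None):
        self.data_source = data_source
        self._num_samples = num_samples
        # Same condition as the one shown in sampler.py in the traceback.
        if not isinstance(self.num_samples, int) or self.num_samples <= 0:
            raise ValueError(
                "num_samples should be a positive integer "
                "value, but got num_samples={}".format(self.num_samples))

    @property
    def num_samples(self):
        # Unless the caller passes num_samples, it is simply the dataset length.
        if self._num_samples is None:
            return len(self.data_source)
        return self._num_samples
```

`SimplifiedRandomSampler([])` raises the same message as the traceback, while a non-empty list passes.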
Hello @silviatti, some findings:
Hi, I faced the same problem. How can I solve it?
@DaryaZareM could you provide more information? Thanks,
Silvia
Description
I am not sure why I get the error "num_samples should be a positive integer value, but got num_samples=0" when I try to run the optimize function.
What I Did
I can't find where to set this variable "num_samples".
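`num_samples` is not an OCTIS setting you pass in. In the traceback earlier in this thread, torch computes it from the size of the split being loaded, so the fix lies in the data rather than in a parameter. A guard like the one below, run before `model.train_model(dataset)` or the optimizer, may give a clearer message. It assumes OCTIS's `Dataset.get_partitioned_corpus(use_validation=True)` returns the train, validation and test document lists, which matches my reading of the source but should be verified for your version; `check_partitions` is a hypothetical helper.

```python
def check_partitions(dataset, names=("train", "validation", "test")):
    """Raise early if any split of an OCTIS-style dataset is empty.

    Hypothetical helper; ASSUMES dataset.get_partitioned_corpus(use_validation=True)
    returns (train, validation, test) lists of tokenised documents.
    """
    # Defensively treat a None return as "no partitions".
    splits = dataset.get_partitioned_corpus(use_validation=True) or ()
    if len(splits) != len(names):
        raise ValueError(
            f"expected {len(names)} partitions, got {len(splits)}; "
            "is the partition column of corpus.tsv filled in?")
    sizes = {name: len(split) for name, split in zip(names, splits)}
    empty = [name for name, n in sizes.items() if n == 0]
    if empty:
        raise ValueError(f"empty partition(s) {empty}; sizes: {sizes}")
    return sizes
```

If the validation split comes back empty, the partition labels in `corpus.tsv` are the first place to look.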