MilaNLProc / contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License

OSError: [Errno 22] Invalid argument #127

Closed by bilgehanozkan 1 year ago

bilgehanozkan commented 1 year ago

Hello, I'm trying to run CombinedTM with different datasets. I get this OSError especially when the dataset is larger, and I can't figure out whether it is related to memory usage. Here is the traceback:

OSError                                   Traceback (most recent call last)
<ipython-input-…> in <module>
      1 ctm = CombinedTM(bow_size=len(tp.vocab), contextual_size=768, n_components=10, num_epochs=10)
----> 2 ctm.fit(training_dataset) # run the model

E:\Users\blghn\anaconda3\lib\site-packages\contextualized_topic_models\models\ctm.py in fit(self, train_dataset, validation_dataset, save_dir, verbose, patience, delta, n_samples)
    272             # train epoch
    273             s = datetime.datetime.now()
--> 274             sp, train_loss = self._train_epoch(train_loader)
    275             samples_processed += sp
    276             e = datetime.datetime.now()

E:\Users\blghn\anaconda3\lib\site-packages\contextualized_topic_models\models\ctm.py in _train_epoch(self, loader)
    171         samples_processed = 0
    172 
--> 173         for batch_samples in loader:
    174             # batch_size x vocab_size
    175             X_bow = batch_samples['X_bow']

E:\Users\blghn\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    433             return self._iterator
    434         else:
--> 435             return self._get_iterator()
    436 
    437     @property

E:\Users\blghn\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _get_iterator(self)
    379         else:
    380             self.check_worker_number_rationality()
--> 381             return _MultiProcessingDataLoaderIter(self)
    382 
    383     @property

E:\Users\blghn\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
   1032         #     before it starts, and __del__ tries to join but will get:
   1033         #     AssertionError: can only join a started process.
-> 1034         w.start()
   1035         self._index_queues.append(index_queue)
   1036         self._workers.append(w)

E:\Users\blghn\anaconda3\lib\multiprocessing\process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

E:\Users\blghn\anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    222     @staticmethod
    223     def _Popen(process_obj):
--> 224         return _default_context.get_context().Process._Popen(process_obj)
    225 
    226 class DefaultContext(BaseContext):

E:\Users\blghn\anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    325         def _Popen(process_obj):
    326             from .popen_spawn_win32 import Popen
--> 327             return Popen(process_obj)
    328 
    329     class SpawnContext(BaseContext):

E:\Users\blghn\anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     91         try:
     92             reduction.dump(prep_data, to_child)
---> 93             reduction.dump(process_obj, to_child)
     94         finally:
     95             set_spawning_popen(None)

E:\Users\blghn\anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

OSError: [Errno 22] Invalid argument

Thank you.
vinid commented 1 year ago

Hello!

Try to see if the issue is the same as the one reported here.

Setting the workers to 0 should solve the problem on Windows.
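For reference, a minimal sketch of that workaround (this assumes the model constructor in your installed version exposes a num_data_loader_workers argument; check the signature, as the name may differ across versions):

from contextualized_topic_models.models.ctm import CombinedTM

# num_data_loader_workers=0 disables DataLoader multiprocessing, so the
# dataset is never pickled to spawned worker processes (the step that
# fails with OSError: [Errno 22] on Windows in the traceback above).
ctm = CombinedTM(
    bow_size=len(tp.vocab),       # same setup as in the original snippet
    contextual_size=768,
    n_components=10,
    num_epochs=10,
    num_data_loader_workers=0,    # assumed parameter name; verify in your version
)
ctm.fit(training_dataset)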

bilgehanozkan commented 1 year ago

Thanks for the reply. When I set the workers to 0, I keep getting this error over and over again and Jupyter crashes.

[screenshot of the repeated warning output]

vinid commented 1 year ago

OK, good: that's a warning, not an error.

A follow-up question:

Can you see if adding this snippet before training helps?

import warnings
# temporarily silence DeprecationWarning output
warnings.filterwarnings("ignore", category=DeprecationWarning)

This is not good practice, but I just want to check whether the notebook is crashing because of too many warnings or because of something else.
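For example, placed right before the training call (diagnostic only; remove the filter once we know what is going on):

import warnings

# Diagnostic only: silence DeprecationWarning so the notebook frontend
# is not flooded with repeated warning output during training.
warnings.filterwarnings("ignore", category=DeprecationWarning)

ctm.fit(training_dataset)  # same training call as in your snippet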

bilgehanozkan commented 1 year ago

The dataset is very large and I'm running it on the GPU. It works fine when warnings are ignored, thank you! :)

I'm closing the issue, if that's OK.

vinid commented 1 year ago

No, let's keep it open! Thanks :)