Ashh-Z closed this issue 7 months ago.
I'm not entirely sure where it might come from, but you can try changing save_top_k=1 to save_top_k=0 or save_top_k=-1. That saves no checkpoints or all checkpoints, respectively, without consulting the metric (which isn't available in your case). Maybe save_last=False is what you're looking for.
If that doesn't work, you can change the monitored metric (monitor and mode) to something that's logged at training time, like train/loss. That might help.
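In a training script, the first suggestion would look roughly like `ModelCheckpoint(save_top_k=-1, monitor=None)` (or `save_top_k=0` to save nothing), passed in the Trainer's `callbacks` list. The toy function below is a hypothetical model of the retention rule, not Lightning's code. It shows why a positive `save_top_k` with a `monitor` depends on that key actually being logged, while `0` and `-1` never look at a metric. The `train/loss` key is the one suggested above. The key pyannote really logs may differ, so check the logger output.

```python
from typing import Dict, List, Optional, Tuple

# (epoch, metrics logged during that epoch)
History = List[Tuple[int, Dict[str, float]]]


def kept_checkpoints(
    history: History,
    save_top_k: int,
    monitor: Optional[str] = None,
    mode: str = "min",
) -> List[int]:
    """Toy model of top-k checkpoint retention (NOT Lightning's implementation).

    Returns the epochs whose checkpoints would be kept.
    """
    epochs = [epoch for epoch, _ in history]
    if save_top_k == 0:
        return []  # keep nothing: no metric is consulted
    if save_top_k == -1:
        return epochs  # keep everything: no metric is consulted
    if monitor is None:
        # Nothing to rank by: only "keep the latest one" makes sense here.
        if save_top_k != 1:
            raise ValueError("save_top_k > 1 needs a monitored metric to rank by")
        return epochs[-1:]
    scored = []
    for epoch, metrics in history:
        if monitor not in metrics:
            # The situation described above: the monitored key was never logged.
            raise KeyError(f"monitored metric {monitor!r} not logged at epoch {epoch}")
        scored.append((metrics[monitor], epoch))
    # Lower is better for mode="min", higher is better for mode="max".
    scored.sort(reverse=(mode == "max"))
    return sorted(epoch for _, epoch in scored[:save_top_k])


history = [(0, {"train/loss": 0.9}), (1, {"train/loss": 0.5}), (2, {"train/loss": 0.7})]
print(kept_checkpoints(history, save_top_k=0))   # []
print(kept_checkpoints(history, save_top_k=-1))  # [0, 1, 2]
print(kept_checkpoints(history, save_top_k=2, monitor="train/loss"))  # [1, 2]
```

Asking for `monitor="val/loss"` on this history raises `KeyError`. That mirrors the "metric not available" situation when no validation set is used.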
Hi, I got around this error by setting the parameter num_workers=0 for the segmentation task:
segmentation_model.task = Segmentation(protocol, num_workers=0, duration=5.0, max_speakers_per_chunk=5, max_speakers_per_frame=3)
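Why `num_workers=0` helps: in the traceback posted with the original question, the DataLoader starts its workers through `popen_spawn_win32`, which is Windows' `spawn` start method. That method pickles the dataset, and everything it references, to send it to each worker process. With `num_workers=0`, batches are loaded in the main process, so nothing has to be pickled. The stdlib-only sketch below reproduces the same kind of `PicklingError`. `toy_registry` and both classes are hypothetical stand-ins. It is an assumption that pyannote's `dis` class fails for the same reason, although the error has the same "attribute lookup ... failed" shape.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a package that creates classes at runtime.
toy_registry = types.ModuleType("toy_registry")
sys.modules["toy_registry"] = toy_registry

# A class built at runtime whose name is never bound in its module. Pickle
# stores classes by reference (module + qualified name), so looking up
# toy_registry.dis fails when an instance is serialized.
Unreachable = type("dis", (), {"__module__": "toy_registry"})

# The same kind of class, but bound under its own name, pickles fine.
Reachable = type("Reachable", (), {"__module__": "toy_registry"})
toy_registry.Reachable = Reachable


def survives_spawn(obj: object) -> bool:
    """Return True if obj can be pickled, as a spawn-started worker requires."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


print(survives_spawn(Reachable()))    # True
print(survives_spawn(Unreachable()))  # False
```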
But now I am facing a different error:
ValueError: requested chunk [642406.912774s, 642411.912774s] (frames #10278510604 to #10278590604) lies outside of M028 file bounds [0., 1536.575000s] (24585200 frames).
On different runs, I get this error on different files.
Is this due to a problem with the segmentation model? On all the affected files, the requested chunks start far beyond the end of the audio. The audio files are mostly around 30 min long.
A similar issue was also reported here, but there the requested chunks were near the end of the audio file, which is not the case with my files.
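The numbers in the ValueError are internally consistent. Converting the reported times to sample indices at 16 kHz reproduces the frame numbers exactly. The 16 kHz rate is an inference from the numbers, not stated in the thread. The chunk is also exactly 5 s long, matching duration=5.0. So the chunk arithmetic looks fine. What is wrong is the start time: roughly 178 hours into a file of about 25.6 minutes. That suggests the region chunks were drawn from extends past the end of the audio, rather than a fault in the segmentation model itself.

```python
from decimal import Decimal

# Assumption: 16 kHz audio, inferred from the numbers in the error message.
SAMPLE_RATE = 16_000


def to_frames(seconds: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a time in seconds to a sample index (Decimal avoids float rounding)."""
    return int(Decimal(seconds) * sample_rate)


chunk_start = to_frames("642406.912774")
chunk_end = to_frames("642411.912774")
file_end = to_frames("1536.575")

print(chunk_start, chunk_end, file_end)         # 10278510604 10278590604 24585200
print((chunk_end - chunk_start) / SAMPLE_RATE)  # 5.0, the duration=5.0 chunk
```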
Got this resolved: it was an issue with how I generated the UEM files. Thank you!
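The thread doesn't say what exactly was wrong with the UEM files. pyannote draws training chunks from the annotated regions those files define. A cheap safeguard is therefore to check every UEM segment against the real audio duration before training. The sketch below makes several assumptions. It uses the four-column, whitespace-separated `uri channel start end` layout with times in seconds. `check_uem`, `UemIssue` and the `durations` mapping are hypothetical. In practice, the durations would come from the audio files' headers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class UemIssue:
    line_no: int
    uri: str
    reason: str


def check_uem(lines: Iterable[str], durations: Dict[str, float],
              tolerance: float = 1e-3) -> List[UemIssue]:
    """Flag UEM lines that are malformed or extend past the end of their audio file."""
    issues: List[UemIssue] = []
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        uri = fields[0]
        if len(fields) != 4:
            issues.append(UemIssue(line_no, uri, "expected 4 fields: uri channel start end"))
            continue
        try:
            start, end = float(fields[2]), float(fields[3])
        except ValueError:
            issues.append(UemIssue(line_no, uri, "start/end are not numbers"))
            continue
        if start < 0 or end <= start:
            issues.append(UemIssue(line_no, uri, "need 0 <= start < end"))
        duration = durations.get(uri)
        if duration is None:
            issues.append(UemIssue(line_no, uri, "no audio duration known for this uri"))
        elif end > duration + tolerance:
            issues.append(UemIssue(line_no, uri,
                                   f"end {end:.3f}s is past the audio end {duration:.3f}s"))
    return issues


# Hypothetical example: line 2 ends far beyond its file, line 3 is truncated.
uem = [
    "M028 1 0.000 1536.575",
    "file_b 1 0.000 642411.913",
    "file_c 1 0.000",
]
durations = {"M028": 1536.575, "file_b": 1800.0, "file_c": 1700.0}
for issue in check_uem(uem, durations):
    print(issue)
```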
Reference for code: https://colab.research.google.com/drive/1S7ayat76N-xluD4gvN958O7QCpW8-u0l?usp=sharing
Reference for setting up database.yml: https://github.com/pyannote/AMI-diarization-setup/tree/main
I am trying to fine-tune the pyannote speaker diarization model with powerset loss on my custom dataset. I don't want to use a validation set, so I have tried to disable early stopping by setting num_sanity_val_steps=0 on the pytorch_lightning Trainer.
My database.yml file (finetune.yml):
Directory setup:
Code block:
Error:
PicklingError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs) 43 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) ---> 44 return trainer_fn(*args, **kwargs) 46 except _TunerExitException:
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\trainer.py:579, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path) 573 ckpt_path = self._checkpoint_connector._select_ckpt_path( 574 self.state.fn, 575 ckpt_path, 576 model_provided=True, 577 model_connected=self.lightning_module is not None, 578 ) --> 579 self._run(model, ckpt_path=ckpt_path) 581 assert self.state.stopped
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\trainer.py:986, in Trainer._run(self, model, ckpt_path) 983 # ---------------------------- 984 # RUN THE TRAINER 985 # ---------------------------- --> 986 results = self._run_stage() 988 # ---------------------------- 989 # POST-Training CLEAN UP 990 # ----------------------------
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\trainer.py:1032, in Trainer._run_stage(self) 1031 with torch.autograd.set_detect_anomaly(self._detect_anomaly): -> 1032 self.fit_loop.run() 1033 return None
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fit_loop.py:197, in _FitLoop.run(self) 196 def run(self) -> None: --> 197 self.setup_data() 198 if self.skip:
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fit_loop.py:263, in _FitLoop.setup_data(self) 262 self._data_fetcher.setup(combined_loader) --> 263 iter(self._data_fetcher) # creates the iterator inside the fetcher 264 max_batches = sized_len(combined_loader)
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fetchers.py:104, in _PrefetchDataFetcher.__iter__(self) 102 @override 103 def __iter__(self) -> "_PrefetchDataFetcher": --> 104 super().__iter__() 105 if self.length is not None: 106 # ignore pre-fetching, it's not necessary
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fetchers.py:51, in _DataFetcher.__iter__(self) 49 @override 50 def __iter__(self) -> "_DataFetcher": ---> 51 self.iterator = iter(self.combined_loader) 52 self.reset()
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\utilities\combined_loader.py:351, in CombinedLoader.__iter__(self) 350 iterator = cls(self.flattened, self._limits) --> 351 iter(iterator) 352 self._iterator = iterator
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\utilities\combined_loader.py:92, in _MaxSizeCycle.__iter__(self) 90 @override 91 def __iter__(self) -> Self: ---> 92 super().__iter__() 93 self._consumed = [False] * len(self.iterables)
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\utilities\combined_loader.py:43, in _ModeIterator.__iter__(self) 41 @override 42 def __iter__(self) -> Self: ---> 43 self.iterators = [iter(iterable) for iterable in self.iterables] 44 self._idx = 0
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\utilities\combined_loader.py:43, in <listcomp>(.0) 41 @override 42 def __iter__(self) -> Self: ---> 43 self.iterators = [iter(iterable) for iterable in self.iterables] 44 self._idx = 0
File e:\ANACONDA\envs\deeplearning\Lib\site-packages\torch\utils\data\dataloader.py:438, in DataLoader.__iter__(self) 437 else: --> 438 return self._get_iterator()
File e:\ANACONDA\envs\deeplearning\Lib\site-packages\torch\utils\data\dataloader.py:386, in DataLoader._get_iterator(self) 385 self.check_worker_number_rationality() --> 386 return _MultiProcessingDataLoaderIter(self)
File e:\ANACONDA\envs\deeplearning\Lib\site-packages\torch\utils\data\dataloader.py:1039, in _MultiProcessingDataLoaderIter.__init__(self, loader) 1033 # NB: Process.start() actually take some time as it needs to 1034 # start a process and pass the arguments over via a pipe. 1035 # Therefore, we only add a worker to self._workers list after 1036 # it started, so that we do not call .join() if program dies 1037 # before it starts, and __del__ tries to join but will get: 1038 # AssertionError: can only join a started process. -> 1039 w.start() 1040 self._index_queues.append(index_queue)
File e:\ANACONDA\envs\deeplearning\Lib\multiprocessing\process.py:121, in BaseProcess.start(self) 120 _cleanup() --> 121 self._popen = self._Popen(self) 122 self._sentinel = self._popen.sentinel
File e:\ANACONDA\envs\deeplearning\Lib\multiprocessing\context.py:224, in Process._Popen(process_obj) 222 @staticmethod 223 def _Popen(process_obj): --> 224 return _default_context.get_context().Process._Popen(process_obj)
File e:\ANACONDA\envs\deeplearning\Lib\multiprocessing\context.py:336, in SpawnProcess._Popen(process_obj) 335 from .popen_spawn_win32 import Popen --> 336 return Popen(process_obj)
File e:\ANACONDA\envs\deeplearning\Lib\multiprocessing\popen_spawn_win32.py:94, in Popen.__init__(self, process_obj) 93 reduction.dump(prep_data, to_child) ---> 94 reduction.dump(process_obj, to_child) 95 finally:
File e:\ANACONDA\envs\deeplearning\Lib\multiprocessing\reduction.py:60, in dump(obj, file, protocol) 59 '''Replacement for pickle.dump() using ForkingPickler.''' ---> 60 ForkingPickler(file, protocol).dump(obj)
PicklingError: Can't pickle <class 'pyannote.database.registry.dis'>: attribute lookup dis on pyannote.database.registry failed
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
Cell In[59], line 55 44 # trainer = Trainer(accelerator="gpu", 45 # callbacks=callbacks, 46 # max_epochs=20, 47 # gradient_clip_val=0.5) 49 trainer = Trainer(accelerator="gpu", 50 callbacks=callbacks, 51 max_epochs=20, 52 gradient_clip_val=0.5, 53 num_sanity_val_steps=0) # Skip sanity check validation ---> 55 trainer.fit(segmentation_model)
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\trainer.py:543, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path) 541 self.state.status = TrainerStatus.RUNNING 542 self.training = True --> 543 call._call_and_handle_interrupt( 544 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path 545 )
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\call.py:68, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs) 66 for logger in trainer.loggers: 67 logger.finalize("failed") ---> 68 trainer._teardown() 69 # teardown might access the stage so we reset it after 70 trainer.state.stage = None
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\trainer\trainer.py:1013, in Trainer._teardown(self) 1011 # loop should never be None here but it can because we don't know the trainer stage with ddp_spawn 1012 if loop is not None: -> 1013 loop.teardown() 1014 self._logger_connector.teardown() 1015 self._signal_connector.teardown()
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fit_loop.py:411, in _FitLoop.teardown(self) 409 def teardown(self) -> None: 410 if self._data_fetcher is not None: --> 411 self._data_fetcher.teardown() 412 self._data_fetcher = None 413 self.epoch_loop.teardown()
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fetchers.py:79, in _DataFetcher.teardown(self) 78 def teardown(self) -> None: ---> 79 self.reset() 80 if self._combined_loader is not None: 81 self._combined_loader.reset()
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fetchers.py:141, in _PrefetchDataFetcher.reset(self) 139 @override 140 def reset(self) -> None: --> 141 super().reset() 142 self.batches = []
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\loops\fetchers.py:75, in _DataFetcher.reset(self) 73 # teardown calls reset(), and if it happens early, combined_loader can still be None 74 if self._combined_loader is not None: ---> 75 self.length = sized_len(self.combined_loader) 76 self.done = self.length == 0
File ~\AppData\Roaming\Python\Python311\site-packages\lightning_fabric\utilities\data.py:51, in sized_len(dataloader) 48 """Try to get the length of an object, return None otherwise.""" 49 try: 50 # try getting the length ---> 51 length = len(dataloader) # type: ignore [arg-type] 52 except (TypeError, NotImplementedError): 53 length = None
File ~\AppData\Roaming\Python\Python311\site-packages\pytorch_lightning\utilities\combined_loader.py:358, in CombinedLoader.__len__(self) 356 """Compute the number of batches.""" 357 if self._iterator is None: --> 358 raise RuntimeError("Please call iter(combined_loader) first.") 359 return len(self._iterator)
RuntimeError: Please call iter(combined_loader) first.
I cannot understand what is causing this error. Any help would be appreciated. Thank you.
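The RuntimeError at the bottom is a follow-on failure. The sequence in the traceback is:

1. The PicklingError aborted the DataLoader start-up.
2. Lightning's `_call_and_handle_interrupt` then ran `trainer._teardown()` ("During handling of the above exception, another exception occurred").
3. Teardown asked the still-uninitialised CombinedLoader for its length, which raised the RuntimeError.

The actionable error is therefore the first one. Python keeps it in the second exception's `__context__`. The toy below reproduces the pattern and walks the chain back to the root cause. All names are hypothetical and this is not Lightning's code.

```python
import pickle


class ToyCombinedLoader:
    """Hypothetical stand-in for a loader whose length is only known after iter()."""

    def __init__(self) -> None:
        self._iterator = None

    def start(self) -> None:
        # Pretend worker start-up failed the way it did in the traceback.
        raise pickle.PicklingError("toy: can't pickle the dataset")

    def __len__(self) -> int:
        if self._iterator is None:
            raise RuntimeError("Please call iter(combined_loader) first.")
        return 0


def toy_fit(loader: ToyCombinedLoader) -> None:
    try:
        loader.start()
    except BaseException:
        len(loader)  # "teardown" touches the loader again and fails a second time
        raise


def root_cause(exc: BaseException) -> BaseException:
    """Walk __cause__/__context__ back to the first exception in the chain."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def run_and_catch() -> BaseException:
    try:
        toy_fit(ToyCombinedLoader())
    except Exception as exc:
        return exc
    raise AssertionError("toy_fit was expected to fail")


top = run_and_catch()
print(type(top).__name__, "->", type(root_cause(top)).__name__)  # RuntimeError -> PicklingError
```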