SortAnon / ControllableTalkNet

A web app that lets you play around with TalkNet models
GNU Affero General Public License v3.0

TalkNet_Training_Offline error #25

Open rikabi89 opened 1 year ago

rikabi89 commented 1 year ago

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2022-12-02 09:58:37 modelPT:138] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config :
    dataset:
      target: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: H:/ControllableTalkNet/tTrump\trainfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: false
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: H:/ControllableTalkNet/tTrump\durations.pt
      f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true
        stresses: false
        add_blank_at: last
    dataloader_params:
      drop_last: false
      shuffle: true
      batch_size: 16
      num_workers: 4

[NeMo W 2022-12-02 09:58:37 modelPT:145] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :
    dataset:
      target: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
      manifest_filepath: H:/ControllableTalkNet/tTrump\valfiles.json
      max_duration: null
      min_duration: 0.1
      int_values: false
      load_audio: false
      normalize: false
      sample_rate: 22050
      trim: false
      durs_file: H:/ControllableTalkNet/tTrump\durations.pt
      f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
      blanking: true
      vocab:
        notation: phonemes
        punct: true
        spaces: true
        stresses: false
        add_blank_at: last
    dataloader_params:
      drop_last: false
      shuffle: false
      batch_size: 16
      num_workers: 1

[NeMo I 2022-12-02 09:58:37 modelPT:439] Model TalkNetDursModel was successfully restored from H:\ControllableTalkNet\talknet_durs.nemo.
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo W 2022-12-02 09:58:37 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:37 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2022-12-02 09:58:37 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2DF86EB0>" will be used during training (effective maximum steps = 180) - Parameters : (min_lr: 3.0e-06 warmup_ratio: 0.02 max_steps: 180 )
Warm-starting from H:\ControllableTalkNet\talknet_durs.nemo
[NeMo I 2022-12-02 09:58:37 exp_manager:216] Experiments will be logged at H:\ControllableTalkNet\tTrump\TalkNetDurs\2022-12-02_09-57-24
[NeMo I 2022-12-02 09:58:37 exp_manager:563] TensorboardLogger has been set up
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-12-02 09:58:38 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:38 modelPT:751] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        weight_decay: 1e-06
    )
[NeMo I 2022-12-02 09:58:38 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2E22DCD0>" will be used during training (effective maximum steps = 180) - Parameters : (min_lr: 3.0e-06 warmup_ratio: 0.02 max_steps: 180 )

  | Name  | Type           | Params
------------------------------------
0 | embed | Embedding      | 7.6 K
1 | model | ConvASREncoder | 2.5 M
2 | proj  | Conv1d         | 513
------------------------------------
2.5 M     Trainable params
0         Non-trainable params
2.5 M     Total params
9.841     Total estimated model params size (MB)

Validation sanity check: 0% 0/2 [00:00<?, ?it/s]

PicklingError                             Traceback (most recent call last)
Cell In[6], line 68
     66 initialize(config_path="conf")
     67 cfg = compose(config_name="talknet-durs")
---> 68 train(cfg)

Cell In[6], line 62, in train(cfg)
     60 exp_manager(trainer, cfg.get('exp_manager', None))
     61 trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()])  # noqa
---> 62 trainer.fit(model)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:460, in Trainer.fit(self, model, train_dataloader, val_dataloaders, datamodule)
    455 # links data to the trainer
    456 self.data_connector.attach_data(
    457     model, train_dataloader=train_dataloader, val_dataloaders=val_dataloaders, datamodule=datamodule
    458 )
--> 460 self._run(model)
    462 assert self.state.stopped
    463 self.training = False

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:758, in Trainer._run(self, model)
    755 self.pre_dispatch()
    757 # dispatch start_training or start_evaluating or start_predicting
--> 758 self.dispatch()
    760 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
    761 self.post_dispatch()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:799, in Trainer.dispatch(self)
    797     self.accelerator.start_predicting(self)
    798 else:
--> 799     self.accelerator.start_training(self)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\accelerators\accelerator.py:96, in Accelerator.start_training(self, trainer)
     95 def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96     self.training_type_plugin.start_training(trainer)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py:144, in TrainingTypePlugin.start_training(self, trainer)
    142 def start_training(self, trainer: 'pl.Trainer') -> None:
    143     # double dispatch to initiate the training loop
--> 144     self._results = trainer.run_stage()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:809, in Trainer.run_stage(self)
    807 if self.predicting:
    808     return self.run_predict()
--> 809 return self.run_train()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:844, in Trainer.run_train(self)
    841 if not self.is_global_zero and self.progress_bar_callback is not None:
    842     self.progress_bar_callback.disable()
--> 844 self.run_sanity_check(self.lightning_module)
    846 self.checkpoint_connector.has_trained = False
    848 # enable train mode

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:1112, in Trainer.run_sanity_check(self, ref_model)
   1109 self.on_sanity_check_start()
   1111 # run eval step
-> 1112 self.run_evaluation()
   1114 self.on_sanity_check_end()
   1116 self.state.stage = stage

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:954, in Trainer.run_evaluation(self, on_epoch)
    951 dataloader = self.accelerator.process_dataloader(dataloader)
    952 dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
--> 954 for batch_idx, batch in enumerate(dataloader):
    955     if batch is None:
    956         continue

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:355, in DataLoader.__iter__(self)
    353     return self._iterator
    354 else:
--> 355     return self._get_iterator()

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:301, in DataLoader._get_iterator(self)
    299 else:
    300     self.check_worker_number_rationality()
--> 301     return _MultiProcessingDataLoaderIter(self)

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:914, in _MultiProcessingDataLoaderIter.__init__(self, loader)
    907 w.daemon = True
    908 # NB: Process.start() actually take some time as it needs to
    909 #     start a process and pass the arguments over via a pipe.
    910 #     Therefore, we only add a worker to self._workers list after
    911 #     it started, so that we do not call .join() if program dies
    912 #     before it starts, and __del__ tries to join but will get:
    913 #     AssertionError: can only join a started process.
--> 914 w.start()
    915 self._index_queues.append(index_queue)
    916 self._workers.append(w)

File ~\anaconda3\envs\talknet\lib\multiprocessing\process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:224, in Process._Popen(process_obj)
    222 @staticmethod
    223 def _Popen(process_obj):
--> 224     return _default_context.get_context().Process._Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:327, in SpawnProcess._Popen(process_obj)
    324 @staticmethod
    325 def _Popen(process_obj):
    326     from .popen_spawn_win32 import Popen
--> 327     return Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\popen_spawn_win32.py:93, in Popen.__init__(self, process_obj)
     91 try:
     92     reduction.dump(prep_data, to_child)
---> 93     reduction.dump(process_obj, to_child)
     94 finally:
     95     set_spawning_popen(None)

File ~\anaconda3\envs\talknet\lib\multiprocessing\reduction.py:60, in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.AudioTextEntity'>: attribute lookup AudioTextEntity on nemo.collections.common.parts.preprocessing.collections failed
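The traceback shows the failure happening while the DataLoader starts its worker processes: on Windows, multiprocessing uses the spawn start method, so the dataset (including its list of AudioTextEntity records) must be pickled and sent to each worker. A plausible explanation is that AudioTextEntity is a namedtuple created somewhere other than at module level, so pickle cannot find it by name in its declaring module. The toy snippet below is a minimal sketch of that failure mode only; the `_Manifest` class and `Entity` name are made up for illustration and are not taken from the NeMo source.

```python
# Minimal sketch of the suspected cause (assumption, not code from this repo):
# pickle stores a namedtuple instance by reference to its class, looking the
# class up as a module-level attribute under its typename. If the namedtuple
# is only bound as a class attribute, that lookup fails with the same
# "attribute lookup ... failed" PicklingError seen above.
import collections
import pickle


class _Manifest:
    # The generated class is named "Entity" but is never bound at module level.
    OUTPUT_TYPE = collections.namedtuple("Entity", "id text")


item = _Manifest.OUTPUT_TYPE(id=0, text="hello")

try:
    pickle.dumps(item)
except pickle.PicklingError as err:
    # e.g. Can't pickle <class '__main__.Entity'>: attribute lookup Entity on __main__ failed
    print(err)
```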

rikabi89 commented 1 year ago

I get this error at Step 4 of the offline training notebook. Any help would be appreciated.
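One workaround commonly suggested for this kind of Windows spawn/pickling error is to disable multiprocessing in the data loaders, so the dataset never has to be pickled into a worker process. The sketch below shows how the failing notebook cell might be adjusted before calling train(cfg); it is not an official fix from this thread. The `initialize`, `compose`, and `train` names come from the traceback's Cell In[6], the `hydra.experimental` import path assumes Hydra 1.0.x, and the `cfg.model.train_ds` / `cfg.model.validation_ds` key paths are assumptions inferred from the config dumps above, not verified against the repo's talknet-durs config.

```python
# Hedged workaround sketch: keep data loading in the main process so nothing
# has to be pickled for a spawned DataLoader worker.
from hydra.experimental import compose, initialize  # Hydra 1.0.x; newer Hydra exposes these at the top level

initialize(config_path="conf")
cfg = compose(config_name="talknet-durs")

# Assumed key paths, based on the "Train config" / "Validation config" dumps
# above (num_workers was 4 and 1 respectively); adjust to the actual layout
# of talknet-durs.yaml if it differs.
cfg.model.train_ds.dataloader_params.num_workers = 0
cfg.model.validation_ds.dataloader_params.num_workers = 0

train(cfg)  # train() is the helper defined earlier in the offline-training notebook
```

This trades some data-loading throughput for avoiding the pickling step entirely, which is usually acceptable for a dataset of this size (134 files, ~0.21 hours).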