bene-ges / nemo_compatible

useful things that work with NVIDIA NeMo library
Apache License 2.0

Training does not work on Windows #18

Open Vubni opened 5 months ago

Vubni commented 5 months ago

I did everything according to the instructions in train.sh: downloaded the archive with the audio and marks.txt, set up the required folder and NeMo, and then ran it. Here is the series of errors I get:

PS E:\g++\синтез новый> sh train.sh
train.sh: line 1: #!/bin/bash: No such file or directory
train.sh: line 2: conda: command not found
fatal: destination path 'ru_g2p_ipa_bert_large' already exists and is not an empty directory.
Traceback (most recent call last):
  File "NeMo/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py", line 42, in <module>
    from helpers import ITN_MODEL, instantiate_model_and_trainer
  File "E:\g++\синтез новый\NeMo\examples\nlp\text_normalization_as_tagging\helpers.py", line 22, in <module>
    from nemo.collections.nlp.models import ThutmoseTaggerModel
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\__init__.py", line 15, in <module>
    from nemo.collections.nlp import data, losses, models, modules
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\__init__.py", line 42, in <module>
    from nemo.collections.nlp.data.zero_shot_intent_recognition.zero_shot_intent_dataset import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\zero_shot_intent_recognition\__init__.py", line 16, in <module>
    from nemo.collections.nlp.data.zero_shot_intent_recognition.zero_shot_intent_dataset import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\zero_shot_intent_recognition\zero_shot_intent_dataset.py", line 30, in <module>
    from nemo.collections.nlp.parts.utils_funcs import tensor2list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\__init__.py", line 17, in <module>
    from nemo.collections.nlp.parts.utils_funcs import list2str, tensor2list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\utils_funcs.py", line 28, in <module>
    from nemo.collections.nlp.modules.common.megatron.utils import erf_gelu
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\__init__.py", line 16, in <module>
    from nemo.collections.nlp.modules.common import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\common\__init__.py", line 36, in <module>
    from nemo.collections.nlp.modules.common.tokenizer_utils import get_tokenizer, get_tokenizer_list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\common\tokenizer_utils.py", line 29, in <module>
    from nemo.collections.nlp.parts.nlp_overrides import HAVE_MEGATRON_CORE
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\nlp_overrides.py", line 31, in <module>
    from pytorch_lightning.overrides.base import _LightningModuleWrapperBase
ModuleNotFoundError: No module named 'pytorch_lightning.overrides.base'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/ru_g2p_ipa/preprocess_text_before_tts.py", line 30, in <module>
    with open(args.g2p_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'all_words.g2p.txt'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/utils/create_manifest_for_tts.py", line 17, in <module>
    with open(args.preprocessed_text_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'marks.g2p.txt'
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\scripts\dataset_processing\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\ds_conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
head: cannot open 'manifest.json' for reading: No such file or directory
TAIL: can't open 460
TAIL: can't open manifest.json
[NeMo W 2024-01-25 22:24:44 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[NeMo W 2024-01-25 22:24:56 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
usage: generate_mels.py [-h] --fastpitch-model-ckpt FASTPITCH_MODEL_CKPT --input-json-manifests INPUT_JSON_MANIFESTS
                        [INPUT_JSON_MANIFESTS ...] --output-json-manifest-root OUTPUT_JSON_MANIFEST_ROOT
                        [--num-workers NUM_WORKERS] [--cpu]
generate_mels.py: error: argument --fastpitch-model-ckpt: expected one argument
[NeMo W 2024-01-25 22:25:06 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\NeMo\examples\tts\conf\hifigan' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
bene-ges commented 5 months ago

hi @Vubni , I never tried it on Windows, but regarding the reported error, it may be a version mismatch between nemo and pytorch_lightning. See the requirements in nemo, but check against your particular nemo version
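
For example, a quick check like this (assuming both packages import cleanly) shows what is installed, which you can compare against the requirements files shipped with your NeMo release (e.g. requirements/requirements_lightning.txt in the NeMo repo):

import nemo
import pytorch_lightning as pl

# print installed versions to compare against NeMo's pinned requirements
print("nemo_toolkit:", nemo.__version__)
print("pytorch_lightning:", pl.__version__)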

Vubni commented 5 months ago

Thanks @bene-ges ! That really helped me get rid of that error, but then I ran into others and I don't understand how to solve them. I checked all the libraries and looked for a solution, but found nothing.

PS E:\g++\синтез новый> sh train.sh
train.sh: line 1: #!/bin/bash: No such file or directory
train.sh: line 2: conda: command not found
fatal: destination path 'ru_g2p_ipa_bert_large' already exists and is not an empty directory.
[NeMo W 2024-01-26 16:44:36 nemo_logging:349] C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\hydra\_internal\hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
[NeMo I 2024-01-26 16:44:36 helpers:60] Restoring pretrained itn model from ru_g2p_ipa_bert_large/ru_g2p.nemo
[NeMo I 2024-01-26 16:44:37 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: DeepPavlov/rubert-base-cased, vocab_file: C:\Users\egora\AppData\Local\Temp\tmp0u87qzzr\c09b2638681e4862bdffa78433689e48_vocab.txt, merges_files: None, special_tokens_dict: {}, and use_fast: False
[NeMo W 2024-01-26 16:44:38 modelPT:251] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-01-26 16:44:38 nlp_overrides:454] Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/apex
    Megatron-based models require Apex to function correctly.
[NeMo W 2024-01-26 16:44:38 nlp_overrides:462] megatron-core was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt.
[NeMo W 2024-01-26 16:44:38 lm_utils:91] DeepPavlov/rubert-base-cased is not in get_pretrained_lm_models_list(include_external=False), will be using AutoModel from HuggingFace.
Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not used when initializing BertModel: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[NeMo W 2024-01-26 16:44:42 modelPT:251] You tried to register an artifact under config key=language_model.config_file but an artifact for it has already been registered.
[NeMo I 2024-01-26 16:44:42 save_restore_connector:249] Model ThutmoseTaggerModel was successfully restored from E:\g++\синтез новый\ru_g2p_ipa_bert_large\ru_g2p.nemo.
[NeMo I 2024-01-26 16:44:42 helpers:81] Model itn -- Device cuda:0
[NeMo I 2024-01-26 16:44:42 normalization_as_tagging_infer:59] Running inference on all_words.txt...
Error executing job with overrides: ['pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo', 'inference.from_file=all_words.txt', 'inference.out_file=all_words.g2p.txt', 'model.max_sequence_len=64', 'inference.batch_size=512', 'lang=ru']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "NeMo/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py", line 73, in main
    outputs = model._infer(batch)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\models\text_normalization_as_tagging\thutmose_tagger.py", line 310, in _infer
    batch = next(iter(infer_datalayer))
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1039, in __init__
    w.start()
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
NotImplementedError: object proxy must define __reduce_ex__()

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/ru_g2p_ipa/preprocess_text_before_tts.py", line 30, in <module>
    with open(args.g2p_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'all_words.g2p.txt'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/utils/create_manifest_for_tts.py", line 17, in <module>
    with open(args.preprocessed_text_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'marks.g2p.txt'
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\scripts\dataset_processing\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\ds_conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
head: cannot open 'manifest.json' for reading: No such file or directory
TAIL: can't open 460
TAIL: can't open manifest.json
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
usage: generate_mels.py [-h] --fastpitch-model-ckpt FASTPITCH_MODEL_CKPT --input-json-manifests INPUT_JSON_MANIFESTS
                        [INPUT_JSON_MANIFESTS ...] --output-json-manifest-root OUTPUT_JSON_MANIFEST_ROOT
                        [--num-workers NUM_WORKERS] [--cpu]
generate_mels.py: error: argument --fastpitch-model-ckpt: expected one argument
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\NeMo\examples\tts\conf\hifigan' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace

As I understand it, initialization now completes, but training itself still fails with errors

bene-ges commented 5 months ago

@Vubni this is some error with multiprocessing; I don't know how to solve it. Look at this discussion in NeMo. Maybe try WSL on Windows?
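
For context, the failure mode is generic to Windows rather than specific to NeMo: DataLoader worker processes are started with the spawn method, so the whole dataset object is pickled and sent to each worker, and any unpicklable member (like the object proxy in your traceback) fails at w.start(). A minimal sketch of the usual workaround, in plain PyTorch (not NeMo code): set num_workers=0 so data loading stays in the main process and nothing is pickled.

import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    # stand-in for the inference dataset; a real dataset holding an
    # unpicklable member is what breaks worker spawning on Windows
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.tensor(idx)

if __name__ == "__main__":  # required on Windows whenever num_workers > 0
    # num_workers=0 loads batches in the main process, so the dataset is
    # never pickled and the __reduce_ex__ error cannot occur
    loader = DataLoader(ToyDataset(), batch_size=2, num_workers=0)
    for batch in loader:
        print(batch)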

bene-ges commented 5 months ago

also see this (it suggests a patch for a similar error): https://github.com/NVIDIA/NeMo/discussions/5492
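
If editing NeMo's source or configs is inconvenient, one blunt workaround in the same spirit is to force num_workers=0 for every DataLoader in the process before running the script. This is only an illustrative sketch, not the patch from that discussion; it assumes num_workers is always passed by keyword, and it will slow data loading:

import torch.utils.data as tud

_original_init = tud.DataLoader.__init__

def _patched_init(self, *args, **kwargs):
    # force single-process data loading so no worker is spawned and the
    # dataset is never pickled (the Windows failure mode described above);
    # assumption: callers pass num_workers by keyword, not positionally
    kwargs["num_workers"] = 0
    _original_init(self, *args, **kwargs)

tud.DataLoader.__init__ = _patched_init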