deepgram / kur

Descriptive Deep Learning
Apache License 2.0

RuntimeError: generator raised StopIteration #107

Open chrisspen opened 4 years ago

chrisspen commented 4 years ago

Running the example with kur -v train speech.yml ends after about an hour with this error:

Traceback (most recent call last):
  File "~/myproject/env/bin/kur", line 8, in <module>
    sys.exit(main())
  File "~/myproject/env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
    sys.exit(args.func(args) or 0)
  File "~/myproject/env/lib/python3.7/site-packages/kur/__main__.py", line 64, in train
    func(step=args.step)
  File "~/myproject/env/lib/python3.7/site-packages/kur/kurfile.py", line 434, in func
    return trainer.train(**defaults)
  File "~/myproject/env/lib/python3.7/site-packages/kur/model/executor.py", line 295, in train
    **kwargs
  File "~/myproject/env/lib/python3.7/site-packages/kur/model/executor.py", line 768, in wrapped_train
    for num_batches, batch in parallelize(enumerate(provider)):
RuntimeError: generator raised StopIteration

and no weights file is output anywhere. Is this a bug or the expected behavior?
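
For context on the message itself: on Python 3.7 and later (PEP 479), a StopIteration that escapes from inside a generator body is converted into "RuntimeError: generator raised StopIteration". The sketch below reproduces that behavior in isolation. Whether this is exactly what happens inside kur's parallelize is an assumption, not something confirmed here, and all function names in the sketch are hypothetical.

```python
def first_item(iterator):
    # next() without a default raises StopIteration once the iterator is empty.
    return next(iterator)


def leaky_batches(source):
    """Generator that lets StopIteration escape from its own body."""
    iterator = iter(source)
    while True:
        # When the source is exhausted, StopIteration leaks out of this call.
        yield first_item(iterator)


def safe_batches(source):
    """Same generator, but it ends with a plain return instead."""
    iterator = iter(source)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item


def drain(generator):
    """Consume a generator and report the items seen and how it ended."""
    items = []
    try:
        for item in generator:
            items.append(item)
    except RuntimeError as exc:
        return items, str(exc)
    return items, None


leaky_result = drain(leaky_batches([1, 2, 3]))
safe_result = drain(safe_batches([1, 2, 3]))
print(leaky_result)
print(safe_result)
```

On Python 3.7+ the first generator yields every item and then fails with the RuntimeError, while the second one finishes cleanly. Code written before PEP 479 often relied on the first pattern to end iteration.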

greedyuser commented 4 years ago

Chris, the full spec is:

  weights:
    initial: "/path/to/load/first/weights"
    last: "/path/to/save/when/exiting"

  checkpoint:
    path: "/path/to/save/checkpoints"
    minutes: "{{ checkpoint_time }}"
    validation: "{{ batch size }}"

Are you using the default speech.yml in your tests?

chrisspen commented 4 years ago

Yes. I see no weights or checkpoint sections in your speech.yml file. Where are those defined?

greedyuser commented 4 years ago

If you look in the 'train' section, you will see a 'weights' entry that pulls in the weights defined in the 'settings' section:

 train:

   data:
     # A "speech_recognition" data supplier will create these data sources:
     #   utterance, utterance_length, transcript, transcript_length, duration
     - speech_recognition:
         <<: *data
         url: "https://kur.deepgram.com/data/lsdc-train.tar.gz"
         checksum: >-
           fc414bccf4de3964f895eaa9d0e245ea28810a94be3079b55505cf0eb1644f94
   weights: *weights

If you replace that with what I have above, it should save checkpoints and a 'last' set of weights when the model stops. Paths can be relative, so with names like 'last.model.kur' and 'checkpoint.model.kur' you can start a new model by creating a new directory, copying in the yml files, and running kur there.

You also need to make any similar changes to the 'validate' section.

chrisspen commented 4 years ago

If I make those changes, then kur doesn't run at all and fails with the exception:

Traceback (most recent call last):
  File "~/.env/bin/kur", line 8, in <module>
    sys.exit(main())
  File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
    sys.exit(args.func(args) or 0)
  File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 62, in train
    spec = parse_kurfile(args.kurfile, args.engine)
  File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 48, in parse_kurfile
    spec.parse()
  File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 129, in parse
    self.engine, builtin['train'], stack, include_key=True)
  File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 960, in _parse_section
    evaluated = engine.evaluate(self.data[key], recursive=True)
  File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in evaluate
    for k, v in expression.items()}
  File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in <dictcomp>
    for k, v in expression.items()}
  File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in evaluate
    for k, v in expression.items()}
  File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in <dictcomp>
    for k, v in expression.items()}
  File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 208, in evaluate
    new_expression = self._evaluate(expression)
  File "~/.env/lib/python3.7/site-packages/kur/engine/jinja_engine.py", line 189, in _evaluate
    result = self.env.from_string(expression).render(**self._scope)
  File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 880, in from_string
    return cls.from_code(self, self.compile(source), globals, None)
  File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 591, in compile
    self.handle_exception(exc_info, source_hint=source_hint)
  File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "~/.env/lib/python3.7/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<unknown>", line 1, in template
  File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 497, in _parse
    return Parser(self, source, name, encode_filename(filename)).parse()
  File "~/.env/lib/python3.7/site-packages/jinja2/parser.py", line 901, in parse
    result = nodes.Template(self.subparse(), lineno=1)
  File "~/.env/lib/python3.7/site-packages/jinja2/parser.py", line 876, in subparse
    self.stream.expect('variable_end')
  File "~/.env/lib/python3.7/site-packages/jinja2/lexer.py", line 384, in expect
    self.name, self.filename)
jinja2.exceptions.TemplateSyntaxError: expected token 'end of print statement', got 'size'

Why would the default speech.yml need modifications just to save the trained network? Shouldn't those settings be the default?

chrisspen commented 4 years ago

I suspect instead of:

validation: "{{ batch size }}"

you meant to type:

validation: "{{ batch_size }}"

That lets it run for a short while, but then it still errors with:

Traceback (most recent call last):
  File "~/.env/bin/kur", line 8, in <module>
    sys.exit(main())
  File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
    sys.exit(args.func(args) or 0)
  File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 64, in train
    func(step=args.step)
  File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 434, in func
    return trainer.train(**defaults)
  File "~/.env/lib/python3.7/site-packages/kur/model/executor.py", line 295, in train
    **kwargs
  File "~/.env/lib/python3.7/site-packages/kur/model/executor.py", line 564, in wrapped_train
    checkpoint[k]))
ValueError: Expected "minutes" key in "checkpoint" to be an integer. Received: 

Are you sure minutes: "{{ checkpoint_time }}" is the correct syntax and variable name?

greedyuser commented 4 years ago

Sorry about the typo there! I should have told you to define batch_size and checkpoint_time in the settings section. They can be literals, but for clarity I would recommend defining them in settings:

  batch_size: 16
  checkpoint_time: 30

Of course, select whatever batch size and checkpoint interval you need here.
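
Both template failures in this thread can be reproduced with Jinja2 alone, outside of kur. The sketch below assumes a plain jinja2.Environment with default settings; kur builds its own environment and scope, so treat this as an illustration of Jinja2's behavior rather than of kur's engine.

```python
from jinja2 import Environment, TemplateSyntaxError

env = Environment()

# 1. A space inside a variable name is rejected when the template is compiled.
try:
    env.from_string("{{ batch size }}")
    syntax_error = None
except TemplateSyntaxError as exc:
    syntax_error = exc.message

# 2. With default settings, an undefined variable renders as an empty string.
undefined_render = env.from_string("{{ checkpoint_time }}").render()

# 3. Once the variable is defined, its value is rendered as text.
defined_render = env.from_string("{{ checkpoint_time }}").render(checkpoint_time=30)

print(repr(syntax_error))
print(repr(undefined_render))
print(repr(defined_render))
```

The empty string in the second case is consistent with the blank value after "Received:" in the ValueError above, which is why defining checkpoint_time in the settings section makes the error go away.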