asappresearch / flambe

An ML framework to accelerate research and its path to production.
https://flambe.ai
MIT License

Generating configs from templates #12

Closed · ghost closed this issue 5 years ago

ghost commented 5 years ago

Is your feature request related to a problem? Please describe.

Yes. I'm trying to dynamically inject pathnames into a config, then run an experiment using that config. Out of the box, there is no clean way to do this.

Describe the solution you'd like

A function that accepts a path to a Jinja2-templated flambe config, an output path, and key:val pairs to inject.

Describe alternatives you've considered

Loading the config into memory with standard YAML tooling or flambe's YAML utilities, editing it in memory, then writing a new config back to disk.
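
For reference, a rough sketch of that alternative, assuming PyYAML and applied to a plain (non-templated) config; the paths, keys, and helper name below are hypothetical. flambe configs use custom tags like !Experiment, so a catch-all constructor is needed just to load the document, and the tags are lost again on dump, which is part of why this approach is clunky:

import yaml

def construct_unknown(loader, tag_suffix, node):
    # Round-trip unknown-tagged nodes as plain Python structures
    # (lossy: the tag itself is discarded)
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)

yaml.SafeLoader.add_multi_constructor('!', construct_unknown)

with open('experiment.yaml') as f:          # hypothetical input path
    docs = list(yaml.safe_load_all(f))      # flambe configs are multi-document
docs[-1]['pipeline']['0_dataset']['train_path'] = '/data/train.csv'
with open('experiment.edited.yaml', 'w') as f:
    yaml.safe_dump_all(docs, f)             # note: all !Tags are gone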

Additional context

Here's what I've come up with:

import os
import re

import jinja2

def generate_config_from_template(template_path, config_path,
                                  remove_comments=False, **template_kwargs):
    """Render a Jinja2-templated flambe config and write it to disk.

    Any extra keyword arguments are injected as template variables.
    """
    # Load the template from the directory that contains it
    dirname = os.path.dirname(template_path)
    basename = os.path.basename(template_path)
    loader = jinja2.FileSystemLoader(searchpath=dirname)
    env = jinja2.Environment(loader=loader)
    template = env.get_template(basename)
    with open(config_path, 'w') as f:
        for line in template.render(**template_kwargs).split('\n'):
            if remove_comments:
                # Strip trailing YAML comments and the whitespace they leave
                line = re.sub(r'# .*', '', line).rstrip()
            if line:  # skip lines that are now empty
                f.write(line + '\n')

Where a config template might look like:

post_process_preds: 'post_process_preds_ext'
---
!Experiment

name: ada-text-classification
pipeline:

  # stage 0 - load the dataset object and run preprocessing
  0_dataset: !Foo
    train_path: {{ train_path }}
    test_path: {{ test_path }}
    transform:
      text: !TextField
      label: !LabelField

  # stage 1 - train the text classifier on the dataset
  1_train: !Trainer
    dataset: !@ 0_dataset  # link back to the existing dataset
    train_sampler: !BaseSampler  # define a way of sampling dataset
    val_sampler: !BaseSampler
    model: !TextClassifier
      embedder: !Embedder
        embedding: !torch.Embedding  # automatically use pytorch classes
          num_embeddings: !@ 0_dataset.text.vocab_size  # reference vocab size
          embedding_dim: 300
        encoder: !PooledRNNEncoder
          input_size: 300
          rnn_type: lstm
          n_layers: !g [2]
          hidden_size: 256
      output_layer: !SoftmaxLayer
        input_size: !@ 1_train.model.embedder.encoder.rnn.hidden_size
        output_size: !@ 0_dataset.label.vocab_size
        take_log: false
    loss_fn: !torch.NLLLoss  # Use existing PyTorch negative log likelihood
    metric_fn: !torch.NLLLoss  # Used for validation set evaluation
    optimizer: !torch.Adam
      params: !@ 1_train.model.trainable_params  # Link to model parameters
    max_steps: 2  # Each step runs `iter_per_step` iterations
    iter_per_step: 2  # Eval and checkpoint every 2 iterations
  2_eval: !Evaluator
    dataset: !@ 0_dataset
    model: !@ 1_train.model
    metric_fn: !torch.NLLLoss
    output_path: {{ preds_path }}
    eval_sampler: !BaseSampler
    eval_data: test
  3_post_process_preds: !post_process_preds.PostProcessPreds
    preds_path: !@ 2_eval.output_path
    preds_id_path: {{ preds_id_path }}
    post_processed_preds_path: {{ post_processed_preds_path }}
    label_vocab: !@ 0_dataset.label.vocab
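
With the helper above, a hypothetical invocation against this template (all file paths here are made up) might look like:

generate_config_from_template(
    'experiment.template.yaml',            # hypothetical template path
    'experiment.yaml',                     # rendered config written here
    remove_comments=True,
    train_path='/data/sst/train.csv',
    test_path='/data/sst/test.csv',
    preds_path='/tmp/preds.txt',
    preds_id_path='/tmp/preds_ids.txt',
    post_processed_preds_path='/tmp/preds_post.txt',
)

The rendered experiment.yaml can then be run like any other flambe config.
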
jeremyasapp commented 5 years ago

The big question for me here is: do we need to add any code to Flambé? It seems like we could just provide a small tutorial that explains how to use Jinja to do this.
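
For reference, the tutorial-only version could be as small as this sketch (the paths are hypothetical):

import jinja2

with open('experiment.template.yaml') as f:
    template = jinja2.Template(f.read())
with open('experiment.yaml', 'w') as f:
    f.write(template.render(train_path='/data/train.csv',
                            test_path='/data/test.csv'))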

jeremyasapp commented 5 years ago

I like the idea of templating, but I don't really think this requires additional code, which is kind of cool :)

ghost commented 5 years ago

Why not include the function I wrote above?

All of the code is pure boilerplate. Plus, you can control how the config is written back to disk (removing blank lines, comments, etc.).

For me, the question is: does this help people get started with, and fall more in love with, flambé, and at what cost?

The answer is: yes, and at the very small cost of a single helper.

jeremyasapp commented 5 years ago

Whether we add this snippet to the code or to the docs, what does it change?

jeremyasapp commented 5 years ago

They will still need to run a script on their template.

jeremyasapp commented 5 years ago

Just to clarify, I do want to keep your snippet! I just think it's so small it can go in a tutorial directly. Do you think there's a strong reason to put it in the repo? If so, we can PR it into flambe.utils :)

nmatthews-asapp commented 5 years ago

I think it would be fine to add to utils, but there is one cost: we add Jinja as a dependency.

jeremyasapp commented 5 years ago

@nmatthews-asapp Jinja is already a dependency of Flask, which is itself a dependency of ours (at least until we move the website to a different repo or something).

jeremyasapp commented 5 years ago

@williamabrwolf want to make a PR into flambe.utils?