Closed AbeHandler closed 5 years ago
Thanks for bringing this to our attention! Minor issue 1 is now taken care of.
As for the second issue, this is a great point. I believe it's intended that you coordinate between preprocessed data and different environments explicitly; perhaps we can add a way to override that from the command line.
The second issue is addressed by #34, so closing this!
I am excited to try out this method! I found two small issues trying to change the vocab size with the current code.
Minor issue 1:
$ git reset --hard origin/master && python -m scripts.make_reference_corpus examples/ag/dev.jsonl examples/ag/reference --vocab-size 1000
I get an error "TypeError: '>=' not supported between instances of 'str' and 'int'"
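For illustration, the error can be reproduced outside the repository. This is only a minimal sketch, assuming an optparse-style parser like the add_option call quoted in this report; parse_vocab_size is a hypothetical helper and not part of the VAMPIRE code. An option declared without a type is stored as a string, so comparing it with an integer fails, while declaring type=int converts the value at parse time.

```python
from optparse import OptionParser


def parse_vocab_size(argv, typed):
    """Hypothetical helper: parse --vocab-size with or without an explicit type."""
    parser = OptionParser()
    if typed:
        # With type=int the parser converts the command-line string to an int.
        parser.add_option('--vocab-size', dest='vocab_size', default=None, type=int)
    else:
        # Without a type the value is stored as the raw string.
        parser.add_option('--vocab-size', dest='vocab_size', default=None)
    options, _ = parser.parse_args(argv)
    return options.vocab_size


argv = ['--vocab-size', '1000']
untyped = parse_vocab_size(argv, typed=False)
typed = parse_vocab_size(argv, typed=True)

try:
    untyped >= 1  # comparing a str with an int raises TypeError
    untyped_error = None
except TypeError as exc:
    untyped_error = str(exc)

print(repr(untyped), repr(typed))
print(untyped_error)
```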
I think the issue is just that you need to specify the type of the vocab_size argument in scripts/make_reference_corpus.py. Changing line 49 of scripts/make_reference_corpus.py to
parser.add_option('--vocab-size', dest='vocab_size', default=None, type=int
fixes the error for me.
Minor issue 2:
It seems like the vocab size is hardcoded into the VAMPIRE environment. So if you run
python -m scripts.train
after preprocessing with a vocabulary size that is not 30K, you will get tensor mismatch errors from torch.
https://github.com/allenai/vampire/blob/d3662bdf8971e961076d799536a05a0a9c397536/environments/environments.py#L66
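One way this kind of mismatch could be avoided is to make the vocabulary size overridable and to check it against the preprocessed data before training starts. The following is a rough sketch under stated assumptions, not the repository's actual mechanism: the VOCAB_SIZE variable name and both helper functions are hypothetical, and the only value taken from this report is the 30K default.

```python
import os

DEFAULT_VOCAB_SIZE = 30000  # the 30K default mentioned in this report


def resolve_vocab_size(environ=os.environ):
    """Hypothetical helper: read VOCAB_SIZE from the environment, else use the default."""
    raw = environ.get("VOCAB_SIZE")  # assumed variable name, for illustration only
    if raw is None:
        return DEFAULT_VOCAB_SIZE
    size = int(raw)
    if size <= 0:
        raise ValueError(f"VOCAB_SIZE must be positive, got {size}")
    return size


def check_vocab_size(configured, preprocessed):
    """Hypothetical helper: fail early with a readable message instead of a tensor error."""
    if configured != preprocessed:
        raise ValueError(
            f"configured vocab size {configured} does not match "
            f"preprocessed vocab size {preprocessed}"
        )


# Example: data preprocessed with --vocab-size 1000, environment overridden to match.
size = resolve_vocab_size({"VOCAB_SIZE": "1000"})
check_vocab_size(size, 1000)
```

Failing at this point gives an error that names both sizes, which is easier to act on than a shape mismatch raised inside the model.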
Happy to put in a PR if that is helpful. But that is maybe overkill.