Closed nelson-liu closed 7 years ago
This PR adds support for training and validating the attention sum reader on SciQ, then evaluating on a test set.
This PR is ready to be merged as is, but it'd be nice to be able to run test-set evaluation on the best epoch, as well as add this to the Scala experiment code.
I think this can be merged as is. The Travis pylint check is failing, but it looks spurious and I can't figure out why it's complaining about that file.
I was wondering why I didn't get notified of this, and why it didn't get merged earlier - I guess you never added me as a reviewer! I'll look over it now.
I think the reason this didn't show up before was because we cache the conda environment, so we weren't pulling this new version. I don't think this behavior is desirable, so we could:

1. `pip install -U`. I'm not sure how this interacts with the `requirements.txt` file, as I'd like to pull the latest version of the packages that we haven't pinned but not upgrade the packages that we have pinned.
2. Say never mind about the new spacy version, pin it to the old version, and don't install its data.

I'm personally partial to 2; thoughts?
Looks like `-U` upgrades only if the requirement in `requirements.txt` isn't fulfilled; I'll go ahead and do that.
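As a sanity check of that behavior, here's a minimal sketch of when `-U` would act on a requirement. This is an illustration, not pip's actual resolver logic; only exact `==` pins are modeled, and the function name is made up.

```python
# Simplified model of the behavior described above: `pip install -U` leaves a
# pinned requirement alone when the installed version already fulfills the pin,
# and pulls the latest release for unpinned requirements. Illustrative only.
def acted_on_by_upgrade(installed_version, pin=None):
    """Return True if `pip install -U` would upgrade/reinstall this package."""
    if pin is None:
        return True  # unpinned: -U upgrades to the latest release
    return installed_version != pin  # pinned: only if the pin isn't fulfilled

# spacy pinned and already at the pinned version: left alone
print(acted_on_by_upgrade("1.6.0", pin="1.6.0"))  # False
# unpinned package: upgraded
print(acted_on_by_upgrade("0.18.1"))  # True
```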
This looks good to me. It does let us evaluate on the best model: run training without test files, then run testing without training files, which loads the best model. Feel free to merge; I'll merge this soon if you haven't, unless there's something you still want to change.
That's true, but it'd be nice to do it all without manual intervention :)
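A rough sketch of what hands-off best-epoch evaluation could look like. Every name here (`Trainer`, `train`, `evaluate_best`, the accuracy list) is hypothetical and not the repo's actual API; it just models tracking the best epoch during training and evaluating it afterward without a separate manual run.

```python
class Trainer:
    """Illustrative trainer that remembers the best epoch by validation accuracy."""

    def __init__(self):
        self.best_epoch = None
        self.best_val_acc = float("-inf")

    def train(self, epoch_val_accs):
        # Record whichever epoch achieved the highest validation accuracy,
        # standing in for "save the best model checkpoint".
        for epoch, acc in enumerate(epoch_val_accs):
            if acc > self.best_val_acc:
                self.best_val_acc = acc
                self.best_epoch = epoch

    def evaluate_best(self, test_file=None):
        # After training completes, evaluate the best checkpoint on the
        # test file in the same run -- no manual second invocation needed.
        if test_file is None:
            return None
        return {"epoch": self.best_epoch, "test_file": test_file}


trainer = Trainer()
trainer.train([0.60, 0.72, 0.70])          # epoch 1 is best
result = trainer.evaluate_best("sciq_test.json")
print(result)  # {'epoch': 1, 'test_file': 'sciq_test.json'}
```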
This PR lets you pass in a `test_file` to evaluate on after training is completed.

TODO: