A few issues we should address:

1. Redundant argument parsers throughout the codebase. I think moving to a hierarchical JSON config system makes sense: a single global configuration for all the universal settings (learning rate, weight decay, optimizer), plus model-specific configurations that inherit from it. Does that seem like a sane choice? A rough sketch of what I'm picturing is below the list.
2. Redundant LSTM baseline and regularization models. Is there a reason we need two models (both named `LSTMBaseline`)? Isn't LSTM-reg just LSTM-baseline with regularization?
3. Other obviously redundant code [1] [2].
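Roughly what I'm picturing for the config system, with placeholder file names and a simple `inherits` key; none of this exists in the repo yet, it's just an illustration of the layering:

```python
# Rough sketch of the hierarchical config idea -- file names, keys, and the
# "inherits" mechanism below are placeholders, not anything in the repo yet:
#
#   config/global.json         {"lr": 0.001, "weight_decay": 0.0, "optimizer": "adam"}
#   config/lstm_baseline.json  {"inherits": "global", "hidden_dim": 256, "dropout": 0.5}

import json
from pathlib import Path

CONFIG_DIR = Path("config")  # hypothetical location for the JSON files


def load_config(name):
    """Load <name>.json, layering it on top of whatever it inherits from."""
    with open(CONFIG_DIR / f"{name}.json") as f:
        cfg = json.load(f)
    parent = cfg.pop("inherits", None)
    if parent is None:
        return cfg
    merged = load_config(parent)  # resolve the parent (e.g. global) first
    merged.update(cfg)            # model-specific keys override the global defaults
    return merged


# e.g. load_config("lstm_baseline") -> {"lr": 0.001, "weight_decay": 0.0,
#                                       "optimizer": "adam", "hidden_dim": 256,
#                                       "dropout": 0.5}
```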
For the second point: no, we wouldn't need two models. I did something similar for regularizing KimCNN, and it's fine if we just have one model with optional regularization parameters.
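Something along these lines would probably do; the constructor arguments and defaults here are only illustrative, not the existing `LSTMBaseline` signature:

```python
# Rough sketch of a single merged model with optional regularization -- argument
# names and defaults are assumptions, not the current LSTMBaseline signature:
import torch.nn as nn


class LSTMBaseline(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes,
                 dropout=0.0, embed_dropout=0.0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.embed_drop = nn.Dropout(embed_dropout)  # behaves as identity when p=0.0
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(dropout)              # behaves as identity when p=0.0
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len) token ids
        emb = self.embed_drop(self.embed(x))
        _, (h_n, _) = self.lstm(emb)
        return self.fc(self.drop(h_n[-1]))  # classify from the final hidden state


# The defaults give the plain baseline; passing nonzero dropout (plus weight decay
# via the optimizer) gives the regularized variant, so LSTM-reg can go away.
```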