This pull request contains several changes developed over the last quarter. I anticipate it will take a few sets of eyes to test everything out. Some comments on the changes:
Issue 162 - There are changes to the workflows to add convergence cutoffs. These are configurable via two new options in the configuration files, `cutoff` and `rolling_reward_length`, which together specify the minimum change in average total reward over the last N steps below which training stops (see the sketch below).
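Roughly, the check behaves like this minimal sketch. The option names `cutoff` and `rolling_reward_length` are the real ones; the class and per-episode update are illustrative, not the actual workflow code:

```python
from collections import deque

class ConvergenceCheck:
    """Signal convergence when the rolling average total reward changes
    by less than `cutoff` between consecutive rolling windows."""

    def __init__(self, cutoff, rolling_reward_length):
        self.cutoff = cutoff
        self.rewards = deque(maxlen=rolling_reward_length)
        self.prev_avg = None

    def update(self, total_reward):
        self.rewards.append(total_reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough history to form a full window yet
        avg = sum(self.rewards) / len(self.rewards)
        done = self.prev_avg is not None and abs(avg - self.prev_avg) < self.cutoff
        self.prev_avg = avg
        return done
```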
Issue 166 and Issue 232 - The step count in the workflows (sync and async) was split across two variables and had an off-by-one counting issue. We consolidated it into a single variable and placed its increment deliberately (see the sketch below).
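For context, the fix amounts to the pattern below: one counter, incremented in exactly one place right after the environment step, instead of two variables updated in different spots. Everything here is a stubbed illustration, not the actual workflow code:

```python
class StubEnv:
    """Trivial stand-in so the sketch runs on its own."""
    def reset(self):
        return 0
    def step(self, action):
        return 0, 1.0, True, {}  # obs, reward, done, info

env, max_steps, step_count = StubEnv(), 1000, 0
obs, done = env.reset(), False
while not done and step_count < max_steps:
    obs, reward, done, info = env.step(obs)
    step_count += 1  # the single, intentional increment
print(step_count)  # counts exactly the number of env.step() calls
```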
Fixed the results directory - We noticed rolling-reward plots were being overwritten or were picking up stale data across runs. This happened because the logs/results were always stored in EXP001/RUN001. We have fixed it so the run directory is incremented for each new run.
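The increment logic is essentially the following sketch (the function name and layout are illustrative; only the EXP001/RUN001 structure comes from the repo):

```python
import os

def next_run_dir(exp_dir):
    """Create and return the next unused RUN### directory under exp_dir."""
    i = 1
    while os.path.exists(os.path.join(exp_dir, f"RUN{i:03d}")):
        i += 1
    run_dir = os.path.join(exp_dir, f"RUN{i:03d}")
    os.makedirs(run_dir)
    return run_dir

# First call yields .../EXP001/RUN001, the next run gets .../EXP001/RUN002, etc.
```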
TensorFlow and PyTorch - We made sure the code can run with both TensorFlow and PyTorch. This required some changes to ensure TensorFlow isn't loaded when we want PyTorch. To use PyTorch, we change the imports in the main driver. Currently our repo does not have any PyTorch code...
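The key idea is to defer the framework import until the backend is chosen, so the unused framework is never pulled into memory. A minimal sketch, assuming a hypothetical `load_backend` helper (the driver's actual import swap may look different):

```python
import importlib

def load_backend(name: str):
    """Import only the requested framework so the other is never loaded."""
    module = "torch" if name.lower() == "pytorch" else "tensorflow"
    return importlib.import_module(module)

# e.g. framework = load_backend("pytorch")  # tensorflow is never imported
```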
We have added a hyperparameter tuning script. It requires Optuna and sbatch: Optuna can run trials in parallel, and we use sbatch to launch the parallel workers. This could be extended to other schedulers.
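The parallelism works roughly as below: every sbatch job runs the same worker script (e.g. `sbatch --array=0-9 tune.sh`), and the workers coordinate through Optuna's shared storage. The study name, storage URL, and objective here are illustrative stand-ins, not the script's actual contents:

```python
import optuna

def objective(trial):
    # Hypothetical search space; the real script tunes the agent settings.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    # Stand-in score so the sketch runs; a real trial would run training
    # and return e.g. the rolling average total reward.
    return -((lr - 1e-3) ** 2) - ((gamma - 0.99) ** 2)

# Every worker attaches to the same study through shared storage,
# so Optuna coordinates the trials across the sbatch jobs.
study = optuna.create_study(
    study_name="hyperparam_tuning",       # illustrative name
    storage="sqlite:///optuna_study.db",  # shared between workers
    load_if_exists=True,
    direction="maximize",
)
study.optimize(objective, n_trials=20)
```

Note that a SQLite file on shared storage can hit locking limits with many concurrent workers; Optuna also supports RDB backends such as MySQL or PostgreSQL for larger worker counts.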
Other quality-of-life updates: fixed some spelling errors and missing options in the bsuite-related scripts.
I am sure this is not 100% ready to merge, but I want to get people looking at it sooner rather than later.
I have updated this pull request with additional bug fixes to the sync workflow that uses the episode block.
I have also (heavily) commented the hyperparameter tuning script.