Fixed evaluation_script argparser and model_builder multi-dataset sampling

What is the purpose of this PR?

I fixed two small errors that I noticed during model training:

We use tf.data.Dataset.sample_from_datasets(datasets=train_datasets, weights=sample_probs, stop_on_empty_dataset=True) to sample from the datasets during multi-dataset training. Previously we did not set stop_on_empty_dataset=True, so the dataloader first sampled from datasets X, Y, Z according to sample_probs until one or more of them ran out of samples, then kept drawing from the remaining datasets until all were empty, and only then restarted. Late in each pass the sampled mixture therefore no longer followed sample_probs. This PR sets stop_on_empty_dataset=True, so the dataloader restarts as soon as any dataset is empty.
In evaluation_script.py we load external datasets for model evaluation. The argument parsing I implemented did not work correctly, so these external datasets failed to load. This is now fixed.
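
To illustrate the sampling change, here is a minimal pure-Python sketch of the two stopping behaviours. It is a toy stand-in, not TensorFlow code and not the code in model_builder: the function sample_from_iterables, the dataset names, and the sizes are all hypothetical, and the exact step at which tf.data stops may differ from this simulation.

```python
import random


def sample_from_iterables(datasets, weights, stop_on_empty, seed=0):
    """Toy weighted sampler that mimics the two stopping modes (hypothetical helper)."""
    rng = random.Random(seed)
    iterators = [iter(d) for d in datasets]
    active = list(range(len(iterators)))  # indices of datasets that still have samples
    out = []
    while active:
        # Pick a dataset among the active ones, proportional to its weight.
        idx = rng.choices(active, weights=[weights[i] for i in active])[0]
        try:
            out.append(next(iterators[idx]))
        except StopIteration:
            if stop_on_empty:
                break  # new behaviour: end the pass once any dataset is exhausted
            active.remove(idx)  # old behaviour: keep drawing from the remaining datasets
    return out


# Assumed toy data: a small dataset X and a much larger dataset Y.
small = [("X", i) for i in range(3)]
large = [("Y", i) for i in range(50)]

old = sample_from_iterables([small, large], [0.5, 0.5], stop_on_empty=False)
new = sample_from_iterables([small, large], [0.5, 0.5], stop_on_empty=True)

# Old behaviour: every sample is emitted, and everything after X's last
# sample comes from Y only, regardless of the requested 50/50 weights.
tail_after_x = old[old.index(("X", 2)) + 1:]
print(len(old), len(new), {name for name, _ in tail_after_x})
```

In the old mode the tail of each pass is drawn from a single dataset, which is the skew this PR removes. The trade-off of the new mode is that samples left in the larger datasets are not seen before the pass restarts.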

How did you implement your changes?

Changed only 2-3 lines of code.

Remaining issues

None
