PPPLDeepLearning / plasma-python

PPPL deep learning disruption prediction package
http://tigress-web.princeton.edu/~alexeys/docs-web/html/

Cleaned and squashed merge of @ge-dong fork #50

Closed felker closed 4 years ago

felker commented 4 years ago

Replaces and closes #49. From @ge-dong's changes:

In addition:

Possible to-do:

Added descriptive comments in signals.py and conf_parser.py:

# ------------ signals.py:
# The "data_avail_tolerances" parameter in Signal class initializer relaxes
# the cutoff for the signal around the defined t_disrupt (provided in the
# disruptive shot list). The latter definition (based on current quench) may
# vary depending on who supplied the shot list and computed t_disrupt, since
# the quench may last for O(10 ms). E.g. C. Rea may have taken t_disrupt =
# the midpoint of the start and end of the quench for later D3D shots (after
# 2016) in d3d_disrupt_since_2016.txt, whereas J. Barr and semi-/automatic
# methods for calculating t_disrupt may use t_disrupt = start of the quench.

# Early-terminating signals will be implicitly padded with zeros when t_disrupt
# still falls within the tolerance (see shots.py,
# Shot.get_signals_and_times_from_file). Even tols > 30 ms are fine (do not
# violate causality), but the ML method may start to base predictions on the
# disappearance of signals.
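The zero-padding behavior described above can be sketched in isolation (a minimal illustration only; the real logic lives in shots.py, Shot.get_signals_and_times_from_file, and the function name, array layout, and tolerance values here are assumptions):

```python
import numpy as np

def pad_to_t_disrupt(t, sig, t_disrupt, tol, dt):
    """Sketch: pad an early-terminating signal with zeros up to t_disrupt,
    provided the gap between the last sample and t_disrupt is within tol."""
    gap = t_disrupt - t[-1]
    if gap <= 0:
        # signal already covers t_disrupt; nothing to pad
        return t, sig
    if gap > tol:
        # outside tolerance: the whole shot would be omitted instead
        raise ValueError("signal ends too early before t_disrupt")
    n_pad = int(round(gap / dt))
    t_pad = t[-1] + dt * np.arange(1, n_pad + 1)
    return np.concatenate([t, t_pad]), np.concatenate([sig, np.zeros(n_pad)])
```

As noted above, even large tolerances do not violate causality, but the padding means the model may learn to key on the signal vanishing rather than on physics.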

# "t" subscripted variants of signal variables increase the tolernace to 29 ms
# on D3D, the maximum value possible without violating causality for the min
# T_warn=30 ms. This is important for the signals of newer shots in
# d3d_disrupt_since_2016.txt; many of them would cause [omit] of entire shot
# because their values end shortly before t_disrupt (poss. due to different
# t_disrupt label calculation).

# See conf_parser.py dataset definitions of d3d_data_max_tol, d3d_data_garbage
# which use these signal variants.

# For non-t-subscripted profile signals (and q95), a positive tolerance of
# 20 ms on D3D (and 30-50 ms on JET) is used to account for the causal
# shifting of the delayed "real-time processing".

# A list value provides an individual tolerance for each machine when signal
# definitions are shared in cross-machine studies.
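One way to read the per-machine tolerance convention (a hypothetical sketch; the real Signal class in the package has more parameters, and the machine names and values below are illustrative):

```python
class Signal:
    """Sketch: data_avail_tolerances may be a list giving one tolerance
    per machine, aligned with the machines list, for signal definitions
    shared across machines."""

    def __init__(self, description, machines, data_avail_tolerances=None):
        self.description = description
        self.machines = machines  # e.g. ['d3d', 'jet']
        # default: zero tolerance on every machine
        if data_avail_tolerances is None:
            data_avail_tolerances = [0.0] * len(machines)
        self.data_avail_tolerances = data_avail_tolerances

    def get_data_avail_tolerance(self, machine):
        # look up the tolerance for the requested machine
        return self.data_avail_tolerances[self.machines.index(machine)]

# illustrative values: 20 ms on D3D, 30 ms on JET
q95 = Signal('safety factor q95', ['d3d', 'jet'], [0.02, 0.03])
```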

# -------------- conf_parser.py

        # See notes in data/signals.py for details on signal tolerances
        # relative to t_disrupt. The following two dataset definitions permit
        # progressively worse signal quality when preprocessing the shots and
        # omitting some of them
        if params['paths']['data'] == 'd3d_data_max_tol':
            # let signals terminate up to 29 ms before t_disrupt on D3D
            h = myhash_signals(sig.all_signals_max_tol.values())
        elif params['paths']['data'] == 'd3d_data_garbage':
            # let up to 3x signals disappear at any time before t_disrupt
            # (and NaNs?)
            # -----
            # temp workaround for identical signal dictionary (but different
            # omit criteria in shots.py Shot.get_signals_and_times_from_file())
            # ---> 2x hash int
            # TODO(KGF): not robust; create reproducible specification and
            # recording of signal filtering procedure
            h = myhash_signals(sig.all_signals_max_tol.values())*2
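The hash-doubling workaround can be illustrated in isolation (myhash_signals here is a stand-in written for this sketch, not the repo's actual implementation; the signal names are made up):

```python
import hashlib

def myhash_signals(signal_names):
    """Stand-in hash: order-independent digest of a set of signal names."""
    m = hashlib.md5()
    for name in sorted(signal_names):
        m.update(name.encode())
    return int.from_bytes(m.digest()[:8], 'big')

# Both datasets use the identical signal dictionary, so their hashes would
# collide; doubling the hash forces a distinct preprocessing cache key for
# the dataset with the looser omit criteria.
signals = ['q95t', 'ipt', 'lit']
h_max_tol = myhash_signals(signals)
h_garbage = myhash_signals(signals) * 2
```

As the TODO notes, keying the cache on an arithmetic tweak of the hash is fragile: the filtering procedure itself is not recorded, so two caches can silently diverge from their supposed specification.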