PPPLDeepLearning / plasma-python

PPPL deep learning disruption prediction package
http://tigress-web.princeton.edu/~alexeys/docs-web/html/

Cleaned and squashed merge of @ge-dong fork #50

Closed felker closed 4 years ago

felker commented 4 years ago

Replaces and closes #49. From @ge-dong's changes:

In addition:

Possible to-do:

Added descriptive comments in signals.py and conf_parser.py:

# ------------ signals.py:
# The "data_avail_tolerances" parameter in Signal class initializer relaxes
# the cutoff for the signal around the defined t_disrupt (provided in the
# disruptive shot list). The latter definition (based on current quench) may
# vary depending on who supplied the shot list and computed t_disrupt, since
# the quench may last for O(10 ms). E.g. C. Rea may have taken t_disrupt =
# the midpoint of the start and end of the quench for later D3D shots (after
# 2016) in d3d_disrupt_since_2016.txt, whereas J. Barr and semi-/automatic
# methods for calculating t_disrupt may use t_disrupt = start of the quench.

# Early-terminating signals will be implicitly padded with zeros when t_disrupt
# still falls within the tolerance (see shots.py,
# Shot.get_signals_and_times_from_file). Even tols > 30 ms are fine (do not
# violate causality), but the ML method may start to base predictions on the
# disappearance of signals.
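The zero-padding behavior described above can be sketched in isolation (a minimal illustration only; the real logic lives in shots.py, Shot.get_signals_and_times_from_file, and the function name, array layout, and tolerance values here are assumptions):

```python
import numpy as np

def pad_to_t_disrupt(t, sig, t_disrupt, tol, dt):
    """Sketch: pad an early-terminating signal with zeros up to t_disrupt,
    provided the gap between the last sample and t_disrupt is within tol."""
    gap = t_disrupt - t[-1]
    if gap <= 0:
        # signal already covers t_disrupt; nothing to pad
        return t, sig
    if gap > tol:
        # outside tolerance: the whole shot would be omitted instead
        raise ValueError("signal ends too early before t_disrupt")
    n_pad = int(round(gap / dt))
    t_pad = t[-1] + dt * np.arange(1, n_pad + 1)
    return np.concatenate([t, t_pad]), np.concatenate([sig, np.zeros(n_pad)])
```

As noted above, even large tolerances do not violate causality, but the padding means the model may learn to key on the signal vanishing rather than on physics.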

# "t" subscripted variants of signal variables increase the tolernace to 29 ms
# on D3D, the maximum value possible without violating causality for the min
# T_warn=30 ms. This is important for the signals of newer shots in
# d3d_disrupt_since_2016.txt; many of them would cause [omit] of entire shot
# because their values end shortly before t_disrupt (poss. due to different
# t_disrupt label calculation).

# See conf_parser.py dataset definitions of d3d_data_max_tol, d3d_data_garbage
# which use these signal variants.

# For non-t-subscripted profile signals (and q95), a positive tolerance of
# 20 ms on D3D (and 30-50 ms on JET) is used to account for the causal
# shifting of the delayed "real-time processing".

# A list value provides an individual tolerance for each machine when signal
# definitions are shared in cross-machine studies.
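One way to read the per-machine tolerance convention (a hypothetical sketch; the real Signal class in the package has more parameters, and the machine names and values below are illustrative):

```python
class Signal:
    """Sketch: data_avail_tolerances may be a list giving one tolerance
    per machine, aligned with the machines list, for signal definitions
    shared across machines."""

    def __init__(self, description, machines, data_avail_tolerances=None):
        self.description = description
        self.machines = machines  # e.g. ['d3d', 'jet']
        # default: zero tolerance on every machine
        if data_avail_tolerances is None:
            data_avail_tolerances = [0.0] * len(machines)
        self.data_avail_tolerances = data_avail_tolerances

    def get_data_avail_tolerance(self, machine):
        # look up the tolerance for the requested machine
        return self.data_avail_tolerances[self.machines.index(machine)]

# illustrative values: 20 ms on D3D, 30 ms on JET
q95 = Signal('safety factor q95', ['d3d', 'jet'], [0.02, 0.03])
```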

# -------------- conf_parser.py

        # See notes in data/signals.py for details on signal tolerances
        # relative to t_disrupt. The following two dataset definitions permit
        # progressively worse signal quality when preprocessing the shots and
        # omitting some of them
        if params['paths']['data'] == 'd3d_data_max_tol':
            # let signals terminate up to 29 ms before t_disrupt on D3D
            h = myhash_signals(sig.all_signals_max_tol.values())
        elif params['paths']['data'] == 'd3d_data_garbage':
            # let up to 3x signals disappear at any time before t_disrupt
            # (and NaNs?)
            # -----
            # temp workaround for identical signal dictionary (but different
            # omit criteria in shots.py Shot.get_signals_and_times_from_file())
            # ---> 2x hash int
            # TODO(KGF): not robust; create reproducible specification and
            # recording of signal filtering procedure
            h = myhash_signals(sig.all_signals_max_tol.values())*2
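The hash-doubling workaround can be illustrated in isolation (myhash_signals here is a stand-in written for this sketch, not the repo's actual implementation; the signal names are made up):

```python
import hashlib

def myhash_signals(signal_names):
    """Stand-in hash: order-independent digest of a set of signal names."""
    m = hashlib.md5()
    for name in sorted(signal_names):
        m.update(name.encode())
    return int.from_bytes(m.digest()[:8], 'big')

# Both datasets use the identical signal dictionary, so their hashes would
# collide; doubling the hash forces a distinct preprocessing cache key for
# the dataset with the looser omit criteria.
signals = ['q95t', 'ipt', 'lit']
h_max_tol = myhash_signals(signals)
h_garbage = myhash_signals(signals) * 2
```

As the TODO notes, keying the cache on an arithmetic tweak of the hash is fragile: the filtering procedure itself is not recorded, so two caches can silently diverge from their supposed specification.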