HARPgroup / HSPsquared

Hydrologic Simulation Program Python (HSPsquared)
GNU Affero General Public License v3.0

dataMatrix #26

Open rburghol opened 1 year ago

rburghol commented 1 year ago

Flexible lookup tables.

Testing

Conceptual Design + Config/Optimization

Data Model

See also: https://github.com/HARPgroup/HARParchive/issues/237

Class

class dataMatrix(modelObject):
    lu_type1 = ""
    matrix = [] # gets passed in at creation.  Refers to path "/OBJECTS/DataMatrix/RCHRES_0001/stage_storage_discharge/matrix"
    def ncode_step(self):
        # return string with code that is executable in @njit function with ts and object state variables
        code_lines = []
        if self.lu_type1 == "":
            # this is only sample code of what a sample lookup might look like
            # note: three consecutive quote marks (""") tell python that this is a multi-line string
            code_lines.append("""OBJECTS_DATAMATRIX_0001 = array([[0.000000e+00, 1.700000e+02, 0.000000e+00],
                [1.952000e+05, 2.400000e+02, 8.890600e+03],
                [2.042528e+05, 2.410000e+02, 9.301500e+03],
                [2.137360e+05, 2.420000e+02, 9.712500e+03]])""")
            code_lines.append('ts["/RESULTS/RCHRES_001/SPECL/elev"][step] = np.interp(ts["/RESULTS/RCHRES_001/SPECL/Volume"][step],OBJECTS_DATAMATRIX_0001[:, 0][0:], OBJECTS_DATAMATRIX_0001[:, 1][0:])')

        return code_lines 

Parser

Validation:

import numpy as np
from numba import int8, float32    # import the types

def parseMatrix(matrix_text):
    matrix_eval = eval(matrix_text)
    if isinstance(matrix_eval, list):
        return np.asarray(matrix_eval, dtype="float32")
    else:
        print(type(matrix_eval).__name__)
        return False

#dmp = parseMatrix("[ [1, 2, 3], [\"Qin\", \"b\", \"c\"] ]")
storage_table = parseMatrix("[ [ 0.0, 170.0, 0], [195200.0, 240.0, 8890.6], [204252.8, 241.0, 9301.5], [213736.0, 242.0, 9712.5] ]")

Convert JSON to evaluatable code

Handling this workflow. Example dataMatrix:

JSON 1: flowby_tiers_cond3.

import json
from io import StringIO
ijson = '[["0", "Qin"],["5.0", "flowby_cond3"],["1000000.0", "flowby_cond3"]]'
ijson = ijson.replace('"', '')
ijson = ijson.replace("'", '')
ijson
# '[[0, Qin],[5.0, flowby_cond3],[1000000.0, flowby_cond3]]'
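Rather than stripping quote characters by hand, the conversion can be sketched with `json.loads`: parse the JSON, turn numeric cells into floats, and leave names like `Qin` as variable references. The helper name `parse_json_matrix` is hypothetical, not part of the codebase.

```python
import json

def parse_json_matrix(ijson):
    # parse the JSON text into nested lists of strings
    rows = json.loads(ijson)
    parsed = []
    for row in rows:
        parsed_row = []
        for cell in row:
            try:
                parsed_row.append(float(cell))   # numeric entry
            except ValueError:
                parsed_row.append(cell)          # keep as a variable reference
        parsed.append(parsed_row)
    return parsed

tbl = parse_json_matrix('[["0", "Qin"],["5.0", "flowby_cond3"],["1000000.0", "flowby_cond3"]]')
# tbl == [[0.0, 'Qin'], [5.0, 'flowby_cond3'], [1000000.0, 'flowby_cond3']]
```

This keeps the numeric/symbolic distinction explicit instead of relying on string replacement.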

### Solver: 
- see: https://numpy.org/doc/stable/reference/generated/numpy.interp.html 
- https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
- https://numba.pydata.org/numba-doc/latest/reference/types.html
- https://medium.com/pythoneers/enhance-python-performance-with-cython-numba-and-eval-fdd4b34c777c
- See also: https://shocksolution.com/2008/12/11/a-lookup-table-for-fast-python-math/

#### Linear Interpolation
- 1-dimensional: a single key column and a single value column,
- or, when you already know the index of the value column you want, as in a storage/stage/area table

show stage at storage volume of 180,000 ac-ft

np.interp(180000,storage_table[:, 0][0:], storage_table[:, 1][0:])

234.54918032786884

Show Storage at volume 180,000 ac-ft (identity, should be equal)

np.interp(180000,storage_table[:, 0][0:], storage_table[:, 0][0:])

180000.0

show surface area at 180,000 ac-ft storage volume

np.interp(180000,storage_table[:, 0][0:], storage_table[:, 2][0:])

8198.29918032787

#### 2-d Linear
- Use 1-d interp to get interps for all y columns
- This is not yet done
- Get row index with 1-d
- Then get col index with contents of row (need syntax for that)
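A minimal sketch of that two-step idea, built from two 1-d `np.interp` calls (the function name `interp_2d` is hypothetical), assuming the table's first row doubles as the column keys and the first column as the row keys:

```python
import numpy as np

def interp_2d(data_table, rowval, colval):
    row_keys = data_table[:, 0]
    col_keys = data_table[0, :]  # first row doubles as the column keys
    # 1-d interp down each column at the row key to build one interpolated row
    row_vals = np.array([np.interp(rowval, row_keys, data_table[:, j])
                         for j in range(data_table.shape[1])])
    # then 1-d interp across that row using the column keys
    return np.interp(colval, col_keys, row_vals)

data_table = np.asarray([[0.0, 5.0, 10.0], [10.0, 15.0, 20.0],
                         [20.0, 25.0, 30.0], [30.0, 35.0, 40.0]])
interp_2d(data_table, 17.5, 6.8)  # ≈ 24.3
```

This reproduces the 2-d `om_table_lookup` result (24.3) shown later in the thread for the same table and keys.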

#### Nearest key interp
- Based on code found at https://philbull.wordpress.com/2012/01/11/numpy-tip-getting-index-of-an-array-element-nearest-to-some-value/

v = 180000
idx = (np.abs(storage_table[:, 0][0:] - v)).argmin()

show surface area at this point

storage_table[:, 2][0:][idx]

8890.6


#### Stair-step interp
- Adapted to extend code found at https://philbull.wordpress.com/2012/01/11/numpy-tip-getting-index-of-an-array-element-nearest-to-some-value/

v = 210000
idx = (storage_table[:, 0][0:][(storage_table[:, 0][0:] - v) <= 0]).argmax()

show surface area at this point

storage_table[:, 2][0:][idx]

show elevation at this point

storage_table[:, 1][0:][idx]

rburghol commented 1 year ago

Example: Key value interpolation.

people = [
    {
        'name': 'Nate',
        'age': 100,
        'favorite_color': 'blue',
        'salary': 250000.19
    },
    {
        'name': 'George',
        'age': 29,
        'favorite_color': 'green',
        'salary': 59000.50
    },
    {
        'name': 'Wendy',
        'age': 63,
        'favorite_color': 'red',
        'salary': 1000000.64
    },
    {
        'name': 'Amy',
        'age': 47,
        'favorite_color': 'brown',
        'salary': 100000.64
    }
]
for person in people:
    print(f"{person['name']}'s salary is {person['salary']} and their favorite color is {person['favorite_color']}.")
rburghol commented 1 year ago

Moved into issue body above under Parser

rburghol commented 1 year ago

moved https://github.com/HARPgroup/HSPsquared/issues/19

rburghol commented 1 year ago

pseudo code:

for i in specl_keys:
    specl_code[i] = specl_var[i]['code']
    if specl_var[i]['debug'] == 1:
        ts[specl_var[i]['debug_path']] = specl_code[i]
    ts[specl_var[i]['path']] = eval(specl_code[i])
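A runnable sketch of that pattern (the `specl_var` contents here are made up for illustration): each entry carries a code string that gets evaluated and written to its `ts` path, with an optional debug path that keeps the source string for inspection.

```python
import numpy as np

storage_table = np.asarray([[0.0, 170.0, 0], [195200.0, 240.0, 8890.6],
                            [204252.8, 241.0, 9301.5], [213736.0, 242.0, 9712.5]])

# hypothetical specl_var contents mirroring the pseudo code
specl_var = {
    'elev': {
        'code': "np.interp(200000.0, storage_table[:, 0], storage_table[:, 1])",
        'path': '/RESULTS/RCHRES_001/SPECL/elev',
        'debug': 1,
        'debug_path': '/RESULTS/RCHRES_001/SPECL/elev_code',
    },
}

ts = {}
for i in specl_var:
    specl_code = specl_var[i]['code']
    if specl_var[i]['debug'] == 1:
        ts[specl_var[i]['debug_path']] = specl_code  # keep the source for inspection
    ts[specl_var[i]['path']] = eval(specl_code)
```

Note `eval` resolves names like `storage_table` against the enclosing scope, which is what makes the stored code strings able to reference model state.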
rburghol commented 1 year ago

Single timestep in object idea: see https://github.com/HARPgroup/HSPsquared/issues/35#issuecomment-1314088026

rburghol commented 1 year ago

moved: https://github.com/HARPgroup/HSPsquared/issues/35#issuecomment-1314092462

rburghol commented 1 year ago

Try functional

import numpy as np
import time
from numba import njit
def parseMatrix(matrix_text):
    matrix_eval = eval(matrix_text)
    m_type = type(matrix_eval ).__name__
    if m_type == "list":
        return np.asarray(matrix_eval)
    else:
        print(m_type)
        return False

storage_table = parseMatrix("[ [ 0.0, 170.0, 0], [195200.0, 240.0, 8890.6], [204252.8, 241.0, 9301.5], [213736.0, 242.0, 9712.5] ]")

@njit
def specl_lookup(data_table, keyval, lutype, valcol):
    if lutype == 2: # stair-step
        idx = (data_table[:, 0][0:][(data_table[:, 0][0:] - keyval) <= 0]).argmax()
        luval = data_table[:, valcol][0:][idx]
    elif lutype == 1: # interpolate
        luval = np.interp(keyval, data_table[:, 0][0:], data_table[:, valcol][0:])

    # return the value at this point
    return luval

# test with dedicated njit function that takes parameters ONLY
# using stair step
start = time.time()
for step in range(steps):
    value = specl_lookup(storage_table, 200000, 2, 2)

end = time.time()
print(end - start, "seconds")
# 0.28759074211120605 seconds

# test with dedicated njit function that takes parameters  
# AND set value in the ts array. Note: setting ts increases time by 20x
# conclude that most time-consuming aspect is setting data in the ts Dict outside of jit
# using stair step
start = time.time()
for step in range(steps):
    ts['/RESULTS/RCHRES_001/SPECL/Volume'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] * 213736.0
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = specl_lookup(storage_table, 200000, 2, 2)

end = time.time()
print(end - start, "seconds")
# 5.631542921066284 seconds

# now test with interpolated
start = time.time()
for step in range(steps):
    ts['/RESULTS/RCHRES_001/SPECL/Volume'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] * 213736.0
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = specl_lookup(storage_table, 200000, 1, 2)

end = time.time()
print(end - start, "seconds")

# a tad bit slower
# 6.068679571151733 seconds

@njit
def specl_lookup_write(ts, step, data_table, keyval, lutype, valcol):
    if lutype == 2: # stair-step
        idx = (data_table[:, 0][0:][(data_table[:, 0][0:] - keyval) <= 0]).argmax()
        luval = data_table[:, valcol][0:][idx]
    elif lutype == 1: # interpolate
        luval = np.interp(keyval, data_table[:, 0][0:], data_table[:, valcol][0:])

    # write and return the value at this point
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = luval
    return luval

# Run with njit write
start = time.time()
for step in range(steps):
    val = specl_lookup_write(ts, step, storage_table, 200000, 1, 2)

end = time.time()
print(end - start, "seconds")
# 1.1247680187225342 seconds
rburghol commented 1 year ago

Test with @jitclass

Initialize Data sets

storage_table = np.asarray([ [ 0.0, 170.0, 0], [195200.0, 240.0, 8890.6], [204252.8, 241.0, 9301.5], [213736.0, 242.0, 9712.5] ], dtype="float32")
steps = 365 * 24 * 50 # 50 year hourly simulation
ts = Dict.empty(key_type=types.unicode_type, value_type=types.float64[:])

set up a place to store the function output

ts['/RESULTS/RCHRES_001/SPECL/elev'] = zeros(steps)
Qrandom = np.random.randint(0, 20000, size=steps, dtype='l')
Qrandom = Qrandom.astype('float64')
ts['/RESULTS/RCHRES_001/SPECL/Volume'] = zeros(steps) + Qrandom + 190000

Object definition

spec = [
    ('lutype1', int8),            # a simple scalar field
    ('lutype2', int8),            # a simple scalar field
    ('matrix', float32[:,:]),     # an array field
    ('value', float32),           # a variable holding the evaluation of this object
    ('keyval', float32),          # a variable holding the key for lookup
    ('valcol', int8),             # which column to use for the default evaluate() result
    ('datapath', types.string),   # path in hdf5
    ('lukey1path', types.string), # path in hdf5
    ('lukey2path', types.string), # path in hdf5
]

@jitclass(spec)
class XdataMatrix(object):
    def __init__(self, lutype1, lutype2, matrix):
        self.lutype1 = lutype1
        self.lutype2 = lutype2
        self.matrix = matrix

    def evalSet(self, ts, step):
        luval = ts[self.lukey1path][step]
        self.value = self.lookup(luval, self.valcol)
        ts[self.datapath][step] = self.value

    def evaluate(self):
        self.value = self.lookup(self.keyval, self.valcol)

    def lookup(self, keyval, valcol):
        if self.lutype1 == 2: # stair-step
            idx = (self.matrix[:, 0][0:][(self.matrix[:, 0][0:] - keyval) <= 0]).argmax()
            luval = self.matrix[:, valcol][0:][idx]
        elif self.lutype1 == 1: # interpolate
            luval = np.interp(keyval, self.matrix[:, 0][0:], self.matrix[:, valcol][0:])

        return luval
rburghol commented 1 year ago

Test as pre-generated list of statements

ts['/RESULTS/RCHRES_001/SPECL/elev'] = zeros(steps)
ts['/RESULTS/RCHRES_001/SPECL/Volume'] = zeros(steps)  + 200000

@njit
def iterate_matrix(ts, matrix, step):
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = np.interp(ts['/RESULTS/RCHRES_001/SPECL/Volume'][step],matrix[:, 0][0:], matrix[:, 1][0:])

start = time.time()
for step in range(steps):
    iterate_matrix(ts,storage_table, step)

end = time.time()
print(end - start, "seconds")

# first time
# 1.977916955947876 seconds
# 2nd time
# 0.9941999912261963 seconds
rburghol commented 1 year ago

Test in plain (no numba) functions

dtest = XdataMatrix(2,0,storage_table)
dtest.lookup(200000, 2)

# test as stair-step lookup
start = time.time()
for step in range(steps):
    ts['/RESULTS/RCHRES_001/SPECL/Volume'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] * 213736.0
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = dtest.lookup(ts['/RESULTS/RCHRES_001/SPECL/Volume'][step], 2)

end = time.time()
print(end - start, "seconds")
# 7.827843904495239 seconds

# test as interpolated lookup 
dtest.lutype1 = 1 
start = time.time()
for step in range(steps):
    ts['/RESULTS/RCHRES_001/SPECL/Volume'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] * 213736.0
    ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = dtest.lookup(ts['/RESULTS/RCHRES_001/SPECL/Volume'][step], 2)

end = time.time()
print(end - start, "seconds")
# little bit slower
# 7.882946014404297 seconds

# wrap the iteration in an @njit function
@njit
def iterate_object(oneobject, ts, steps):
    for step in range(steps):
        ts['/RESULTS/RCHRES_001/SPECL/Volume'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] * 213736.0
        ts['/RESULTS/RCHRES_001/SPECL/elev'][step] = oneobject.lookup(ts['/RESULTS/RCHRES_001/SPECL/Volume'][step], 2)

# test as interpolated lookup 
dtest.lutype1 = 1 
start = time.time()
iterate_object(dtest, ts, steps)
end = time.time()
print(end - start, "seconds")
# first time
# 1.3922579288482666 seconds
# 2nd time
# 0.48339414596557617 seconds

start = time.time()
iterate_loop_write(dtest, ts, steps)
end = time.time()
print(end - start, "seconds")

0.26541876792907715 seconds


- test eval + write inside object
  - 0.18/0.26 (stair-step/interpolate)

test eval + write inside object

wrap the iteration in an @njit function

@njit
def iterate_write(oneobject, ts, steps):
    for step in range(steps):
        oneobject.evalSet(ts, step)

dtest = XdataMatrix(2, 0, storage_table)
dtest.matrix = storage_table
dtest.keyval = 200000
dtest.valcol = 2
dtest.lutype1 = 1 # interpolate, which takes about 1.5x as long as stair-step
dtest.datapath = '/RESULTS/RCHRES_001/SPECL/elev'
dtest.lukey1path = '/RESULTS/RCHRES_001/SPECL/Volume'

start = time.time()
iterate_write(dtest, ts, steps)
end = time.time()
print(end - start, "seconds")

0.26616740226745605 seconds

Same speed as the version that writes the result inside the loop instead of in the object

np.quantile(ts[dtest.lukey1path], [0,0.1,0.5,0.75,1.0])

array([200000., 201002., 204999., 207499., 209999.])

np.quantile(ts[dtest.datapath], [0,0.1,0.25,0.5,0.75,1.0])

array([8653.75976562, 8744.85253906, 8881.49023438, 9108.51367188,
       9333.796875  , 9550.5390625 ])

now set the lookup type to stair-step and see how the distribution changes

dtest.lutype1 = 2
iterate_write(dtest, ts, steps)
np.quantile(ts[dtest.datapath], [0,0.1,0.25,0.5,0.75,1.0])

array([ 0.0, 0.0, 8890.59960938, 9301.5, 9301.5 ])

rburghol commented 1 year ago

Iterate without njit

def iterate_plain(oneobject, ts, steps):
    for step in range(steps):
        oneobject.evalSet(ts, step)

start = time.time()
iterate_plain(dtest, ts, steps)
end = time.time()
print(end - start, "seconds")
# 12.643825054168701 seconds
# way too slow!

tclass

instance.jitclass.XdataMatrix#21193008580<lutype1:int8,lutype2:int8,matrix:array(float32, 2d,A),value:float32,keyval:float32,valcol:int8,datapath:unicode_type,lukey1path:unicode_type,lukey2path:unicode_type>

d = Dict.empty(types.int64, tclass)



- This works. BUT the problem is, we have multiple classes, and there is no inheritance in jitclass,
- so we are still somewhat stuck on how to get all the objects into a single function for efficient execution.
- Therefore we may have to use arrays of data describing the objects and operate on them with plain @njit functions, or just use the jit classes to keep the code concise.
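A minimal sketch of the arrays-of-data idea (the op encoding here is invented for illustration, not the actual op_tokens layout): encode each object as a small integer array and dispatch on the op type inside one function. Because the function touches only integers and typed arrays, the same body compiles under @njit without inheritance.

```python
import numpy as np

def exec_op(op, state):
    """Dispatch on op[0]; remaining entries are indices into the state array.
    Uses only ints and typed arrays, so this body is @njit-compatible."""
    if op[0] == 1:    # op type 1: add state[op[1]] and state[op[2]] into state[op[3]]
        state[op[3]] = state[op[1]] + state[op[2]]
    elif op[0] == 2:  # op type 2: multiply state[op[1]] and state[op[2]] into state[op[3]]
        state[op[3]] = state[op[1]] * state[op[2]]

state = np.zeros(10)
state[0], state[1] = 3.0, 4.0
exec_op(np.array([1, 0, 1, 2]), state)  # state[2] = 7.0
exec_op(np.array([2, 0, 1, 3]), state)  # state[3] = 12.0
```

The trade-off: class polymorphism is replaced by an explicit if/elif dispatch on the op-type token, which is exactly the shape the later op_tokens/state_ix code in this thread takes.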
rburghol commented 1 year ago

Testing multi-dimensional matrix and object

keyval1 = 200000
lutype1 = 1 # interp (0=exact, 2 = stairstep)
keyval2 = 2
lutype2 = 0
om_table_lookup(data_table, mx_type, ncols, keyval1, lutype1, keyval2, lutype2)

9108.46839647291

keyval2 = 1
om_table_lookup(data_table, mx_type, ncols, keyval1, lutype1, keyval2, lutype2)

240.53022287656268

keyval2 = 0
om_table_lookup(data_table, mx_type, ncols, keyval1, lutype1, keyval2, lutype2)

200000.0


- That works well. Now try a 2-d interpolation
- Ex:
  - Min release is dependent on storage remaining, and 

data_table = np.asarray([ [ 0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0], [30.0, 35.0, 40.0] ], dtype="float32")
keyval1 = 23.0
lutype1 = 1 # interp (0=exact, 2 = stairstep)
keyval2 = 2
lutype2 = 0

test out the code

get an interpolated list of all the columns for the given row key

row_vals = row_interp(data_table, ncols, keyval1, lutype1)
row_keys = data_table[0].copy()
new_table = [row_keys, row_vals]

om_table_lookup(data_table, mx_type, ncols, keyval1, lutype1, keyval2, lutype2)

- Great, now use the object:
- todo: tokenize, adding all columns as either variable reference or constants

set up a clean simulation

op_tokens, state_paths, state_ix, dict_ix, ts_ix = init_sim_dicts()
ModelObject.op_tokens, ModelObject.state_paths, ModelObject.state_ix, ModelObject.dict_ix, ModelObject.ts_ix = (op_tokens, state_paths, state_ix, dict_ix, ts_ix)
river = ModelObject('RCHRES_R001')

dm = DataMatrix('dm', river, data_table)
dm.matrix

array([[ 0.,  5., 10.],
       [10., 15., 20.],
       [20., 25., 30.],
       [30., 35., 40.]], dtype=float32)

dm.matrix_tokens

array([50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61])

dm.matrix_values

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

dm.add_op_tokens()

exec_tbl_values(op_tokens[dm.ix], state_ix, dict_ix)

rburghol commented 1 year ago

Now run the above with different lookup types:

debug first

test_model(op_tokens, state_ix, dict_ix, ts_ix, step)

test 1

step_model(op_tokens, state_ix, dict_ix, ts_ix, step)

run a sim

iterate_models(op_tokens, state_ix, dict_ix, ts_ix, steps)

add a data matrix accessor

a 1.5d interpolated, like for the stage-storage-discharge table

dma = DataMatrixLookup('dma', river, dm.state_path, 3, 17.5, 1, 1, 1, 0.0)
dma.add_op_tokens()

test

om_table_lookup(dict_ix[dma.ops[2]], dma.ops[3], dma.ops[5], state_ix[dma.ops[6]], dma.ops[7], state_ix[dma.ops[8]], dma.ops[9])

22.5

exec_tbl_eval(op_tokens, op_tokens[dma.ix], state_ix, dict_ix)

22.5

steps = 4036524
start = time.time()
num = iterate_models(op_tokens, state_ix, dict_ix, ts_ix, steps)
end = time.time()
print(end - start, "seconds")

0.57 seconds

state_ix[dma.ix]

22.5

a 2-d interpolated

dma = DataMatrixLookup('dma', river, dm.state_path, 2, 17.5, 1, 6.8, 1, 0.0)
dma.add_op_tokens()

test

om_table_lookup(dict_ix[dma.ops[2]], dma.ops[3], dma.ops[5], state_ix[dma.ops[6]], dma.ops[7], state_ix[dma.ops[8]], dma.ops[9])

24.3

exec_tbl_eval(op_tokens, op_tokens[dma.ix], state_ix, dict_ix)

24.3

steps = 4036524
start = time.time()
num = iterate_models(op_tokens, state_ix, dict_ix, ts_ix, steps)
end = time.time()
print(end - start, "seconds")
state_ix[dma.ix]

24.3