USACE / cloudcompute


create a python plugin #19

Closed HenryGeorgist closed 2 months ago

HenryGeorgist commented 2 years ago

The Python Plugin should be a simple calculation that summarizes a hydrograph into a series of optional statistics. The optional statistics are:

  1. Average
  2. Max
  3. Min
  4. Duration Maximum (e.g. the maximum of the 3 hour average)

For reference, this link gives some context on how we currently compute the duration maximum from a DSS file: Duration Maximum
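Sketched in Python, the four statistics above might look like this (a minimal illustration, not the deliverable itself — assumes one flow value per time step, so `duration_steps=3` gives the 3-hour maximum on hourly data):

```python
from statistics import mean

def summarize_hydrograph(flows, duration_steps=3):
    # Duration maximum: the peak of the moving average over a window,
    # e.g. duration_steps=3 approximates the "3 hour average" maximum
    # when the series is hourly.
    window_means = [
        mean(flows[i:i + duration_steps])
        for i in range(len(flows) - duration_steps + 1)
    ]
    return {
        "avg": mean(flows),
        "max": max(flows),
        "min": min(flows),
        "duration_max": max(window_means),
    }
```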

thwllms commented 2 years ago

@HenryGeorgist @slawler just want to make sure I'm on the same page as you all.

Deliverable: a Docker-ized Python CLI application that, given a hydrograph, computes some basic statistics (avg, max, min, duration max, etc.)

Example output:

{
  "max": 123.4,
  "min": 12.3,
  "avg": 23.4,
  "duration_max": 21.0,
  "duration": "3hr"
}

Is there anything WAT-specific that needs to be addressed?
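A rough CLI skeleton for that deliverable could look like the following (flag names, the CSV column names, and the duration-window handling are illustrative assumptions, not the repo's actual interface):

```python
import argparse
import csv
import json
from statistics import mean

def main(argv=None):
    parser = argparse.ArgumentParser(description="Summarize a hydrograph CSV")
    parser.add_argument("csv_path")
    parser.add_argument("--duration", type=int, default=3,
                        help="window (in time steps) for the duration max")
    args = parser.parse_args(argv)

    # Assumes a CSV with a header row containing a "Flow" column.
    with open(args.csv_path, newline="") as f:
        flows = [float(row["Flow"]) for row in csv.DictReader(f)]

    window = args.duration
    duration_max = max(
        mean(flows[i:i + window]) for i in range(len(flows) - window + 1)
    )
    print(json.dumps({
        "max": max(flows),
        "min": min(flows),
        "avg": mean(flows),
        "duration_max": duration_max,
        "duration": f"{window}hr",
    }))

if __name__ == "__main__":
    main()
```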

HenryGeorgist commented 2 years ago

@thwllms yeah, that's about right. I started a Java example here: https://github.com/HenryGeorgist/JavaPlugin

It is not super helpful at this point; it leaves a lot of details out.

We need to give you a good model payload to start with.

HenryGeorgist commented 2 years ago

@thwllms do you want to chat today to go over the needs for this plugin? I would happily help out if you have time.

thwllms commented 2 years ago

@HenryGeorgist sure, I can chat today. I'm working up something here: https://github.com/water-tech-repos/wat-hydrograph-stats-py. The hydrograph_stats.py script does just about everything we discussed above. Need to add a Dockerfile and tests. What do you think?

HenryGeorgist commented 2 years ago

cool - lets talk. i am free except 11 am est

HenryGeorgist commented 2 years ago

https://github.com/HenryGeorgist/HydrographScaler/blob/main/docker-compose.yml

this compose file may help in mocking s3 so you can test retrieval of the csv file from an s3 bucket.
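For context, a minimal compose service for mocking S3 locally usually looks something like this (image, ports, and credentials here are illustrative defaults, not taken from the linked file):

```yaml
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
```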

thwllms commented 2 years ago

> cool - lets talk. i am free except 11 am est

Did you get my Teams meeting invite for 3pm Eastern today?

HenryGeorgist commented 2 years ago

yes, i declined it, i have a train to catch at 3:30 - can we do a bit earlier?



thwllms commented 2 years ago

Oh, I didn't get the declined message. 1:30pm?

HenryGeorgist commented 2 years ago

modelPayload.yml

target_plugin: hydrograph_stats
model_configuration:
  model_name: stats
  model_configuration_paths:
  - /data/hydrographstats/stats.json
model_links:
  linked_inputs:
  - input:
      - name: hydrograph
      - parameter: flow
      - format: .csv
    source:
      - name: /data/hydrographscaler/output/hsm1.csv
      - parameter: flow
      - format: .csv
  required_outputs:
  - name: summaryStatOutput
    parameter: flow
    format: .csv
event_config:
  output_destination: /data/hydrographstats/output/
  realization:
    index: 1
    seed: 1234
  event:
    index: 1
    seed: 5678
  time_window:
    starttime: 2018-01-01T01:01:01.000000001-05:00
    endtime: 2020-12-31T01:01:01.000000001-05:00

HenryGeorgist commented 2 years ago

hsm1.csv


Time,Flow
2018-01-01 01:01:01.000000001 -0500 -0500,6.074918237944704
2018-01-01 02:01:01.000000001 -0500 -0500,6.301664797370323
2018-01-01 03:01:01.000000001 -0500 -0500,6.51896358348654
2018-01-01 04:01:01.000000001 -0500 -0500,6.745710142912159
2018-01-01 05:01:01.000000001 -0500 -0500,6.972456702337778
2018-01-01 06:01:01.000000001 -0500 -0500,7.199203261763397
2018-01-01 07:01:01.000000001 -0500 -0500,7.4259498211890165
2018-01-01 08:01:01.000000001 -0500 -0500,7.6526963806146355
2018-01-01 09:01:01.000000001 -0500 -0500,7.8699951667308525
2018-01-01 10:01:01.000000001 -0500 -0500,8.096741726156472
2018-01-01 11:01:01.000000001 -0500 -0500,8.32348828558209
2018-01-01 12:01:01.000000001 -0500 -0500,8.55023484500771
2018-01-01 13:01:01.000000001 -0500 -0500,8.776981404433329
2018-01-01 14:01:01.000000001 -0500 -0500,8.994280190549546
2018-01-01 15:01:01.000000001 -0500 -0500,9.221026749975165
2018-01-01 16:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 17:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 18:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 19:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 20:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 21:01:01.000000001 -0500 -0500,9.447773309400784
2018-01-01 22:01:01.000000001 -0500 -0500,9.221026749975165
2018-01-01 23:01:01.000000001 -0500 -0500,8.994280190549546
2018-01-02 00:01:01.000000001 -0500 -0500,8.776981404433329
2018-01-02 01:01:01.000000001 -0500 -0500,8.55023484500771
2018-01-02 02:01:01.000000001 -0500 -0500,8.32348828558209
2018-01-02 03:01:01.000000001 -0500 -0500,8.096741726156472
2018-01-02 04:01:01.000000001 -0500 -0500,7.8699951667308525
2018-01-02 05:01:01.000000001 -0500 -0500,7.6526963806146355
2018-01-02 06:01:01.000000001 -0500 -0500,7.4259498211890165
2018-01-02 07:01:01.000000001 -0500 -0500,7.199203261763397
2018-01-02 08:01:01.000000001 -0500 -0500,6.972456702337778
2018-01-02 09:01:01.000000001 -0500 -0500,6.745710142912159
2018-01-02 10:01:01.000000001 -0500 -0500,6.556754676724143
2018-01-02 11:01:01.000000001 -0500 -0500,6.37724698384553
2018-01-02 12:01:01.000000001 -0500 -0500,6.188291517657514
2018-01-02 13:01:01.000000001 -0500 -0500,5.9993360514694976
2018-01-02 14:01:01.000000001 -0500 -0500,5.810380585281482
2018-01-02 15:01:01.000000001 -0500 -0500,5.621425119093466
2018-01-02 16:01:01.000000001 -0500 -0500,5.43246965290545
2018-01-02 17:01:01.000000001 -0500 -0500,5.252961960026837
2018-01-02 18:01:01.000000001 -0500 -0500,5.0640064938388205
2018-01-02 19:01:01.000000001 -0500 -0500,4.875051027650804
2018-01-02 20:01:01.000000001 -0500 -0500,4.686095561462789
2018-01-02 21:01:01.000000001 -0500 -0500,4.497140095274773
2018-01-02 22:01:01.000000001 -0500 -0500,4.3459757223243605
2018-01-02 23:01:01.000000001 -0500 -0500,4.194811349373948
2018-01-03 00:01:01.000000001 -0500 -0500,4.053094749732936
2018-01-03 01:01:01.000000001 -0500 -0500,3.9019303767825235
2018-01-03 02:01:01.000000001 -0500 -0500,3.7507660038321116
2018-01-03 03:01:01.000000001 -0500 -0500,3.5996016308816987
2018-01-03 04:01:01.000000001 -0500 -0500,3.448437257931286
2018-01-03 05:01:01.000000001 -0500 -0500,3.2972728849808735
2018-01-03 06:01:01.000000001 -0500 -0500,3.146108512030461
2018-01-03 07:01:01.000000001 -0500 -0500,2.9949441390800486
2018-01-03 08:01:01.000000001 -0500 -0500,2.8532275394390365
2018-01-03 09:01:01.000000001 -0500 -0500,2.702063166488624
2018-01-03 10:01:01.000000001 -0500 -0500,2.64537652663222
2018-01-03 11:01:01.000000001 -0500 -0500,2.588689886775815
2018-01-03 12:01:01.000000001 -0500 -0500,2.5320032469194103
2018-01-03 13:01:01.000000001 -0500 -0500,2.4753166070630055
2018-01-03 14:01:01.000000001 -0500 -0500,2.4186299672066007
2018-01-03 15:01:01.000000001 -0500 -0500,2.361943327350196
2018-01-03 16:01:01.000000001 -0500 -0500,2.305256687493791
2018-01-03 17:01:01.000000001 -0500 -0500,2.2485700476373864

HenryGeorgist commented 2 years ago

@thwllms let me know if that helps with the file format and the model payload example

thwllms commented 2 years ago

@HenryGeorgist thanks. A few questions:

HenryGeorgist commented 2 years ago
thwllms commented 2 years ago

@HenryGeorgist I've added integration tests, using Docker to test S3 and Azure Blob Storage connections with minio and azurite. Made some small tweaks to the WAT payload format above. To run the tests: ./integration-tests.sh -- I don't think any setup should be required. Let me know what you think.

https://github.com/water-tech-repos/wat-hydrograph-stats-py

PS: please forgive the monolithic .py file. Proof-of-concept... right? 😬

HenryGeorgist commented 2 years ago

i wonder if we should start thinking more like uri instead of url

thwllms commented 2 years ago

@HenryGeorgist by that do you mean some service between S3/redis/etc. and the plugin container which provides e.g. hydrographs to the plugin in a standardized way? That is, so plugins wouldn't worry about precisely where input resources are coming from.

HenryGeorgist commented 2 years ago

well i was thinking that you prepend s3:// to the path in the aws config, and abfs:// to the azure one - if we separate the storage type from the location - it might benefit us
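One way to sketch that separation (a hypothetical helper, not code from either repo):

```python
def split_resource(path: str) -> tuple[str, str]:
    # "s3://bucket/key" -> ("s3", "bucket/key"); a plain path falls
    # back to a local "file" storage type.
    scheme, sep, location = path.partition("://")
    return (scheme, location) if sep else ("file", path)
```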

HenryGeorgist commented 2 years ago

not that your interpretation of my comment isn't good - just not what I intended to say.

thwllms commented 2 years ago

No worries. Is this the sort of thing you mean?

target_plugin: hydrograph_stats
model_configuration:
  model_name: stats
  model_configuration_paths:
  - type: s3
    bucket: mybucket
    key: config_aws.yml
model_links:
  linked_inputs:
  - name: hydrograph
    source:
      type: azblob
      container: mycontainer
      blob: hsm1.csv
    parameter: flow
    format: .csv
  required_outputs:
  - name: summaryStatOutput
    parameter: flow
    format: .json
event_config:
  output_destination:
    type: redis
    host: some.redis.host
    key: task12345
  realization:
    index: 1
    seed: 1234
  event:
    index: 1
    seed: 5678
  time_window:
    starttime: 2018-01-01T01:01:01.000000001-05:00
    endtime: 2020-12-31T01:01:01.000000001-05:00

HenryGeorgist commented 2 years ago

yeah, actually maybe something like that - as you point out, it would fit for model configurations and for links

HenryGeorgist commented 2 years ago

https://pkg.go.dev/go.lsp.dev/uri

im thinking about "scheme", "authority", "path", "query", and "fragment" or something like that

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose
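Python's urllib.parse maps directly onto those pieces; parsing the example URI from the diagram:

```python
from urllib.parse import urlparse

u = urlparse("foo://example.com:8042/over/there?name=ferret#nose")
# scheme='foo', netloc='example.com:8042', path='/over/there',
# query='name=ferret', fragment='nose'
print(u.scheme, u.netloc, u.path, u.query, u.fragment)
```
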

thwllms commented 2 years ago

Gotcha. I know that type of URI is pretty standard for S3/etc (s3://bucket/thing.txt) but I don't know if there's a similar established way of doing that for Redis or SQS? Not that it would be too hard to create something.

Edit: to be clear, there's a URI scheme for Redis databases but I don't believe that it permits referring to a specific key within the database in the same way. https://www.iana.org/assignments/uri-schemes/prov/redis

Edit 2: nevermind, I see this can be done for Redis pretty simply with the fragment portion.
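For example (hypothetical host and key), the key can ride in the fragment while the rest of the URI stays a standard redis:// connection string:

```python
from urllib.parse import urlparse

uri = "redis://some.redis.host:6379/0#task12345"
parsed = urlparse(uri)
# Connection info comes from scheme/netloc/path; the key is the fragment.
key = parsed.fragment
print(parsed.scheme, parsed.hostname, parsed.port, key)
```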

thwllms commented 2 years ago

Added Redis as a hydrograph source / results sink. https://github.com/water-tech-repos/wat-hydrograph-stats-py/pull/11

Looking into SQS next.

HenryGeorgist commented 2 years ago

In your switch case you are hardcoding the cases - we may need to push that to env variables ultimately because our mock system schemes may be different (using elastic instead of sqs for instance)

Take a look at the wat api docker compose, i think i have set up services in a single yml for our testing - sorry i didn't share this earlier https://github.com/USACE/wat-api/blob/main/docker-compose.yml

thwllms commented 2 years ago

@HenryGeorgist meaning something like this for the env variables?

import os
from urllib.parse import urlparse

import fsspec
import requests
from redis import Redis

def get_text(uri: str, fsspec_kwargs: dict = None) -> str:
    uri_parsed = urlparse(uri)
    scheme = uri_parsed.scheme
    if scheme in (os.getenv('URI_SCHEME_REDIS', 'redis'), os.getenv('URI_SCHEME_REDISS', 'rediss')):
        # The Redis key rides in the URI fragment, e.g. redis://host:6379/0#mykey
        r = Redis.from_url(uri, decode_responses=True)
        text = r.get(uri_parsed.fragment)
    elif scheme in (os.getenv('URI_SCHEME_HTTP', 'http'), os.getenv('URI_SCHEME_HTTPS', 'https')):
        text = requests.get(uri).text
    else:
        # fsspec handles s3://, abfs://, plain local paths, etc.
        with fsspec.open(uri, 'r', **(fsspec_kwargs or {})) as f:
            text = f.read()
    return str(text)

HenryGeorgist commented 2 years ago

it isnt robust yet, but i was able to do this as a message today:


target_plugin: hydrograph_stats
plugin_image_and_tag: tbd/hydrographstats:v0.0.2
model_configuration:
  model_name: hydrograph_stats
  model_configuration_paths:
  - /data/config_aws.yml
model_links:
  linked_inputs:
  - name: hsm.csv
    parameter: flow
    format: csv
    resource_info:
      scheme: s3?
      authority: /data/realization_0/event_1
      fragment: hsm.csv
  required_outputs:
  - name: results-wat.json
    parameter: scalar
    format: json
event_config:
  output_destination: /data/realization_0/event_8
  realization:
    index: 0
    seed: 4494286321627776427
  event:
    index: 8
    seed: 3276075611334443242
  time_window:
    starttime: 2018-01-01T01:01:01.000000001Z
    endtime: 2020-12-31T01:01:01.000000001Z

HenryGeorgist commented 2 years ago

shoot - it looks like my output destination and my input authority are not in sync yet. i will get that fixed.

HenryGeorgist commented 2 years ago

fixed it...


target_plugin: hydrograph_stats
plugin_image_and_tag: tbd/hydrographstats:v0.0.2
model_configuration:
  model_name: hydrograph_stats
  model_configuration_paths:
  - /data/config_aws.yml
model_links:
  linked_inputs:
  - name: hsm.csv
    parameter: flow
    format: csv
    resource_info:
      scheme: how do i figure this out
      authority: /data/realization_0/event_5
      fragment: hsm1.csv
  required_outputs:
  - name: results-wat.json
    parameter: scalar
    format: json
event_config:
  output_destination: /data/realization_0/event_5
  realization:
    index: 0
    seed: 4494286321627776427
  event:
    index: 5
    seed: 2830258753914485572
  time_window:
    starttime: 2018-01-01T01:01:01.000000001Z
    endtime: 2020-12-31T01:01:01.000000001Z

HenryGeorgist commented 2 years ago

i figured out a way to pass the fs config all the way down... not feeling awesome about it... but it works


target_plugin: hydrograph_stats
plugin_image_and_tag: tbd/hydrographstats:v0.0.2
model_configuration:
  model_name: hydrograph_stats
  model_configuration_paths:
  - /data/config_aws.yml
model_links:
  linked_inputs:
  - name: hsm.csv
    parameter: flow
    format: csv
    resource_info:
      scheme: minio:9000/configs
      authority: /data/realization_0/event_7
      fragment: hsm1.csv
  required_outputs:
  - name: results-wat.json
    parameter: scalar
    format: json
event_config:
  output_destination: /data/realization_0/event_7
  realization:
    index: 0
    seed: 4494286321627776427
  event:
    index: 7
    seed: 5559254042425429666
  time_window:
    starttime: 2018-01-01T01:01:01.000000001Z
    endtime: 2020-12-31T01:01:01.000000001Z

thwllms commented 2 years ago

@HenryGeorgist are /data/config_aws.yml and /data/realization_0/... meant to be in an S3 bucket called data?

And per your email last week, output should be stored in a Redis key named like this?

tbd/hydrographstats:v0.0.2_wat-payload.yml_R0_E7 (?)

HenryGeorgist commented 2 years ago

data/ is actually more like a postfix. the bucket should come in on the environment variables (i think my examples actually use /configs as the bucket name).

We are iterating on how to manage the task execution with a plugin container, and it seems we are now migrating away from Lambda and towards Batch. With Batch we get some status reporting natively on the batch job, which may make the Redis status cache less valuable (i am not certain though)

thwllms commented 2 years ago

@HenryGeorgist I've made some updates to handle the YAML spec you posted above and to write status to a Redis key. Check out this integration test: https://github.com/water-tech-repos/wat-hydrograph-stats-py/blob/main/tests/new_aws_integration_test.py#L119

A little messier than I'd like, but hopefully this can integrate with what you've written so far. Let me know what you think.

slawler commented 2 years ago

Just took a look at this test and noticed the floating point comparison, which has bitten me in the past. @thwllms you might consider the pytest.approx function just to be extra safe:

https://github.com/pytest-dev/pytest/blob/69fb79e741f00714d3ac14ee853c5506f154e94f/src/_pytest/python_api.py#L516
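For illustration, the difference approx makes:

```python
import pytest

# Binary floating point makes exact equality brittle:
assert 0.1 + 0.2 != 0.3
# pytest.approx compares within a tolerance (relative 1e-6 by default):
assert 0.1 + 0.2 == pytest.approx(0.3)
```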

thwllms commented 2 years ago

@slawler thanks for pointing that out. Updated the tests to use pytest.approx.