HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

Land model Post-Processing Work Flow (each step is 1 separate script/function) #261

Open glenncampagna opened 2 years ago

glenncampagna commented 2 years ago

Overview

Today's goal is 2 new R scripts (1 for each main bullet), each of which accepts command-line arguments.

Link to data model diagram (from #242): image

Misc

rburghol commented 2 years ago

@jdkleiner see above - I tried to lay out the script names and arguments that each of 2 post-processing component scripts should have here. I'd like to get your thoughts about this as a development strategy going forward.

glenncampagna commented 2 years ago

First version of working export script can be found here: https://github.com/HARPgroup/HARParchive/blob/master/HARP-2022-Summer/export_hsp_h5.R

jdkleiner commented 2 years ago

@rburghol I think what you've laid out here is sensible. Separating the h5->CSV and CSV->summary stats portions of the process into separate scripts makes the most sense. The analysts should be able to take what they learned this morning (posting model stats as properties) to the next step in an hsp_pwater.R script.

glenncampagna commented 2 years ago

Running the Export and Summary Scripts

Rscript [export_hsp_h5.R] [h5_file_path] [output_file_path] [data_source_table]

 - [export_hsp_h5.R] = path to the export script
 - [h5_file_path] = path to the h5 file data is being retrieved from
 - [output_file_path] = path and name of csv being saved
 - [data_source_table] = path to table in h5 (/RESULTS/PERLND_001/PWATER/table)

Rscript [hsp_pwater.R] [land_segment_name] [scenario_name] [landuse] [pwater_file_path] [image_file_path]

 - [hsp_pwater.R] = path to the summary script (HARParchive/HARP-2022-Summer/AutomatedScripts/hsp_pwater.r)
 - [land_segment_name] = name of land segment (A51800)
 - [scenario_name] = name of scenario (p532sova_2021)
 - [landuse] = land use prefix (for)
 - [pwater_file_path] = path to csv file created using export script (/media/model/p532/out/land/p532sova_2021/pwater/forA51800pwater.csv)
 - [image_file_path] = path to where generated graphs will be stored (/media/model/p532/out/land/p532sova_2021/images)

Note: the path to hsp_pwater.R file must also be supplied in the command line when running from home directory (HARParchive/HARP-2022-Summer/AutomatedScripts/hsp_pwater.r)
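As a rough illustration of the argument order described above, the sketch below assembles the hsp_pwater.R command line and prints it instead of running it. The helper name `pwater_cmd` and the dry-run approach are assumptions for illustration only; the script path and sample values are the ones quoted in this comment.

```shell
# Hypothetical helper (not part of HARParchive): prints the hsp_pwater.R
# command for review instead of executing it.
SUMMARY_SCRIPT="HARParchive/HARP-2022-Summer/AutomatedScripts/hsp_pwater.r"

pwater_cmd() {
  if [ "$#" -ne 5 ]; then
    echo "usage: pwater_cmd land_segment scenario landuse pwater_csv image_dir" >&2
    return 1
  fi
  # Argument order follows the usage line above.
  printf 'Rscript %s %s %s %s %s %s\n' "$SUMMARY_SCRIPT" "$1" "$2" "$3" "$4" "$5"
}

# Sample values taken from the argument descriptions above.
pwater_cmd A51800 p532sova_2021 for \
  /media/model/p532/out/land/p532sova_2021/pwater/forA51800pwater.csv \
  /media/model/p532/out/land/p532sova_2021/images
```

Printing the command first makes it easy to spot a swapped or missing argument before anything is written to the model output directories.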

rburghol commented 2 years ago

@glenncampagna here is where the next steps are: Most excellent! Now, what happens when you browse in VA Hydro to the model in question, and click on the scenario results?

But I see this, which is good: image

Note: if you're having trouble with the first item above, see the file from yesterday's session, cova_runoff.R, and compare the path to the file in the terminal to the path to the file in the web address from item #3 above.

juliabruneau commented 2 years ago

Update:

  1. We changed the image property settings to: http://deq1.bse.vt.edu:81/p532/out/land/p532sova_2021/forA51800.fig.totalOut.png
    • Where the file path is /p532/out/land/p532sova_2021
    • The filenames were edited to be shorter: forA51800.fig.AGWS.png and forA51800.fig.totalOut.png

We have pushed these updated files (exports and images) into VAHydro to verify that they work. If @rburghol or @jdkleiner could remove all the files from the scenario, we can run the script again to just have the correct files remaining (there are a lot of duplicates now).

  2. We were able to add a command to the first script to remove the original h5 file after creating the csv file from it.

We are looking into running the script with the full path now.

glenncampagna commented 2 years ago

Error when running hsp_pwater.R with full path

@rburghol When attempting to run the summary script from our home directory and using a full path, with a directory and file name, we've gotten an error:

$ Rscript hsp_pwater.R A51800 p532sova_2021 /media/model/p6/out/land/hsp2_2022/eos/pwater_test.csv 'for'
Fatal error: cannot create 'R_TempDir'

Is it a correct assumption that we should be running the script with a full path from our home directory on the deq server?

This error has been resolved; it seems to have been the result of limited disk space.
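For anyone who hits the same 'R_TempDir' error, a pre-flight check along these lines might help confirm the cause. This is only a sketch: the helper name `tmp_has_space` and the 1024 KB threshold are made up for illustration, and it assumes R is placing its session temp directory under TMPDIR (or /tmp when TMPDIR is unset).

```shell
# Hypothetical pre-flight check: succeeds only if the temp directory is
# writable and has at least min_kb kilobytes free.
# The body runs in a subshell so its variables stay local.
tmp_has_space() (  # args: min_kb [dir]
  min_kb="$1"
  dir="${2:-${TMPDIR:-/tmp}}"
  [ -d "$dir" ] && [ -w "$dir" ] || exit 1
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # the 4th column of the data row is the available space.
  avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
  [ "${avail_kb:-0}" -ge "$min_kb" ]
)

# The 1024 KB threshold is an arbitrary illustrative value.
if tmp_has_space 1024; then
  echo "temp directory looks usable"
else
  echo "temp directory is full or not writable" >&2
fi
```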

gcambridge commented 2 years ago

Image generation batch script for all land segments in a basin.

I started working on the batch script to generate plots for all land segments in a basin, and had a few questions about the arguments.

I was able to simplify all the arguments used in the two R scripts down to three components (Land use, Land segment name, and scenario name).

image
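To make the three-component idea concrete, here is a minimal sketch of deriving the csv and image paths from land use, land segment name, and scenario name. The function names and the OUT_ROOT variable are hypothetical, and the directory layout and `_pwater.csv` suffix are assumptions based on the example paths and file listings elsewhere in this thread.

```shell
# Hypothetical helpers: derive per-run paths from land use, land segment,
# and scenario. OUT_ROOT mirrors the example paths in this thread and is
# an assumption about the server layout.
OUT_ROOT="${OUT_ROOT:-/media/model/p532/out/land}"

pwater_csv_path() {  # args: landuse land_segment scenario
  printf '%s/%s/pwater/%s%s_pwater.csv\n' "$OUT_ROOT" "$3" "$1" "$2"
}

image_dir_path() {   # args: scenario
  printf '%s/%s/images\n' "$OUT_ROOT" "$1"
}

pwater_csv_path for A51800 p532sova_2021
image_dir_path p532sova_2021
```

Keeping the path rules in one place means the batch script only needs to loop over land use and land segment for a given scenario.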

@gcambridge : thanks for laying this out so clearly -- this really helps me. I think we covered this all in the meeting, with the exception of no.2: hard coding the data_source_table which I answer below.

However I was unclear on a couple things:

  1. How to source the land use for each land segment
  2. Can I hard-code the data_source_table in? (i.e /RESULTS/PERLND_P001/PWATER/table)
    • RWB: Definitely we should not hard code that, since we want this script to be reusable for the river channel simulation table, and any other data sources that we might decide we want to export later on.
  3. Should the building of the different arguments for the R scripts happen in the Batch Script?

rburghol commented 2 years ago

Everyone: With regard to the duplicates, I believe the problem could be one (or more) of 3 causes:

  1. You are not sending a 3rd argument "TRUE" to the initial RomProperty$new() command. This means it does not query VAHydro to see if this property already exists. Thus, when you save it, it is creating a new one. See code below for a fix.
  2. You are setting propvalue or propcode in the initial RomProperty$new() command. Maybe this is causing a problem? It is not done like that in some of our other working code, so, for safety's sake, we should query without value, then set the propcode/propvalue, then save(). See code below.
  3. You do not set the "bundle" property in your initial new() call. Best to set to 'dh_properties' (see code below).

Note, none of the above should cause duplicate creation, but I think for some reason it DOES. If you can test this and confirm/disconfirm that this fixes the duplicate issue, then that is great. If it does fix it, we need to file a bug report in hydrotools, which I would appreciate if someone would do for us when you finish testing. Hydrotools issue queue: https://github.com/HARPgroup/hydro-tools/issues

Also note, there should be no reason to save() the model element every time, since the model should exist. What you should do in this case is load it (using new(), I know, weird syntax), and then see if it has a pid > 0; if so, no need to save it. It will just cut down on processing time, and since we will be doing this thousands and thousands of times that will add up.


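# Load the property if it already exists: the 3rd argument TRUE queries VAHydro first.
# Do not pass propcode/propvalue in this initial call.
model_constant_agwo_Runit <- RomProperty$new(
  ds, list(
    varkey = "om_class_Constant",
    featureid = model_scenario$pid,
    entity_type = "dh_properties",
    propname = "l90_agwo_Runit",
    bundle = "dh_properties"
  ),
  TRUE
)

# Set the code and value only after the lookup, then save.
model_constant_agwo_Runit$propcode <- met.propcode
model_constant_agwo_Runit$propvalue <- as.numeric(l90_agwo_Runit)
model_constant_agwo_Runit$save(TRUE)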

Progress:

gcambridge commented 2 years ago

@rburghol @jdkleiner we are testing the batch script and are running into the following permission denied errors when attempting to create the export directories:

mkdir: cannot create directory ‘/media/model/p532/out/land/hsp2_2022/eos’: Permission denied
mkdir: cannot create directory ‘/media/model/p532/out/land/hsp2_2022/images’: Permission denied

Is there a way for you all to give us permissions to create and add files to these directories, or should we try exporting somewhere else for now?
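If exporting somewhere else turns out to be the interim answer, one possible shape for that in the batch script is sketched below. The helper name `ensure_export_dir` and the fallback location under $HOME are assumptions for illustration, not an agreed convention.

```shell
# Hypothetical helper: try to create the preferred export directory and
# fall back to a second location if that fails (e.g. Permission denied).
# Prints the directory that ended up usable.
ensure_export_dir() {  # args: preferred_dir fallback_dir
  for candidate in "$1" "$2"; do
    if mkdir -p "$candidate" 2>/dev/null && [ -w "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no writable export directory among: $1 $2" >&2
  return 1
}

# Example (the fallback path is an assumption):
# export_dir="$(ensure_export_dir /media/model/p532/out/land/hsp2_2022/eos "$HOME/hsp2_2022/eos")"
```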

glenncampagna commented 2 years ago

Data harvesting batch script created and tested for OR1_7700_7980

https://github.com/HARPgroup/HARParchive/blob/master/HARP-2022-Summer/batch_harvest.bat

This batch script takes arguments for scenario and river segment and exports all pwater and iwater data tables to the same directory.

Test:

/opt/model/p53/p532c-sova$ bash ~/batch_harvest.bat hsp2_2022 OR1_7700_7980

Output:

/media/model/p532/out/land/hsp2_2022/pwater$ ls
afoA51011_iwater.csv  cfoA51011_iwater.csv  homA51011_pwater.csv  hywA51011_pwater.csv  nhiA51011_pwater.csv  nloA51011_pwater.csv  rcnA51011_pwater.csv  trpA51011_pwater.csv
afoA51037_iwater.csv  cfoA51037_iwater.csv  homA51037_pwater.csv  hywA51037_pwater.csv  nhiA51037_pwater.csv  nloA51037_pwater.csv  rcnA51037_pwater.csv  trpA51037_pwater.csv
alfA51011_pwater.csv  cidA51011_iwater.csv  hvfA51011_pwater.csv  lwmA51011_pwater.csv  nhoA51011_pwater.csv  npaA51011_pwater.csv  rexA51011_pwater.csv  ursA51011_pwater.csv
alfA51037_pwater.csv  cidA51037_iwater.csv  hvfA51037_pwater.csv  lwmA51037_pwater.csv  nhoA51037_pwater.csv  npaA51037_pwater.csv  rexA51037_pwater.csv  ursA51037_pwater.csv
ccnA51011_pwater.csv  cpdA51011_pwater.csv  hwmA51011_pwater.csv  nalA51011_pwater.csv  nhyA51011_pwater.csv  npdA51011_pwater.csv  ridA51011_iwater.csv
ccnA51037_pwater.csv  cpdA51037_pwater.csv  hwmA51037_pwater.csv  nalA51037_pwater.csv  nhyA51037_pwater.csv  npdA51037_pwater.csv  ridA51037_iwater.csv
cexA51011_pwater.csv  forA51011_pwater.csv  hyoA51011_pwater.csv  nexA51011_pwater.csv  nidA51011_iwater.csv  pasA51011_pwater.csv  rpdA51011_pwater.csv
cexA51037_pwater.csv  forA51037_pwater.csv  hyoA51037_pwater.csv  nexA51037_pwater.csv  nidA51037_iwater.csv  pasA51037_pwater.csv  rpdA51037_pwater.csv

It is confirmed that all 'iwater' tables came from land uses classified as impervious by this model version, and that both the pwater and iwater csvs were populated with the target data.
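For downstream scripts that need to recover the pieces of these file names, a small parser might look like the sketch below. The helper name `parse_export_name` is hypothetical, and it assumes a 3-letter land use prefix followed by the land segment name and a `_pwater`/`_iwater` suffix, as in the listing above.

```shell
# Hypothetical helper: split an export file name such as
# forA51011_pwater.csv into land use, land segment, and table type.
# The body runs in a subshell so its variables stay local.
parse_export_name() (  # arg: file name or path
  name="${1##*/}"        # drop any leading directories
  name="${name%.csv}"    # drop the extension
  stem="${name%_*}"      # e.g. forA51011
  table="${name##*_}"    # e.g. pwater
  segment="${stem#???}"  # everything after the 3-letter land use prefix
  landuse="${stem%"$segment"}"
  printf '%s %s %s\n' "$landuse" "$segment" "$table"
)

parse_export_name afoA51011_iwater.csv   # prints: afo A51011 iwater
parse_export_name forA51037_pwater.csv   # prints: for A51037 pwater
```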

juliabruneau commented 2 years ago

When trying to run the hsp2 model for the basin OR4_8120_7890, @glenncampagna and I received errors.

Directory used: /opt/model/p53/p532c-sova/

Command used: HSP_VERSION=hsp2;export HSP_VERSION;cbp run_land.csh hsp2_2022 OR4_8120_7890

My error:

Unable to open/create file 'rcnA51770.h5'
Traceback (most recent call last):
  File "/usr/local/bin/hsp2", line 11, in <module>
    load_entry_point('HSPsquared', 'console_scripts', 'hsp2')()
  File "/opt/model/HSPsquared/HSP2tools/HSP2_CLI.py", line 60, in main
    mando.main()
  File "/usr/local/lib/python3.8/dist-packages/mando/core.py", line 208, in __call__
    return self.execute(sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/mando/core.py", line 204, in execute
    return command(*a)
  File "/opt/model/HSPsquared/HSP2tools/HSP2_CLI.py", line 24, in run
    hdf5_instance = HDF5(hdfname)
  File "/opt/model/HSPsquared/HSP2IO/hdf.py", line 13, in __init__
    self._store = pd.HDFStore(file_path)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/pytables.py", line 561, in __init__
    self.open(mode=mode, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/pytables.py", line 710, in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tables/file.py", line 315, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tables/file.py", line 778, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 509, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1400, in H5F__open
    unable to open file
  File "H5Fint.c", line 1817, in H5F_open
    problems closing file
  File "H5Fint.c", line 1279, in H5F__dest
    problems closing file
  File "H5Faccum.c", line 1070, in H5F__accum_reset
    can't flush metadata accumulator
  File "H5Faccum.c", line 1033, in H5F__accum_flush
    file write failed
  File "H5FDint.c", line 258, in H5FD_write
    driver write request failed
  File "H5FDsec2.c", line 811, in H5FD_sec2_write
    file write failed: time = Wed Jul 27 13:23:47 2022
, filename = 'rcnA51770.h5', file descriptor = 3, errno = 28, error message = 'No space left on device', buf = 0x3688118, total write size = 800, bytes this sub-write = 800, bytes actually written = 18446744073709551615, offset = 0
  File "H5Fint.c", line 1783, in H5F_open
    unable to flush superblock
  File "H5Fio.c", line 198, in H5F_flush_tagged_metadata
    can't reset accumulator
  File "H5Faccum.c", line 1070, in H5F__accum_reset
    can't flush metadata accumulator
  File "H5Faccum.c", line 1033, in H5F__accum_flush
    file write failed
  File "H5FDint.c", line 258, in H5FD_write
    driver write request failed
  File "H5FDsec2.c", line 811, in H5FD_sec2_write
    file write failed: time = Wed Jul 27 13:23:47 2022
, filename = 'rcnA51770.h5', file descriptor = 3, errno = 28, error message = 'No space left on device', buf = 0x377bd38, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0

This is just one of multiple similar errors we received when trying to run it.
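Since the traceback reports errno 28 ('No space left on device'), a quick way to see which h5 files were affected is to scan a saved copy of the run output. The sketch below assumes the output was captured to a log file, which is not something the command above does by itself; the helper name `files_out_of_space` is hypothetical.

```shell
# Hypothetical helper: list the distinct files that hit
# "No space left on device" (errno 28) in a saved run log.
files_out_of_space() {  # arg: log file
  grep 'No space left on device' "$1" |
    sed -n "s/.*filename = '\([^']*\)'.*/\1/p" |
    sort -u
}

# Example, assuming the run output was redirected to run.log:
# files_out_of_space run.log
```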

glenncampagna commented 2 years ago

Questions about running model for OR4_8120_7890 @rburghol

 - We see that the model is trying to run our export R script but is hitting an error, maybe because of the _pwater export path and iwater data source table? afo is an impervious land use, so it should have an iwater table.
 - Lastly, using the rhdf5 package we see that some h5 files from the river seg have been successfully generated and contain a results group; others are generated but do not contain the results group, which we learned means the model hasn't been run yet; and other h5 files seem completely empty. Does this mean the model has maybe only been partially run for this river seg?
 Note: h5 files located where we'd expect them to be from the model run led us to believe the model might've already been run all the way through:

/opt/model/p53/p532c-sova/output/hspf/land/out/for/hsp2_2022/forA51019.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/cid/hsp2_2022/cidA51019.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/hyo/hsp2_2022/hyoA51019.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/rpd/hsp2_2022/rpdA51019.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/hyw/hsp2_2022/hywA51019.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/nex/hsp2_2022/nexA51019.h5

through

/opt/model/p53/p532c-sova/output/hspf/land/out/afo/hsp2_2022/afoB51023.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/rex/hsp2_2022/rexB51023.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/alf/hsp2_2022/alfB51023.h5
/opt/model/p53/p532c-sova/output/hspf/land/out/lwm/hsp2_2022/lwmB51023.h5


A51019 is the first land segment listed within the river seg, and B51023 is the last (15 total)
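As a first pass before opening files with rhdf5, something like the sketch below could flag h5 files that are missing or zero bytes for a given land segment. The helper name `check_segment_h5` is hypothetical, and it assumes the <land use>/<scenario>/<land use><land segment>.h5 layout shown in the paths above and that every land use directory should hold a file for the segment. It only checks file size: a non-empty file can still lack the results group, so this does not replace the rhdf5 inspection.

```shell
# Hypothetical helper: report land-use h5 files for one land segment that
# are missing or zero bytes. The body runs in a subshell so its variables
# stay local.
check_segment_h5() (  # args: out_root scenario land_segment
  for lu_dir in "$1"/*/; do
    [ -d "$lu_dir" ] || continue
    lu="$(basename "$lu_dir")"
    f="$lu_dir$2/$lu$3.h5"
    if [ ! -e "$f" ]; then
      echo "missing $f"
    elif [ ! -s "$f" ]; then
      echo "empty $f"
    fi
  done
)

# Example (paths from the listing above):
# check_segment_h5 /opt/model/p53/p532c-sova/output/hspf/land/out hsp2_2022 A51019
```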