Model Data File Structure

rburghol commented 5 years ago

We are exporting WDM files for edge of stream (eos) and stream outflow data, among other things. Thus far, we have stored the stream flow data in the same directory as the stream WDMs; however, I think we need a different location in order to better track all of our data. I propose that we mimic the directory structure used in the tmp/wdm directories for our out/ directory.

out/land/SCENARIO
out/river/SCENARIO

So, in the p6_gb604 model directory for the 2 scenarios we currently have there will be:

out/land/CFBASE30Y20180615
out/land/CBASE1808L55CY55R45P50R45P50Y
out/river/CFBASE30Y20180615
out/river/CBASE1808L55CY55R45P50R45P50Y

File name convention for a WDM export might be:

[landseg]_[DSN] or DSN text abbrev
ex: N51045_0111 or N51045_suro
[riverseg]_[DSN] or DSN text abbrev
ex: JU3_7490_7400_444 or JU3_7490_7400_divr

hdaniel7 commented 5 years ago

I created a function to generate csv files for all wdm files within a given directory and output them in a specified directory -- for example, the lines below generated surface runoff csv files for all 300 or so river segments we have wdms for and moved them into the /out/river/CFBASE30Y20180615 directory.

R

source('~/cbp6/code/fn.wdm.to.csv.all.R')

wdm.to.csv.all(wdmpath = '/opt/model/p6/p6_gb604/tmp/wdm/river/CFBASE30Y20180615/stream/', outputdir = '/opt/model/p6/p6_gb604/out/river/CFBASE30Y20180615', 1984, 2014, 111)

I now realize that to fully imitate the directory structure of /tmp/wdm/ I should have specified the output directory as /out/river/CFBASE30Y20180615/stream/ -- so I'll throw together a quick script to allow us to move all csvs within a specified directory to a different directory, which should also come in handy as we move around csvs we've generated in the past to their new places in this new directory structure.

hdaniel7 commented 5 years ago

The function to move all csv files from one directory to another is now complete -- here are the commands I used to move all 317 csvs in the /out/river/CFBASE30Y20180615/ directory to /out/river/CFBASE30Y20180615/stream/:

R

source('~/cbp6/code/fn.csv.move.all.R')

csv.move.all(csvpath = '/opt/model/p6/p6_gb604/out/river/CFBASE30Y20180615', outputdir = '/opt/model/p6/p6_gb604/out/river/CFBASE30Y20180615/stream')

We'll start moving all the generated csvs to the new directory structure now.
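For anyone reading along without access to fn.csv.move.all.R, the move step can be sketched roughly as follows. This is not the actual function; `move_csvs` and the demo file names are hypothetical, and the sketch assumes source and destination are on the same filesystem (base R's `file.rename` may fail otherwise).

```r
# Hypothetical helper, NOT the real csv.move.all(): move every .csv that sits
# directly inside `csvpath` into `outputdir`, using base R only.
move_csvs <- function(csvpath, outputdir) {
  # Create the destination (and any missing parents) if it does not exist yet.
  dir.create(outputdir, recursive = TRUE, showWarnings = FALSE)
  # Non-recursive listing, so csvs already inside subdirectories are left alone.
  files <- list.files(csvpath, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) return(character(0))
  dest <- file.path(outputdir, basename(files))
  moved <- file.rename(files, dest)
  # Return the new paths of the files that were moved successfully.
  dest[moved]
}

# Demo in a throwaway directory: one csv and one unrelated file.
src <- tempfile("scenario_demo")
dir.create(src)
file.create(file.path(src, c("JA5_7480_0001_0111.csv", "notes.txt")))
moved <- move_csvs(src, file.path(src, "stream"))
```

Only the csv ends up in the stream/ subdirectory; the unrelated file stays where it was.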

hdaniel7 commented 5 years ago

We agree that mimicking the existing /tmp/wdm/ directory structure is the best choice -- however, in /tmp/out/, the land directory currently has the 30 land uses as directories and then the scenario directories within (i.e. /tmp/out/land/aop/CFBASE30Y20180615) -- we have updated the /out/ directory to reflect this organizational structure.

We also agree on the naming convention, with one small exception: for the land use land segment information, we propose keeping the file name as [land_use][landsegment][DSN]. This is both the most helpful option (it keeps the file contents identifiable) and the easiest (it requires no change from the current naming convention).

rburghol commented 5 years ago

I see what you're saying about the landuse structure in /tmp, and that makes sense in the case of the runoff where we have the individual landuse eos exports, but we also have the merged eos file. Where would those files go?

hdaniel7 commented 5 years ago

Good point -- I had forgotten about those merged files. One option would be to create an additional subdirectory within the /land/ directory called "/all" or "/merged" and store them there, within their respective /[scenario] subdirectories. Or, perhaps we should reverse the order of the directories to how you had them (i.e. /out/land/[scenario]/[land use]) and either store the merged eos files in the overall /[scenario]/ directory or within an additional subdirectory labeled "/all" or "/merged". I don't really see any particular advantage to doing it one way or the other -- what do you think?

rburghol commented 5 years ago

I think a couple things:

Since these are eos inputs, we should mirror the "eos" directory under "river" that contains the ps/sep/div wdm files, e.g. river/CFBASE30Y20180615/eos
For now, let's NOT save the individual landuse exports after we have merged them (and used them for QA!). At a later time, if we want to store the individual land use eos data, we can discuss the best file structure. At this moment, for some reason, I prefer land/[scenario] regardless of whether we have split file outputs (i.e. land/[scenario]/for) or unified files (i.e. land/[scenario]/eos-a51115.csv), because putting the land use first clashes with my sense of appropriate hierarchy (but I can live with it if everyone else feels strongly)

hdaniel7 commented 5 years ago

Alright. The combined land use exports can be stored within a land/[scenario]/eos directory.

Currently there is one function used to generate all the land use csv files by running quick wdm to csv, and a second function that imports all of these generated csv files to create the one merged land use file -- perhaps I should add in some code to delete the individual land use .csvs as soon as the merged land use file is generated?

We agree with you that it makes more sense to have [scenario] directories immediately within the land directory, as opposed to splitting them into [land use]/[scenario]. The only reason we set the directories up this way was to directly mimic the tmp/wdm/ structure -- but we might as well set it up in a more logical hierarchical structure while we're at it.

rburghol commented 5 years ago

I think the idea of having the code clean-up (delete) after is a good idea. We can include a parameter to enable or disable the cleanup if we like. It should default to TRUE. Normally I would advocate for keeping all the files by default, but this will end up being 200-400 Gigs of data and we will risk running out of drive space. Which sucks. :)
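A rough sketch of what that switch could look like is below. Everything here is an assumption for illustration: `merge_landuse_csvs`, the `cleanup` argument name, and the demo file names are hypothetical, and the row-binding merge is only a stand-in for whatever the real merge function does. The point is the flag defaulting to TRUE and the inputs being deleted only after the merged file exists.

```r
# Hypothetical sketch, not the existing merge function.
# The merge itself (rbind of identically shaped tables) is a placeholder.
merge_landuse_csvs <- function(csv_files, merged_file, cleanup = TRUE) {
  tables <- lapply(csv_files, read.csv)
  merged <- do.call(rbind, tables)
  write.csv(merged, merged_file, row.names = FALSE)
  # Delete the individual land use files only once the merged file is on disk.
  if (cleanup && file.exists(merged_file)) {
    unlink(csv_files)
  }
  invisible(merged_file)
}

# Demo with two made-up land use exports in a throwaway directory.
demo_dir <- tempfile("eos_demo")
dir.create(demo_dir)
inputs <- file.path(demo_dir, c("forA51121_0111.csv", "aopA51121_0111.csv"))
for (f in inputs) {
  write.csv(data.frame(hour = 1:2, value = c(0.1, 0.2)), f, row.names = FALSE)
}
merged_path <- merge_landuse_csvs(inputs, file.path(demo_dir, "A51121_0111.csv"))
```

Calling it with `cleanup = FALSE` would keep the individual files around, e.g. for QA.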

To be clear though, I am advocating that the combined exports be stored in out/river/[scenario]/eos not out/land/[scenario]/eos.

hdaniel7 commented 5 years ago

Oh, alright! I thought you were just suggesting we mimic the file structure of out/river/[scenario]/eos. These merged land use files pertain to land segments rather than river segments, which is the only argument I have for them being stored in the out/land directory -- however, it really does make more sense to just store them in river/eos rather than having an eos directory in both land and river.

While delving into the tmp/wdm/land directory, we found that the only .csv files we have stored there are these individual and merged land use suro, ifwo, and agwo files -- since the merged files will be stored in out/river/eos and the individual files will be deleted, we'll create land/[scenario] directories and leave them empty for future use.

rburghol commented 5 years ago

Sounds good. Just verify 2 things if you please:

1) These will go in out/river/[scenario]/eos -- so it is very much like the structure of ./tmp/wdm/river/[scenario]/eos/
2) When can I get some Craig Creek runoff files to put through their paces!?!?! :)

hdaniel7 commented 5 years ago

Current directory hierarchy is as follows for out/:

out/land
out/land/[scenario]
out/river
out/river/[scenario]
out/river/[scenario]/eos
out/river/[scenario]/stream
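In case it helps when setting up a new scenario, the hierarchy as listed in this comment could be created with something like the following. `create_out_dirs` is a hypothetical helper, not an existing function in the repo, and the demo uses a temporary directory rather than the real model directory.

```r
# Hypothetical helper: create the out/ hierarchy for one or more scenarios.
create_out_dirs <- function(model_dir, scenarios) {
  # file.path() is vectorized, so each line expands to one path per scenario.
  paths <- c(
    file.path(model_dir, "out", "land", scenarios),
    file.path(model_dir, "out", "river", scenarios, "eos"),
    file.path(model_dir, "out", "river", scenarios, "stream")
  )
  # recursive = TRUE also creates out/, out/land, out/river and out/river/[scenario].
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  invisible(paths)
}

# Demo in a throwaway directory with the two scenarios we currently have.
root <- tempfile("p6_demo")
created <- create_out_dirs(root, c("CFBASE30Y20180615", "CBASE1808L55CY55R45P50R45P50Y"))
```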

Files are named as: [river segment]_[dsn].csv or as: [land segment]_[dsn].csv

Files with multiple dsns, such as the land use export files that contain ifwo, agwo, and suro data for a river segment, are named as: [land segment]_[dsn1],[dsn2],[dsn3].csv. The dsns are separated by commas rather than underscores to make it clear that there are multiple dsns, so the name does not look like an oddly named land-river segment (i.e. A51121_0111,0211,0411.csv rather than A51121_0111_0211_0411.csv)

Is everybody happy with this setup?

hdaniel7 commented 5 years ago

Just saw that you replied before I finished typing up my comment -- 1) Yes, the merged land use files are stored in river/[scenario]/eos -- the test one currently generated and in location is land segment A51121 for phase p532cal_062211. 2) The Craig Creek runoff files slipped my mind! I'll update my script to clean up after itself and then should have those merged runoff files for you in place and named properly by tomorrow afternoon!

rburghol commented 5 years ago

This looks splendid except for the use of commas in a file name. I think we should do either "qunit", or underscores, with a dash to separate the land segment from the name, such as: A51121-0111_0211_0411.csv or A51121_qunit.csv

Also, I am now understanding your reluctance on the eos in the river directory, as opposed to the land directory -- since the names are going to be by landseg and have no mention of riverseg! So, to be totally maddening, I will throw this out there: We could always stash it in out/land/[scenario]/eos if it made more sense to do so. :)

Looking forward to those files! FWIW I will be off tomorrow. But will tear through this on Monday.

hdaniel7 commented 5 years ago

All the available Craig Creek segments now have their merged land use export files stored in their respective /out/land/[scenario]/eos directories -- including those for p532cal_062211, CFBASE30Y20180615, and CBASE1808L55CY55R45P50R45P50Y. That being said, I want to reiterate that p6 .csvs for H51071, H51121, and H51161 could not be generated since the .wdms for these land segments do not exist in the data we have been provided with.

The method we settled on for now of naming the files with multiple DSNs was to separate the land segment from the name with an underscore and to separate the DSNs with dashes -- i.e. A51121_0111-0211-0411.csv in order to maintain the consistency of underscore separation of segment and name of files with a single DSN, i.e. JA5_7480_0001_0111.csv. If you feel strongly that separating land segment and DSN with a dash is the best naming convention, it would be worth considering renaming other files to also separate segment from DSN with a dash for the sake of consistency (i.e. JA5_7480_0001-0111.csv).
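To make the convention concrete, here is a small sketch of building and parsing names under the underscore-then-dashes scheme. Both `export_file_name` and `parse_export_file_name` are hypothetical helpers written for this comment, not functions that exist in the repo.

```r
# Hypothetical helpers illustrating the naming convention described above.

# Segment and DSN part are joined by "_"; multiple DSNs are joined by "-".
export_file_name <- function(segment, dsns) {
  paste0(segment, "_", paste(dsns, collapse = "-"), ".csv")
}

# The LAST underscore always separates the segment from the DSN part, so this
# works for river segments that contain underscores themselves.
parse_export_file_name <- function(file_name) {
  stem <- sub("\\.csv$", "", file_name)
  list(
    segment = sub("_[^_]*$", "", stem),
    dsns = strsplit(sub("^.*_", "", stem), "-", fixed = TRUE)[[1]]
  )
}

multi <- export_file_name("A51121", c("0111", "0211", "0411"))
single <- export_file_name("JA5_7480_0001", "0111")
parsed <- parse_export_file_name(single)
```

Because dashes never appear inside a segment name, splitting on the last underscore is unambiguous for both single-DSN and multi-DSN files.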

hdaniel7 commented 5 years ago

I generated merged land use export files for the new Craig Creek land segments -- phase 6 segs (L51023, N51045, N51005, N51023, and H51045) are stored in /opt/model/p6/p6_gb604/out/land/CFBASE30Y20180615/eos, while phase 5 segs (A51023 and A51045) are stored in /opt/model/p53/p532c-sova/out/land/p532cal_062211/eos.

HARPgroup / cbp6

Model Data File Structure #72