HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

rhdf5 #207

Open rburghol opened 2 years ago

rburghol commented 2 years ago

Overview

Installation

install.packages("BiocManager")
BiocManager::install("rhdf5")
library("rhdf5")
dsn411 = h5read("forA51800.h5","/TIMESERIES/TS411/table")
glenncampagna commented 2 years ago

Exploring rhdf5 as a possible solution to finding timestamps in .h5 files

glenncampagna commented 2 years ago

Comparing h5read and h5ls commands:

h5ls("file", recursive = TRUE, all = FALSE, datasetinfo = TRUE, index_type, native = FALSE)

h5read("file", name, index = NULL, start = NULL, stride = NULL, block = NULL, count = NULL, compoundAsDataFrame = TRUE, callGeneric = TRUE, read.attributes = FALSE, drop = FALSE, native = FALSE, s3 = FALSE, s3credentials = NULL)

h5ls:

h5read:
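A minimal sketch of the difference, assuming the OR1_7700_7980.h5 file used elsewhere in this thread: h5ls() returns a data frame describing the file's hierarchy, while h5read() returns the contents stored at one named path.

```
library(rhdf5)

# h5ls() lists the hierarchy: one row per group/dataset, with type and dims
contents <- h5ls("OR1_7700_7980.h5")
head(contents[, c("group", "name", "otype", "dim")])

# h5read() pulls the actual data stored at one specific path
hydr <- h5read("OR1_7700_7980.h5", "/RESULTS/RCHRES_R001/HYDR/table")
str(hydr)
```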

juliabruneau commented 2 years ago

h5read passes its arguments through to H5Dread, which can read a partial dataset from an HDF5 file. Could this help with the error above?

H5Dread:

@juliabruneau - good information/hunting. Perhaps this is due to it being a partial dataset? Though I think the memory error comes from an unspecific query: if I keep knocking segments off the end of the h5read path, I get more and more warnings, presumably because the data retrieved gets larger. Maybe this error just means there is too much data?
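If the warnings really do scale with how much data is pulled, one hedge is to read only a slice: the h5read signature above includes an index argument, so a few rows of a table can be requested instead of a whole subtree. A sketch, using the TS1001 path from this thread:

```
library(rhdf5)

# read only the first 10 rows of the table instead of the full dataset;
# index is a list with one element per dimension (this table is 1-D)
ts_head <- h5read("OR1_7700_7980.h5", "/TIMESERIES/TS1001/table",
                  index = list(1:10))
str(ts_head)
```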

Terminal Code 1: Output of `h5dump -n OR1_7700_7980.h5 | grep TS1001`

 group      /TIMESERIES/TS1001
 group      /TIMESERIES/TS1001/_i_table
 group      /TIMESERIES/TS1001/_i_table/index
 dataset    /TIMESERIES/TS1001/_i_table/index/abounds
 dataset    /TIMESERIES/TS1001/_i_table/index/bounds
 dataset    /TIMESERIES/TS1001/_i_table/index/indices
 dataset    /TIMESERIES/TS1001/_i_table/index/indicesLR
 dataset    /TIMESERIES/TS1001/_i_table/index/mbounds
 dataset    /TIMESERIES/TS1001/_i_table/index/mranges
 dataset    /TIMESERIES/TS1001/_i_table/index/ranges
 dataset    /TIMESERIES/TS1001/_i_table/index/sorted
 dataset    /TIMESERIES/TS1001/_i_table/index/sortedLR
 dataset    /TIMESERIES/TS1001/_i_table/index/zbounds
 group      /TIMESERIES/TS1001/_i_table/values
 dataset    /TIMESERIES/TS1001/_i_table/values/abounds
 dataset    /TIMESERIES/TS1001/_i_table/values/bounds
 dataset    /TIMESERIES/TS1001/_i_table/values/indices
 dataset    /TIMESERIES/TS1001/_i_table/values/indicesLR
 dataset    /TIMESERIES/TS1001/_i_table/values/mbounds
 dataset    /TIMESERIES/TS1001/_i_table/values/mranges
 dataset    /TIMESERIES/TS1001/_i_table/values/ranges
 dataset    /TIMESERIES/TS1001/_i_table/values/sorted
 dataset    /TIMESERIES/TS1001/_i_table/values/sortedLR
 dataset    /TIMESERIES/TS1001/_i_table/values/zbounds
 dataset    /TIMESERIES/TS1001/table
rburghol commented 2 years ago

Hey all - see below, which is excerpted from the test cases we worked on yesterday (see also #211). This one gets us the data we want and gives clues about where to look for other data (hint: maybe not TIMESERIES):

rchres_data = h5read("OR1_7700_7980.h5", "/RESULTS/RCHRES_R001/HYDR/table")
names(rchres_data)
quantile(rchres_data$ROVOL)
juliabruneau commented 2 years ago

HDFView 3.1.4

We can explore the .h5 files with an application called HDFView. It is designed specifically to open .hdf5/.h5 files, and it provides a directory tree for looking into the different groups and attributes within the .h5 file. The only "limitation" is that you have to register on the HDF Group website in order to download the application, but it only asks for your email and what organization you're a part of (academic research).

This is the process to access the files in HDFView:

  1. Download the application: https://www.hdfgroup.org/downloads/hdfview/?1656346198

    • Download the .zip file: 'HDFView-3.1.4-win10_64-vs16.zip'
    • Extract the .zip file with something like 7-zip
  2. Download the .h5 file: http://deq1.bse.vt.edu:81/files/cbp/OR1_7700_7980.h5

    • Do this by right-clicking on the link, and then choosing: 'Save link as...'
    • If the browser discards the download, click the up arrow and hit 'Keep'
  3. Click on the downloaded .h5 file to open it (this will open it automatically in the HDFViewer)

  4. Now you are able to see all the groups and different "layers" in our .h5 file

image

This viewer gives a better understanding of the contents of an HDF5 file, and it can hopefully help us understand how to extract the timestamp using R. Maybe we can utilize the viewer's export function to extract .txt files?

Update: Can save table as a .txt file to computer. Working on putting into R.
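As a possible next step for putting the exported table into R, base R's read.table could pick it up. A sketch, where 'hydr_table.txt' is a hypothetical name for the file saved from HDFView (the actual delimiter and header row may differ from what HDFView writes):

```
# hypothetical file name; adjust sep/header to match the HDFView export
hydr <- read.table("hydr_table.txt", header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)
head(hydr)
```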

glenncampagna commented 2 years ago

Using H5Dread to get 64-bit timestamps:

fid = H5Fopen("OR1_7700_7980.h5")
did = H5Dopen(fid, "RESULTS/RCHRES_R001/HYDR/table")
rchres1 <- H5Dread(did, bit64conversion = "double")
head(rchres1, 10)  # first 10 rows (columns truncated here)
 index       DEP      IVOL O1 O2         O3 OVOL1 OVOL2      OVOL3
1    4.417668e+17 0.2415072  8.847264  0  0   2.055183     0     0  0.1229398
2    4.417704e+17 0.3002159  8.828485  0  0   3.175832     0     0  0.2161576
3    4.417740e+17 0.3486096  8.810900  0  0   4.282218     0     0  0.3081839
4    4.417776e+17 0.3905506  8.793962  0  0   5.374578     0     0  0.3990412
5    4.417812e+17 0.4279465  8.777404  0  0   6.453111     0     0  0.4887475
6    4.417848e+17 0.4619085  8.761090  0  0   7.517995     0     0  0.5773184
7    4.417884e+17 0.4931512  8.744946  0  0   8.569401     0     0  0.6647684
8    4.417920e+17 0.5221504  8.728931  0  0   9.606858     0     0  0.7510851
9    4.417956e+17 0.5492472  8.713009  0  0  10.629819     0     0  0.8362263
10   4.417992e+17 0.5747209  8.697166  0  0  11.638691     0     0  0.9201863

Don't forget to close the open data objects, both the dataset and the file, when finished:

H5Dclose(did)
H5Fclose(fid)

origin <- "1970-01-01"
rchres1$index <- as.POSIXct(rchres1$index / 1000000000, origin = origin, tz = "UTC")

head(rchres1)
                index       DEP     IVOL O1 O2       O3 OVOL1 OVOL2     OVOL3
1 1984-01-01 01:00:00 0.2415072 8.847264  0  0 2.055183     0     0 0.1229398
2 1984-01-01 02:00:00 0.3002159 8.828485  0  0 3.175832     0     0 0.2161576
3 1984-01-01 03:00:00 0.3486096 8.810900  0  0 4.282218     0     0 0.3081839
4 1984-01-01 04:00:00 0.3905506 8.793962  0  0 5.374578     0     0 0.3990412
5 1984-01-01 05:00:00 0.4279465 8.777404  0  0 6.453111     0     0 0.4887475
6 1984-01-01 06:00:00 0.4619085 8.761090  0  0 7.517995     0     0 0.5773184
  PRSUPY       RO     ROVOL     SAREA        TAU     USTAR      VOL VOLEV
1      0 2.055183 0.1229398  67.47366 0.02344255 0.1099862 15.79433     0
2      0 3.175832 0.2161576  83.87602 0.02914127 0.1226281 24.40666     0
3      0 4.282218 0.3081839  97.39652 0.03383873 0.1321425 32.90938     0
4      0 5.374578 0.3990412 109.11423 0.03790982 0.1398658 41.30430     0
5      0 6.453111 0.4887475 119.56212 0.04153978 0.1464090 49.59296     0
6      0 7.517995 0.5773184 129.05061 0.04483640 0.1521076 57.77673     0


Note: We found that this table's last timestamp is 1984-09-02 02:00:00
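As a quick sanity check on the conversion, the first raw index value from the H5Dread output (4.417668e+17, nanoseconds since the Unix epoch) should map to the first timestamp shown above:

```
origin <- "1970-01-01"
# first raw index value from the H5Dread output (nanoseconds since epoch)
ns <- 4.417668e+17
stamp <- as.POSIXct(ns / 1e9, origin = origin, tz = "UTC")
format(stamp)  # "1984-01-01 01:00:00"
```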
megpritch commented 2 years ago

HDFView 3.1.4

  • Ultimately, you're able to click on 'Show Data with Options', which will provide another window with a table, and you can extract the table as a text file (shown below)

image

Update: Can save table as a .txt file to computer. Working on putting into R.

Attempting to View Output .h5 Files in the HDFView Program:

  • After running the land test case (https://github.com/HARPgroup/HARParchive/issues/211), I thought it would be easier to explore and compare the river and land model outputs if we were to open them in the viewer.
  • It would allow us to view the groups, subgroups, and data tables by clicking on them as folders rather than repeatedly running commands.
  • However, to open them in the viewer they need to be downloaded to your local computer. Since we do not have a link on Github as we did for the first .h5 file, this became an issue.
  • Command to copy files from the server to a local computer (found online): `scp user@server:/path/to/remotefile.zip /Local/Target/Destination`
  • For multiple files at once: `scp user@host:/remote/path/\{file1.zip,file2.zip\} /Local/Path/`
  • With this, I struggled with how to reference my local computer, because calling the local disk "C:" means nothing to the server
  • These are the main things I tried:
    - `scp megpritch@deq2:~/OR1_7700_7980.h5 ~/Desktop/`
      - received: _/home/megpritch/Desktop/: Is a directory_
      - Not sure what to do with this information, because it doesn't seem to be an error
    - `scp megpritch@deq2:~/\{OR1_7700_7980.h5,forA51800.h5\} ip_address/Desktop/folder_name`
      - received: _No such file or directory_
    - `/home/megpritch/OR1_7700_7980.h5 ~\folder_name\OR1_7700_7980.h5`
      - This said it downloaded successfully, but then I couldn't find it on my computer. It turns out it made a copy of the file inside my deq account's home directory, renamed "folder_nameOR1_7700_7980.h5"

What Now?

rburghol commented 2 years ago

Still be useful since if we simply want to use it to help us understand structure better, we only really need to have one H5 file for Rivers, and another H5 file for land as a template. That is because each land H5 file will share an identical structures to each other land H5 file. Similarly the same will apply for river h5s. I like that you tried scp, and seems like you got close. We can find a more efficient solution tomorrow.