DOI-USGS / lake-temperature-process-models-old

Pipeline #2
Other

Check on some lakes in the MNTOHA modeling set #26

Open jordansread opened 4 years ago

jordansread commented 4 years ago

I am missing Lake of the Woods (DOW 39000200 or 39000201) from both the PGDL and the GLM3 TOHA releases, and I am missing Cass Lake (DOW 04003000) from PGDL, which doesn't make sense to me.

aappling-usgs commented 4 years ago

I don't see Lake of the Woods on Tallgrass or in my current list of 638 modelable lakes. Do you have an NHD-HR ID for that one to help me double-check? Do I need to update my source data again?

I do see Cass Lake on Tallgrass from the April 15 runs:

(base) [aappling@tg-login2 lake-temperature-neural-networks] ls -Rl 2_model/out/nhdhr_166868528
2_model/out/nhdhr_166868528:
total 8
drwxr-sr-x 2 aappling cida 4096 Apr 15 17:33 finetune_predict
drwxr-sr-x 2 aappling cida 4096 Apr 13 17:56 pretrain_predict

2_model/out/nhdhr_166868528/finetune_predict:
total 21380
-rw-r--r-- 1 aappling cida       67 Apr 15 17:33 checkpoint
-rw-r--r-- 1 aappling cida      759 Apr 15 17:19 model_config.tsv
-rw-r--r-- 1 aappling cida    29060 Apr 15 17:33 model.data-00000-of-00001
-rw-r--r-- 1 aappling cida      505 Apr 15 17:33 model.index
-rw-r--r-- 1 aappling cida  2941882 Apr 15 17:33 model.meta
-rw-r--r-- 1 aappling cida     9817 Apr 15 17:33 params.npz
-rw-r--r-- 1 aappling cida 14596135 Apr 15 17:33 preds.npz
-rw-r--r-- 1 aappling cida    21925 Apr 15 17:33 stats.npz
-rw-r--r-- 1 aappling cida  4266540 Apr 15 17:20 varied_inputs.npz

2_model/out/nhdhr_166868528/pretrain_predict:
total 37252
-rw-r--r-- 1 aappling cida       67 Apr 13 17:56 checkpoint
-rw-r--r-- 1 aappling cida      715 Apr 13 17:28 model_config.tsv
-rw-r--r-- 1 aappling cida    29060 Apr 13 17:56 model.data-00000-of-00001
-rw-r--r-- 1 aappling cida      505 Apr 13 17:56 model.index
-rw-r--r-- 1 aappling cida  2941882 Apr 13 17:56 model.meta
-rw-r--r-- 1 aappling cida     9822 Apr 13 17:56 params.npz
-rw-r--r-- 1 aappling cida 14701592 Apr 13 17:56 preds.npz
-rw-r--r-- 1 aappling cida    42383 Apr 13 17:56 stats.npz
-rw-r--r-- 1 aappling cida 20391792 Apr 13 17:29 varied_inputs.npz

So the next question for that one is where it's failing on the way to ScienceBase.

jordansread commented 4 years ago

Lake of the Woods is nhdhr_123319728.

I can dig in on my end too and check the data release.

aappling-usgs commented 4 years ago

I do see Cass Lake in my prep data.frames in mntoha-data-release...

> pgdl_predictions_df <- remake::fetch('pgdl_predictions_df')
> pgdl_predictions_df %>% slice(grep('166868528', site_id)) %>% glimpse
Rows: 1
Columns: 4
$ site_id         <chr> "nhdhr_166868528"
$ source_filepath <chr> "../lake-temperature-neural-networks/3_assess/out/nhd…
$ source_hash     <chr> "ffb2261c0f7d5a3f38e10bd8e9577e65"
$ out_file        <chr> "pgdl_nhdhr_166868528_temperatures.csv"

Should be in Group 2:

> pgdl_site_ids_grouped <- remake::fetch('pgdl_site_ids_grouped')
>  pgdl_site_ids_grouped %>% slice(grep('nhdhr_166868528', site_id))
# A tibble: 1 x 2
  site_id         group_id
  <chr>           <chr>
1 nhdhr_166868528 02_N47.00-48.00_W94.00-97.25

And actually, I see Lake of the Woods in there too, searching by nhdhr:

>  pgdl_site_ids_grouped %>% slice(grep('nhdhr_123319728', site_id))
# A tibble: 1 x 2
  site_id         group_id
  <chr>           <chr>
1 nhdhr_123319728 01_N48.00-49.50_W89.50-97.25

aappling-usgs commented 4 years ago

I think I see Lake of the Woods in the Group 1 zipfile (on my Tallgrass mntoha-data-release repo):

> group1 <- unzip('tmp/pgdl_predictions_01_N48.00-49.50_W89.50-97.25.zip', list=TRUE)
> group1 %>% slice(grep('nhdhr_123319728', Name))
                                   Name   Length                Date
1 pgdl_nhdhr_123319728_temperatures.csv 36091975 2020-04-23 15:17:00

and here's Cass:

> group2 <- unzip('tmp/pgdl_predictions_02_N47.00-48.00_W94.00-97.25.zip', list=TRUE)
> group2 %>% slice(grep('nhdhr_166868528', Name))
                                   Name   Length                Date
1 pgdl_nhdhr_166868528_temperatures.csv 19412665 2020-04-23 15:16:00

aappling-usgs commented 4 years ago

I downloaded those two zip files from ScienceBase and confirmed that Cass and LotW are indeed in those files:

> unzip('~/Downloads/pgdl_predictions_01_N48.00-49.50_W89.50-97.25.zip', list=TRUE) %>% slice(grep('nhdhr_123319728', Name))
                                   Name   Length                Date
1 pgdl_nhdhr_123319728_temperatures.csv 36091975 2020-04-23 15:17:00

> unzip('~/Downloads/pgdl_predictions_02_N47.00-48.00_W94.00-97.25.zip', list=TRUE) %>% slice(grep('nhdhr_166868528', Name))
                                   Name   Length                Date
1 pgdl_nhdhr_166868528_temperatures.csv 19412665 2020-04-23 15:16:00
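
The same zip-listing check can be run for many lakes at once. What follows is a minimal sketch, assuming the released zips keep the pgdl_<site_id>_temperatures.csv naming seen in the listings above; find_missing_sites() and list_zip_contents() are hypothetical helpers, not functions from the mntoha-data-release pipeline, and the ~/Downloads location is only an example.

```r
# Hypothetical helpers (not part of mntoha-data-release) for checking which
# lakes are absent from a set of PGDL prediction zip files.

# Collect every file name listed inside the given zip files. unzip(list = TRUE)
# reads only the archive index and returns a data.frame with Name, Length, Date.
list_zip_contents <- function(zip_paths) {
  names_per_zip <- lapply(zip_paths, function(z) utils::unzip(z, list = TRUE)$Name)
  unlist(names_per_zip, use.names = FALSE)
}

# Return the site IDs whose expected pgdl_<site_id>_temperatures.csv file
# (naming assumed from the listings above) does not appear in zip_names.
find_missing_sites <- function(site_ids, zip_names) {
  expected_files <- sprintf("pgdl_%s_temperatures.csv", site_ids)
  site_ids[!(expected_files %in% zip_names)]
}

# Example usage: check Lake of the Woods and Cass Lake against every
# downloaded pgdl_predictions_*.zip (the ~/Downloads location is an assumption).
zips <- list.files("~/Downloads", pattern = "^pgdl_predictions_.*\\.zip$",
                   full.names = TRUE)
find_missing_sites(c("nhdhr_123319728", "nhdhr_166868528"),
                   list_zip_contents(zips))
```

An empty character vector means every requested site was found in at least one of the zips; any IDs returned are the ones to chase further upstream.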