jordansread opened 4 years ago
I don't see Lake of the Woods on Tallgrass or in my current list of 638 modelable lakes. Do you have an NHD-HR ID for that one to help me double-check? Do I need to update my source data again?
I do see Cass Lake on Tallgrass from the April 15 runs:
(base) [aappling@tg-login2 lake-temperature-neural-networks] ls -Rl 2_model/out/nhdhr_166868528
2_model/out/nhdhr_166868528:
total 8
drwxr-sr-x 2 aappling cida 4096 Apr 15 17:33 finetune_predict
drwxr-sr-x 2 aappling cida 4096 Apr 13 17:56 pretrain_predict
2_model/out/nhdhr_166868528/finetune_predict:
total 21380
-rw-r--r-- 1 aappling cida 67 Apr 15 17:33 checkpoint
-rw-r--r-- 1 aappling cida 759 Apr 15 17:19 model_config.tsv
-rw-r--r-- 1 aappling cida 29060 Apr 15 17:33 model.data-00000-of-00001
-rw-r--r-- 1 aappling cida 505 Apr 15 17:33 model.index
-rw-r--r-- 1 aappling cida 2941882 Apr 15 17:33 model.meta
-rw-r--r-- 1 aappling cida 9817 Apr 15 17:33 params.npz
-rw-r--r-- 1 aappling cida 14596135 Apr 15 17:33 preds.npz
-rw-r--r-- 1 aappling cida 21925 Apr 15 17:33 stats.npz
-rw-r--r-- 1 aappling cida 4266540 Apr 15 17:20 varied_inputs.npz
2_model/out/nhdhr_166868528/pretrain_predict:
total 37252
-rw-r--r-- 1 aappling cida 67 Apr 13 17:56 checkpoint
-rw-r--r-- 1 aappling cida 715 Apr 13 17:28 model_config.tsv
-rw-r--r-- 1 aappling cida 29060 Apr 13 17:56 model.data-00000-of-00001
-rw-r--r-- 1 aappling cida 505 Apr 13 17:56 model.index
-rw-r--r-- 1 aappling cida 2941882 Apr 13 17:56 model.meta
-rw-r--r-- 1 aappling cida 9822 Apr 13 17:56 params.npz
-rw-r--r-- 1 aappling cida 14701592 Apr 13 17:56 preds.npz
-rw-r--r-- 1 aappling cida 42383 Apr 13 17:56 stats.npz
-rw-r--r-- 1 aappling cida 20391792 Apr 13 17:29 varied_inputs.npz
So the next question for Cass Lake is where it's failing to get to ScienceBase.
Lake of the Woods is nhdhr_123319728
I can dig in on my end too and check the data release
I do see Cass Lake in my prep data.frames in mntoha-data-release...
> pgdl_predictions_df <- remake::fetch('pgdl_predictions_df')
> pgdl_predictions_df %>% slice(grep('166868528', site_id)) %>% glimpse
Rows: 1
Columns: 4
$ site_id <chr> "nhdhr_166868528"
$ source_filepath <chr> "../lake-temperature-neural-networks/3_assess/out/nhd…
$ source_hash <chr> "ffb2261c0f7d5a3f38e10bd8e9577e65"
$ out_file <chr> "pgdl_nhdhr_166868528_temperatures.csv"
Should be in Group 2:
> pgdl_site_ids_grouped <- remake::fetch('pgdl_site_ids_grouped')
> pgdl_site_ids_grouped %>% slice(grep('nhdhr_166868528', site_id))
# A tibble: 1 x 2
site_id group_id
<chr> <chr>
1 nhdhr_166868528 02_N47.00-48.00_W94.00-97.25
And actually, I see Lake of the Woods in there too, searching by its NHD-HR ID:
> pgdl_site_ids_grouped %>% slice(grep('nhdhr_123319728', site_id))
# A tibble: 1 x 2
site_id group_id
<chr> <chr>
1 nhdhr_123319728 01_N48.00-49.50_W89.50-97.25
I think I see Lake of the Woods in the Group 1 zip file (in my mntoha-data-release repo on Tallgrass):
> group1 <- unzip('tmp/pgdl_predictions_01_N48.00-49.50_W89.50-97.25.zip', list=TRUE)
> group1 %>% slice(grep('nhdhr_123319728', Name))
Name Length Date
1 pgdl_nhdhr_123319728_temperatures.csv 36091975 2020-04-23 15:17:00
And here's Cass Lake:
> group2 <- unzip('tmp/pgdl_predictions_02_N47.00-48.00_W94.00-97.25.zip', list=TRUE)
> group2 %>% slice(grep('nhdhr_166868528', Name))
Name Length Date
1 pgdl_nhdhr_166868528_temperatures.csv 19412665 2020-04-23 15:16:00
I downloaded those two zip files from ScienceBase and confirmed that Cass Lake and Lake of the Woods are indeed in them:
> unzip('~/Downloads/pgdl_predictions_01_N48.00-49.50_W89.50-97.25.zip', list=TRUE) %>% slice(grep('nhdhr_123319728', Name))
Name Length Date
1 pgdl_nhdhr_123319728_temperatures.csv 36091975 2020-04-23 15:17:00
> unzip('~/Downloads/pgdl_predictions_02_N47.00-48.00_W94.00-97.25.zip', list=TRUE) %>% slice(grep('nhdhr_166868528', Name))
Name Length Date
1 pgdl_nhdhr_166868528_temperatures.csv 19412665 2020-04-23 15:16:00
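The manual checks in this thread (look up a lake's group, open that group's zip, grep for its CSV) could be scripted for many lakes at once. Below is a minimal base-R sketch, assuming the naming patterns seen above hold for every lake: zips named `pgdl_predictions_<group_id>.zip` and entries named `pgdl_<site_id>_temperatures.csv`. The helpers `expected_pgdl_files()` and `check_pgdl_zips()` are hypothetical and not part of mntoha-data-release.

```r
# Hypothetical helper (not in mntoha-data-release). Assumes the naming
# patterns seen in this thread:
#   zip: pgdl_predictions_<group_id>.zip
#   csv: pgdl_<site_id>_temperatures.csv
# `site_groups` needs `site_id` and `group_id` columns, like the
# pgdl_site_ids_grouped target.
expected_pgdl_files <- function(site_ids, site_groups) {
  idx <- match(site_ids, site_groups$site_id)  # NA if a site has no group
  data.frame(
    site_id = site_ids,
    zip_file = ifelse(is.na(idx), NA_character_,
                      sprintf("pgdl_predictions_%s.zip", site_groups$group_id[idx])),
    csv_file = sprintf("pgdl_%s_temperatures.csv", site_ids),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: for each row, list the zip's contents and test for an
# exact CSV name match. `found` is NA when the site has no group or the zip
# file isn't present in `zip_dir`.
check_pgdl_zips <- function(expected, zip_dir) {
  expected$found <- mapply(function(zip_file, csv_file) {
    if (is.na(zip_file)) return(NA)
    path <- file.path(path.expand(zip_dir), zip_file)
    if (!file.exists(path)) return(NA)
    csv_file %in% utils::unzip(path, list = TRUE)$Name
  }, expected$zip_file, expected$csv_file, USE.NAMES = FALSE)
  expected
}

# Group assignments copied from the output above
site_groups <- data.frame(
  site_id = c("nhdhr_123319728", "nhdhr_166868528"),
  group_id = c("01_N48.00-49.50_W89.50-97.25", "02_N47.00-48.00_W94.00-97.25"),
  stringsAsFactors = FALSE
)
expected <- expected_pgdl_files(c("nhdhr_123319728", "nhdhr_166868528"), site_groups)
```

In the real repo, `site_groups` would presumably come from `remake::fetch('pgdl_site_ids_grouped')`, and `check_pgdl_zips(expected, "~/Downloads")` would then test the downloaded ScienceBase zips. Matching the exact CSV name is stricter than `grep()`, so an ID that only appears as a substring of another file name won't be counted as found.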