Closed mle2718 closed 11 months ago
The problem seems to be that the methods used by date_cols() to determine whether a column is a date variable don't anticipate whatever is going on in the data. In the FishSET version of the scallop data it leaves NAME alone (lease_FS doesn't exist). It's hard to know exactly what's going on unless I run your data.
One thing that might help me understand what's going on without sending me the data is pasting the unique values from NAME and lease_FS into the comments. The problem may be that one particular name is throwing date_cols() off.
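For anyone following along, here is a minimal sketch of how the unique values and their counts could be pulled out in base R. The toy data frame is made up for illustration; in the real case the column would be NAME or lease_FS in your own data.

```r
# Toy stand-in for the real data (hypothetical values).
toy <- data.frame(
  NAME = c("OCS-A 0500 - Bay State Wind LLC",
           "OCS-A 0500 - Bay State Wind LLC",
           NA,
           "OCS-A 0501 - Vineyard Wind LLC"),
  stringsAsFactors = FALSE
)

# Count each distinct value, keeping NAs visible in the tally.
counts <- table(toy$NAME, useNA = "ifany")
print(counts)
```

Setting useNA = "ifany" matters here because the columns in question contain many NAs, and the default would silently drop them from the table.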
This is NAME, but lease_FS is the same:
OCS-A 0482 - GSOE I LLC
18
OCS-A 0483 - Virginia Electric and Power Company
1
OCS-A 0486 - Revolution Wind, LLC
358
OCS-A 0487 - Sunrise Wind LLC
735
OCS-A 0490 - US Wind Inc.
19
OCS-A 0498 - Ocean Wind LLC
20
OCS-A 0499 - Atlantic Shores Offshore Wind Projects 1 & 2, LLC's
63
OCS-A 0500 - Bay State Wind LLC
258
OCS-A 0501 - Vineyard Wind LLC
22
OCS-A 0508 - Avangrid Renewables LLC
1
OCS-A 0512 - Empire Offshore Wind, LLC
1409
OCS-A 0517 - South Fork Wind, LLC
37
OCS-A 0519 - Skipjack Offshore Energy LLC
16
OCS-A 0520 - Beacon Wind LLC
8
OCS-A 0521 - Mayflower Wind Energy LLC
11
OCS-A 0522 - Vineyard Northeast LLC
8
OCS-A 0532 - Orsted North America Inc.
21
OCS-A 0534 - Park City Wind LLC
34
OCS-A 0537 - OW Ocean Winds East, LLC
683
OCS-A 0538 - Attentive Energy LLC
1150
OCS-A 0539 - Community Offshore Wind, LLC
1058
OCS-A 0541 - Atlantic Shores Offshore Wind Bight, LLC
1709
OCS-A 0542 - Invenergy Wind Offshore LLC
882
OCS-A 0549 - Atlantic Shores Offshore Wind, LLC
71
Provisional - OCS-A 0544 - Mid-Atlantic Offshore Wind LLC
594
It's definitely this variable. I tried this:
dataset<-final_product_lease
d_cols <- date_cols(dataset)
dataset[d_cols] <- lapply(d_cols, function(d) as.character(dataset[[d]]))
d_cols
test<-c("Time")
#dataset[d_cols] <- lapply(dataset[d_cols], date_parser)
dataset[test] <- lapply(dataset[test], date_parser)
and stepped through the different entries of d_cols. The only things that throw an error are the NAME and lease_FS columns. Time throws a single, very mysterious "142839 failed to parse" error. It's surprising because the value of Time in that row is "16:30:00", which is a pretty normal-looking time.
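Stepping through the columns by hand can also be automated. This is only a sketch: find_failing_cols() is a hypothetical helper, not part of FishSET, and the parser is passed in as an argument so that date_parser (or anything else) can be swapped in. The toy example uses base R's as.Date as a stand-in parser.

```r
# Hypothetical helper: report which columns make a parser raise
# an error or a warning.
find_failing_cols <- function(data, cols, parser) {
  failed <- vapply(cols, function(col) {
    tryCatch({
      parser(data[[col]])
      FALSE                      # parsed without complaint
    },
    error = function(e) TRUE,    # parser stopped
    warning = function(w) TRUE)  # parser warned, e.g. "failed to parse"
  }, logical(1))
  cols[failed]
}

# Toy data (made up): one real date column, one lease-name column.
toy <- data.frame(
  d = c("2022-01-01", "2022-02-01"),
  n = c("OCS-A 0499 - x", "OCS-A 0500 - y"),
  stringsAsFactors = FALSE
)

bad <- find_failing_cols(toy, c("d", "n"), as.Date)
print(bad)
```

With the real data, the call would presumably look like find_failing_cols(dataset, d_cols, date_parser), assuming date_parser accepts a single column the way it is used above.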
I also stepped through a bit of the date_cols function.
dataset<-final_product_lease
names(dataset)
dat<-dataset
# This is taken from date_cols()
date_lgl <- logical(ncol(dat))
names(date_lgl) <- names(dat)
date_funs <- list(lubridate::mdy, lubridate::dmy, lubridate::ymd,
                  lubridate::ydm, lubridate::dym)
date_helper <- function(dates, fun) {
  dates <- trimws(dates)
  dates <- gsub("\\s\\d{2}:\\d{2}:\\d{2}$", "", dates)
  out <- rlang::expr(!all(is.na(suppressWarnings((!!fun)(!!dates)))))
  eval(out)
}
date_apply <- function(dates) {
  any(purrr::map_lgl(date_funs, function(fun) date_helper(dates, fun)))
}
nr <- nrow(dat)
# if (nr > 1000)
dat_slice <- 1000
# else dat_slice <- round(nr * 0.5)
date_cols <- purrr::map_lgl(
  dat[!numeric_cols(dat, "logical")][seq_len(dat_slice), ],
  date_apply
)
date_cols <- date_cols[date_cols]
Output:
> date_cols
            DATE_TRIP                  Date                  Time                  NAME
                 TRUE                  TRUE                  TRUE                  TRUE
             lease_FS scallop_fishing_yearD
                 TRUE                  TRUE
I'm not sure what is going on inside those functions though.
The problem is that date_cols() tries to detect date columns by passing them to five lubridate conversion functions, two of which (ymd() and ydm()) incorrectly identify "OCS-A 0499 - Atlantic Shores Offshore Wind Projects 1 & 2, LLC's" as a date. I'm not sure why, but it converts it to "0499-02-01".
The good news is that this is a relatively easy fix. The bad news is that it will require a new install once the changes are made. The only workaround I can think of is to change the value of that name so that it doesn't trigger the conversion.
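To illustrate one way a detector could be made less eager, here is a rough sketch that pre-screens a column with a regular expression before any parser sees it. This is an assumption about a possible approach, not the change that was actually made in FishSET; looks_like_date() and its min_share argument are hypothetical names.

```r
# Hypothetical pre-screen: only treat a column as a date candidate if
# most non-missing values consist of three numeric fields and nothing else.
looks_like_date <- function(x, min_share = 0.9) {
  x <- trimws(as.character(x))
  x <- x[!is.na(x) & nzchar(x)]
  if (length(x) == 0) return(FALSE)
  # Drop a trailing " HH:MM:SS" component, mirroring date_cols().
  x <- sub("\\s\\d{2}:\\d{2}:\\d{2}$", "", x)
  # Three numeric fields separated by "-", "/" or ".".
  pattern <- "^\\d{1,4}[-/.]\\d{1,2}[-/.]\\d{1,4}$"
  mean(grepl(pattern, x)) >= min_share
}

lease <- c("OCS-A 0499 - Atlantic Shores Offshore Wind Projects 1 & 2, LLC's",
           "OCS-A 0500 - Bay State Wind LLC")
dates <- c("2022-03-01", "03/15/2022")

print(looks_like_date(lease))
print(looks_like_date(dates))
```

The trade-off is that a strict pattern like this would miss dates written with month names (for example "March 1, 2022"), so it would need widening if such formats appear in the data.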
The reason Time is raising a warning is that date_parser() doesn't work on time-only columns (i.e. no calendar date, just time). This just means that additional checks need to be added to load_maindata().
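As a sketch of what such a check might look like, the helper below flags columns that contain only clock times so they could be skipped before date parsing. is_time_only() is a hypothetical name, and how it would be wired into load_maindata() is an assumption, since the internals of that function are not shown here.

```r
# Hypothetical check: TRUE when every non-missing value is H:MM or H:MM:SS.
is_time_only <- function(x) {
  x <- trimws(as.character(x))
  x <- x[!is.na(x) & nzchar(x)]
  length(x) > 0 && all(grepl("^\\d{1,2}:\\d{2}(:\\d{2})?$", x))
}

print(is_time_only(c("16:30:00", "04:05:00", NA)))
print(is_time_only("2022-03-01 16:30:00"))
```

A column for which this returns TRUE could be left as character (or converted with a time-specific parser) instead of being passed to date_parser().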
Very odd. Looks like lubridate is very aggressive about finding dates and times.
I did this:
final_product_lease <- final_product_lease %>%
  mutate(KILOGRAMS = POUNDS / pounds_to_kg,
         LANDED_KG = LANDED / pounds_to_kg) %>%
  mutate(NAME = stringr::str_replace(NAME, "OCS-A 0499", "OCS-A0499"),
         lease_FS = stringr::str_replace(lease_FS, "OCS-A 0499", "OCS-A0499"))
as a workaround.
@mchaji -- I've made this change in main here: 4c62d8fa905bda4b68d7d83ed8aa54253493b856. I cherry-picked it over to the scallop_tiny_report branch here: 48ebc54e5c52ce6559ec1a381429c49f4ecfcfe8.
As long as you re-pull, you should pick up this change.
@BryceMcManus-NOAA,
@mchaji and I are both getting this error message when we try to load data into FishSET. We're on fresh installs from gitlabs.
https://github.com/NEFSC/READ-SSB-CHAJI-Effort-Displacement---Scallop/blob/f9a75bca6da88b630a0f9af6f6b579fbc9023870/analysis_code/scallop_analysis_0322.Rmd#L155-L170
I tried a bit of debugging by doing this:
I was able to run the scallop analysis code on our old server, using an older FishSET install, so I suspect that this issue comes from recent FishSET development. In particular, it's a little odd that NAME and lease_FS are being picked up as date fields. These are the names of the wind areas; the values look something like OCS-A 0538 - Attentive Energy LLC | OCS-A 0538 - Attentive Energy LLC. They both have lots of NAs.
Any idea what's going on?