Open cjyetman opened 7 months ago
given the first 2 lines of `prepare_financial_data()` https://github.com/RMI-PACTA/pacta.data.preparation/blob/530af7a154224b6303dcf87d869f550562ac553f/R/prepare_financial_data.R#L17-L18

maybe the best optimization would be to not export from FactSet any rows where `issue_type` is `NA`?

```r
factset_financial_data_path <- "~/Desktop/dataprep_docker/inputs/timestamp-20221231T000000Z_pulled-20240207T161053Z_factset_financial_data.rds"
financial_data <- readRDS(factset_financial_data_path)

nrow(financial_data)
#> [1] 29852045
format(object.size(financial_data), units = "auto", standard = "SI")
#> [1] "5.5 GB"

financial_data_no_na <- dplyr::filter(financial_data, !is.na(issue_type))

nrow(financial_data_no_na)
#> [1] 1582972
format(object.size(financial_data_no_na), units = "auto", standard = "SI")
#> [1] "305.3 MB"
```
Originally posted by @cjyetman in https://github.com/RMI-PACTA/pacta.data.preparation/issues/334#issuecomment-1942059663
Yeah. That makes sense to me.