RMI-PACTA / workflow.factset

Feat/45 industry map bridge #46

Closed AlexAxthelm closed 4 months ago

AlexAxthelm commented 4 months ago

Adds get_industry_map_bridge() to replace pacta.data.preparation::factset_industry_map_bridge and the manual steps that were involved in creating that.

Similar to #39, implements current state of that file, without examining the accuracy of those mappings (a problem for a later date)

Closes #45

github-actions[bot] commented 4 months ago

Docker image from this PR (d99a9765a20e04962610ab9e1484c4c6b466d68e) created

docker pull ghcr.io/rmi-pacta/workflow.factset:pr46

AlexAxthelm commented 4 months ago

@cjyetman @jdhoffa I've shared a link on Teams to a file generated against the latest FS database.

This is the output I'm getting from waldo::compare():

``` r
pr46 <- readRDS("timestamp-20230123T000000Z_pulled-20000101T000001_factset_industry_map_bridge.rds")
waldo::compare(pr46, pacta.data.preparation::factset_industry_map_bridge)
#> `class(old)`: "tbl_df"      "tbl"    "data.frame"
#> `class(new)`: "spec_tbl_df" "tbl_df" "tbl"        "data.frame"
#>
#> `attr(old, 'problems')` is absent
#> `attr(new, 'problems')` is a pointer
#>
#> `attr(old, 'spec')` is absent
#> `attr(new, 'spec')` is an S3 object of class <col_spec>, a list
```

Created on 2024-02-14 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.2
#>  system   aarch64, darwin23.0.0
#>  ui       unknown
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Belgrade
#>  date     2024-02-14
#>  pandoc   3.1.7 @ /opt/homebrew/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                * version    date (UTC) lib source
#>  bit                      4.0.5      2022-11-15 [1] CRAN (R 4.3.2)
#>  bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.3.2)
#>  blob                     1.2.4      2023-03-17 [1] CRAN (R 4.3.1)
#>  cli                      3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
#>  countrycode              1.5.0      2023-05-30 [1] CRAN (R 4.3.1)
#>  crayon                   1.5.2      2022-09-29 [1] CRAN (R 4.3.1)
#>  data.table               1.14.10    2023-12-08 [1] CRAN (R 4.3.2)
#>  DBI                      1.2.1      2024-01-12 [1] CRAN (R 4.3.2)
#>  dbplyr                   2.4.0      2023-10-26 [1] CRAN (R 4.3.1)
#>  diffobj                  0.3.5      2021-10-05 [1] CRAN (R 4.3.2)
#>  digest                   0.6.34     2024-01-11 [1] CRAN (R 4.3.2)
#>  dplyr                    1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
#>  dtplyr                   1.3.1      2023-03-22 [1] CRAN (R 4.3.1)
#>  evaluate                 0.23       2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi                    1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
#>  fastmap                  1.1.1      2023-02-24 [1] CRAN (R 4.3.2)
#>  fs                       1.6.3      2023-07-20 [1] CRAN (R 4.3.2)
#>  generics                 0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
#>  glue                     1.7.0      2024-01-09 [1] CRAN (R 4.3.2)
#>  hms                      1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
#>  htmltools                0.5.7      2023-11-03 [1] CRAN (R 4.3.2)
#>  jsonlite                 1.8.8      2023-12-04 [1] CRAN (R 4.3.2)
#>  knitr                    1.45       2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle                1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
#>  lubridate                1.9.3      2023-09-27 [1] CRAN (R 4.3.2)
#>  magrittr                 2.0.3      2022-03-30 [1] CRAN (R 4.3.2)
#>  pacta.data.preparation   0.1.0.9000 2024-01-18 [1] Github (RMI-PACTA/pacta.data.preparation@9a091f5)
#>  pillar                   1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
#>  pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
#>  purrr                    1.0.2      2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache                  0.16.0     2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3              1.8.2      2022-06-13 [1] CRAN (R 4.3.1)
#>  R.oo                     1.25.0     2022-06-12 [1] CRAN (R 4.3.1)
#>  R.utils                  2.12.2     2022-11-11 [1] CRAN (R 4.3.1)
#>  R6                       2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
#>  rematch2                 2.1.2      2020-05-01 [1] CRAN (R 4.3.1)
#>  reprex                   2.0.2      2022-08-17 [1] CRAN (R 4.3.1)
#>  rlang                    1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown                2.25       2023-09-18 [1] CRAN (R 4.3.1)
#>  RPostgres                1.4.6      2023-10-22 [1] CRAN (R 4.3.2)
#>  sessioninfo              1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
#>  styler                   1.10.2     2023-08-29 [1] CRAN (R 4.3.1)
#>  tibble                   3.2.1      2023-03-20 [1] CRAN (R 4.3.2)
#>  tidyr                    1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
#>  tidyselect               1.2.0      2022-10-10 [1] CRAN (R 4.3.1)
#>  timechange               0.3.0      2024-01-18 [1] CRAN (R 4.3.2)
#>  utf8                     1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs                    0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
#>  waldo                    0.5.2      2023-11-02 [1] CRAN (R 4.3.2)
#>  withr                    3.0.0      2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun                     0.41       2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml                     2.3.8      2023-12-11 [1] CRAN (R 4.3.2)
#>
#>  [1] /opt/homebrew/lib/R/4.3/site-library
#>  [2] /opt/homebrew/Cellar/r/4.3.2/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

It looks like there are some differences in the class and attributes, and I'm not sure whether they matter, but as far as the data contents go, we're on track.

cjyetman commented 4 months ago

> It looks like there are some differences in the class and attributes, and I'm not sure whether they matter, but as far as the data contents go, we're on track.

I'm nearly certain the difference is that the new object in this comparison was imported using readr::read_csv(), which attaches the problems and spec attributes (for reviewing import issues and the column specification) and adds the spec_tbl_df class to the data.frames/tibbles it creates.

i.e. not a concern (from my side)
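If it helps to confirm that the class and attributes are the only differences, here is a minimal base R sketch of the idea: strip the import-only metadata and compare again. The helper name `strip_import_metadata()` and the stand-in columns `id` and `label` are hypothetical, and the stand-in attribute values only mimic what an imported object might carry; none of this is taken from workflow.factset or pacta.data.preparation.

``` r
# Hypothetical helper (not part of workflow.factset or pacta.data.preparation):
# drop the import-only metadata so that only the data itself is compared.
strip_import_metadata <- function(x) {
  attr(x, "spec") <- NULL
  attr(x, "problems") <- NULL
  class(x) <- setdiff(class(x), "spec_tbl_df")
  x
}

# Stand-in data with made-up columns; the real bridge columns are not shown here.
reference <- data.frame(
  id = c("a", "b"),
  label = c("one", "two"),
  stringsAsFactors = FALSE
)

# Mimic an imported copy: same data, plus an extra class and two extra attributes.
imported <- reference
attr(imported, "spec") <- list(note = "stand-in for a col_spec object")
attr(imported, "problems") <- "stand-in for the problems pointer"
class(imported) <- c("spec_tbl_df", class(imported))

# The objects differ only in metadata, so they match once it is stripped.
differs_before <- !identical(reference, imported)
matches_after <- identical(reference, strip_import_metadata(imported))
```

Under these assumptions, `differs_before` and `matches_after` should both be `TRUE`, which is the same conclusion as above: the data contents agree and only the import metadata differs.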