RMI-PACTA / workflow.factset


filter rows in financial data where `issue_type` is `NA` #76

Open cjyetman opened 1 month ago

cjyetman commented 1 month ago

When the "raw" FactSet data is "prepared" in pacta.data.preparation::prepare_financial_data(), the first filter removes any rows that have NA for the issue_type.

https://github.com/RMI-PACTA/pacta.data.preparation/blob/fa6e801ef85263a3a612461a791a6d5d9bd2af89/R/prepare_financial_data.R#L18

This is always done because it's necessary to know the issue type in order to use it properly in the PACTA process. Removing the rows where issue_type is NA typically reduces the in-memory size of the financial_data object from multiple GB to hundreds of MB, dropping tens of millions of rows. It would be advantageous to preemptively remove those rows from the "raw" FactSet data before it gets to data.prep.
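
As a rough illustration of the proposed early filter, here is a minimal sketch. It is written in Python purely for illustration (the actual pipeline is R), and it is not the code in pacta.data.preparation. Apart from `issue_type`, every name and value below (`drop_missing_issue_type`, `isin`, the sample rows) is a hypothetical placeholder, and `None` stands in for R's `NA`.

```python
# Illustrative only: the real pipeline is written in R. Every column name
# except `issue_type`, and all sample values, are hypothetical placeholders.

def drop_missing_issue_type(rows):
    """Keep only rows whose issue_type is present (None stands in for NA)."""
    return [row for row in rows if row.get("issue_type") is not None]


raw_rows = [
    {"isin": "ISIN_A", "issue_type": "type_1"},
    {"isin": "ISIN_B", "issue_type": None},
    {"isin": "ISIN_C", "issue_type": "type_2"},
    {"isin": "ISIN_D", "issue_type": None},
]

filtered_rows = drop_missing_issue_type(raw_rows)
print(f"kept {len(filtered_rows)} of {len(raw_rows)} rows")
```

Applying this step when the "raw" data is produced, rather than in data.prep, is what would keep the large unfiltered object from ever being loaded downstream.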

ping @jdhoffa @AlexAxthelm

jdhoffa commented 1 month ago

Makes sense to me! Unless you two can think of a good reason (any reason at all) that we would want to keep those rows for any other purpose in the analysis.

cjyetman commented 1 month ago

Also maybe of interest to @Antoine-Lalechere. Maybe @Antoine-Lalechere has an idea whether we would ever want financial data that is not assigned an issue type?

Antoine-Lalechere commented 1 month ago

Interesting! I can't think of any use case where we would need it. Can we easily get an ISIN that is filtered out, so that I can check what gets kicked out?
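
One possible way to do such a check is to split the data instead of only filtering it, and then list the identifiers of the removed rows. This is a minimal sketch in Python for illustration only (the actual pipeline is R). The `isin` column name, the `split_by_issue_type` helper, and the sample rows are all assumptions, not taken from the FactSet data or the PACTA code.

```python
# Illustrative only: `isin`, `split_by_issue_type`, and the sample rows are
# hypothetical. None stands in for R's NA.

def split_by_issue_type(rows):
    """Return (kept, removed): rows with and without an issue_type."""
    kept, removed = [], []
    for row in rows:
        if row.get("issue_type") is None:
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


sample_rows = [
    {"isin": "ISIN_A", "issue_type": "type_1"},
    {"isin": "ISIN_B", "issue_type": None},
    {"isin": "ISIN_C", "issue_type": None},
]

kept, removed = split_by_issue_type(sample_rows)

# Unique identifiers of the rows that the filter would drop.
removed_isins = sorted({row["isin"] for row in removed})
print(removed_isins)
```

Any one of the listed identifiers could then be looked up by hand to see what kind of security is being dropped.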