RMI-PACTA / pacta.data.preparation

The goal of {pacta.data.preparation} is to prepare and format all input datasets required to run the PACTA for investors tools.
https://rmi-pacta.github.io/pacta.data.preparation/
Other
1 stars 0 forks source link

bug: filter financial data unusable in `ownership_weight` methodology #33

Closed jdhoffa closed 1 month ago

jdhoffa commented 1 month ago

current_shares_outstanding_all_classes–a value that is synonymous with the shares_all_classes value– is used downstream to calculate ownership_weight. This value is unusable by the ownership_weight calculation if it is 0, as it will cause divide by 0 problems and yield Inf results: (https://github.com/RMI-PACTA/pacta.portfolio.allocate/blob/4c96adb9856788d97293e41c0cd68b8951c3c56b/R/calculate_ownership_weight.R#L5).

There is already a filter in prepare_financial_data to remove this case, however it filters a wrong (but similar) column.

An edge case was found by @Antoine-Lalechere in which ADR holdings with no similar EQ holdings pass through and yield a 0 value for shares_all_classes, which identified this bug.

This PR adjusts the filter to ensure that the correct value is filtered and usable in the downstream analysis.

See the reprex below for more information:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(pacta.data.preparation)

# consider the case there an entity has an equity holding and an ADR holding
# consider further, that the equity holding that reflects the ADR holding has no price or share information
# note: this was a real case in the data
data <- tibble::tribble(
  ~fsym_id,          ~isin, ~factset_entity_id, ~adj_price, ~adj_shares_outstanding, ~issue_type, ~one_adr_eq,
  "ABCDE-F", "US123456789",         "12ABCD-E",      5.5,               12345678,        "AD",          NA,
  "ABCDE-G", "JP123456789",         "12ABCD-E",         NA,                      NA,        "EQ",          NA
)

issue_code_bridge <- tibble::tribble(
  ~issue_type_code, ~asset_type,
  "AD",    "Equity",
  "EQ",    "Equity"
) |> 
  pacta.data.preparation::standardize_asset_type_names() 

# our current methodology does NOT consider ADR holdings in the calculation of ownership_weight denominator
# what this means is that, in the case above, it would be impossible to calculate an appropriate ownership_weight
# since the denominator would be 0

# however, as can be seen below, the ADR holding does appear in the output, with 
# a value of 0 for current_shares_outstanding_all_classes. This is unusable downstream by the ownership_weight calculation
# 
# See: https://rmi-pacta.github.io/pacta.data.preparation/articles/share_ownership_methodology.html for more information
out <- pacta.data.preparation::prepare_financial_data(data, issue_code_bridge)

print("US123456789" %in% out$isin)
#> [1] TRUE
print(filter(out, isin == "US123456789")["current_shares_outstanding_all_classes"])
#> # A tibble: 1 × 1
#>   current_shares_outstanding_all_classes
#>                                    <dbl>
#> 1                                      0

Created on 2024-07-17 with reprex v2.1.1

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 18.82%. Comparing base (0b3d197) to head (0540029).

Files Patch % Lines
R/prepare_financial_data.R 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #33 +/- ## ======================================= Coverage 18.82% 18.82% ======================================= Files 35 35 Lines 1126 1126 ======================================= Hits 212 212 Misses 914 914 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.