Closed cjyetman closed 1 month ago
Attention: Patch coverage is 0%
with 5 lines
in your changes missing coverage. Please review.
Project coverage is 18.82%. Comparing base (
fa6e801
) to head (d3d185a
).
Files | Patch % | Lines |
---|---|---|
R/prepare_financial_data.R | 0.00% | 5 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@cjyetman thanks as always for the careful investigation here, and the comprehensive explanation in this PR. Really solid example of what a good PR should look like, I appreciate it.
Reviewing now.
The problem was that unwanted Equity rows were getting through the filters of the financial data. Equity rows that have
NA
or0
forunit_share_price
orcurrent_shares_outstanding
end up with aNA
or0
value for market capitalization (unit_share_price * current_shares_outstanding
), which leads to anNA
orInf
value for "ownership weight" when "value invested" is divided by "market_cap". Since "market cap" is necessary information for the ownership weight calculation, we should filter out rows where market capitalization cannot be properly calculated.These filters were originally introduced in https://github.com/RMI-PACTA/archive.pacta.data.preparation/pull/226 (sorry @jdhoffa, not trying to point fingers, just want to be super clear about the provenance). It looks like the filters added there did not have any additional intent, they simply did not work completely as expected.
I'm propsing to modify the filter with a
dplyr::case_when()
which is easier to interpret and maintain. To understand the logic, see the below code. The goals are:NA
for theadj_price
NA
, and > 0) for bothadj_price
andadj_shares_outstanding
The rows of Equity in the filtered result should only have one row where both
adj_price
andadj_shares_outstanding
have a positive value.Note that
.data$adj_price > 0 & .data$adj_shares_outstanding > 0
can returnNA
s, but dplyr::filter()drops rows where the result is
NA(versus
TRUEor
FALSE`).Modifying @Antoine-Lalechere's example code to use the new filter (and adjusting the column names appropriately), should yield no rows...