Closed AlexAxthelm closed 7 months ago
Docker image from this PR (849d0096faeeb2ae3e53da0747e08f775e95395e) created
docker pull ghcr.io/rmi-pacta/workflow.factset:pr3
Not sure to 100% get it here.
If in the previous case, we also get an adj_price
for the Jan 10, 2023 , which one would we have at the end of the process, the newest one closer to data_timestamp
or still the Dec 31 2023, because there is another filter somewhere else?
@AlexAxthelm
If in the previous case, we also get an
adj_price
for the Jan 10, 2023 , which one would we have at the end of the process, the newest one closer todata_timestamp
or still the Dec 31 2023, because there is another filter somewhere else?
In the current process, if we specify data_timestamp to be Jan 10, then we should end with no share prices or fund holdings, since in the actual data, none of those have end-of-reporting-period dates on Jan 10.
Theoretically, if we got a share price reported on Dec 31, and Jan 10, and we specified Jan 23 as our date of interest, we would get the more recent one (Jan 10)
If you specify a date, the data should be from that date... otherwise we have a lot of explaining to do.
Closing, since there's been no pick up on this thread.
@AlexAxthelm so what happened with this ? does it mean it was implemented or not? and what are the consequences? Is a related ADO ticket for these type of decisions?
Not implemented, so no changes and no consequences, since there wasn't any further discussion around it. If we want to pick up this thread again we can at any time, but someone other than me should probably sketch out what it would look like.
Reviving this thread, since it looks like for 2023Q4 data, we will need to include data timestamps for 2023-12-31
and 2023-12-29
(last Friday of the year).
Open questions:
Do we want to have a date range, or a discrete list of dates? (2023-12-24
through 2023-12-31
vs c(2023-12-29, 2023-12-31)
)
I would lean towards the latter for clarity in what's included, but the former would reduce the burden for the future, since we could consistently use "X days lookback" as part of our methodology.
What should be the lookback period? (only relevant if we choose the "range" option above). 1 week back, or 1 month back seem like reasonable options.
~Should we use this lookback filter anywhere we use date_timestamp
? See @cjyetman's comment above: https://github.com/RMI-PACTA/workflow.factset/pull/3#issuecomment-1862642755.
I believe that for consistency, whatever filter we're using should be applied to all our data. This would also make messaging easier, since we don't have to worry about communicating nuances like "fund holdings as of these dates, but share price and volume from these")~ This is not a major issue with actual data, so non-fund data sets were removed in f511fe7
~Where in the processing pipeline should we use this filter? Currently, these are applied early on in the process. One possible concern here is that the report_date
from different tables may be different for a given fund. (report data for holdings may not sync with report date for filing history). FactSet does not explicitly identify a connection between these two field across these tables.~ Question not relevant in real data. See https://github.com/RMI-PACTA/workflow.factset/pull/3#issuecomment-1970923192
@cjyetman @Antoine-Lalechere @NickPACTA @Nicky-Halterman @NAYRA-HERRERA cc @hodie @jdhoffa
Since different metrics/values have different levels of volatility, significantly so in some cases, I don't think there's a justifiable way to apply the same date range to all data.
Where in the processing pipeline should we use this filter? Currently, these are applied early on in the process. One possible concern here is that the
report_date
from different tables may be different for a given fund. (report data for holdings may not sync with report date for filing history). FactSet does not explicitly identify a connection between these two field across these tables.
Reviewing actual data shows this is incredibly uncommon (1 entry of 4.4M). Will resolve with join condition that holding report date must be equal to filing report date. Simplifies filtering options significantly.
I've reached out to FactSet support for confirmation on methodolgy for pulling fund composition as-of a given date.
We've heard back from FactSet Support:
Regarding the query 'for determining the current composition, should we extract the most recent reporting date for each fund - our Ownership team confirmed ,YES!
So we're on the right track, in that getting a funds (known) composition involves getting the data from its most recent report_date
⚠️ Potentially Breaking: Changes logic for pulling FactSet data.
Previous behavior: For certain fields in factset data, filter to exact date of
data_timestamp
(as-of date for pulling FactSet Data). Mostly applied to end-of month or end-of-period reporting values (ex. share prices)New behavior: Pull most recent value, if most recent value is after
data_timestamp_lookback
(which defaults to 1 month beforedata_timestamp
for share price & fund holdings, 1 year before for financing data.For example, if we enter
data_timestamp <- 20230123
(Jan 23, 2023), then filtering on exact date yield no rows foradj_price
(Adjusted share price) when pulling financial data.However, with the lookback of 1 month, we get all the rows for securities which posted a share price as of December 31 2022 (since that is after Dec 23, 2022).
Key decision points: