RMI-PACTA / workflow.factset

Other
0 stars 0 forks source link

workflow.factset

Lifecycle: stable

This repo contains the workflow.factset R package, a Dockerfile to build an image containing that package and its dependencies, and an Azure ARM template to deploy that image, along with factset_data_loader and a PostgreSQL database.

QUICKSTART: See "Deploy", below.

To deploy main branch with a databse and running FactSet Data Loader:

az deployment group create \
  --resource-group "RMI-SP-PACTA-DEV" \
  --template-file azure-deploy.with-db.json \
  --parameters azure-deploy.with-db.rmi-pacta.parameters.json

To deploy main branch Extracting PACTA files from an existing FDS database:

az deployment group create \
  --resource-group "RMI-SP-PACTA-DEV" \
  --template-file azure-deploy.extract-only.json \
  --parameters azure-deploy.extract-only.rmi-pacta.parameters.json

workflow.factset R package

The workflow.factset package's purpose is to extract data from a database prepared by the FactSet DataFeed Loader application. For more information on that application, and a docker container to use is efficiently, please see RMI-PACTA/factset_data_loader.

The primary callable function in the package is export_pacta_files(), which serves as a wrapper around more-targeted functions which download data from the database in a format expected by pacta.data.preparation. export_pacta_files() then writes those in-memory tables to files. Databases other than PostgreSQL may work, but are not tested.

The default values to control behavior of export_pacta_files() (and the related connect_factset_db()) are controlled by OS Environment variables, but use the standard R argument system if you are including these functions in a flow other than the one implemented in the Docker image in this repo.

Environment variables to control default behavior

Docker image

The Docker image defined in Dockerfile contains all required dependencies for interacting with PostgreSQL databases, and have been tested against Azure Flexible Server for PostgreSQL (See ARM Template, below).

Note: Setting the LOG_LEVEL Environment Variable for the docker container is useful for controlling the verbosity of the logs.

The image builds automatically by GitHub actions, and hosted publicly on ghcr.io. See "Packages", to the right. The main tag should be used, rather than latest.

Azure ARM template

azure-deploy.json is an Azure ARM Template which by default deploys:

Alternately, the deploy template will only deploy the docker container from this repo's image if the updateDB parameter is set to false (see "Parameters and Variables", below).

Prerequisites

Parameters and Variables

ARM Template Parameters can be set at deploy-time to control the properties of the deployed resources. The azure-deploy.json template in this repo makes use of them to pass information to the containers and database. All parameters must have values, but most have sensible defaults already defined in the template, and the rest have example values defined in azure-deploy.example.parameters.json.

Parameters

Variables

Variables in ARM templates are populated at deploy-time (can be influenced by parameters), but can only be edited in the ARM template itself. Many of the variables used in this template are detailed in the factset_data_loader repo.

Key variables to be aware of:

Deploy

Optional: Create a parameters file (azure-deploy.example.parameters.json serves as a template) for parameters that do not have a default. If you do not create this file, then the deploy process will prompt for values.

A parameter file with the values that the RMI-PACTA team uses for extracting data is available at azure-deploy.rmi-pacta.parameters.json.

# run from repo root

# change this value as needed.
RESOURCEGROUP="RMI-SP-PACTA-DEV"

# To deploy with DB and FDSLoader:
az deployment group create \
  --resource-group "$RESOURCEGROUP" \
  --template-file azure-deploy.with-db.json \
  --parameters azure-deploy.with-db.rmi-pacta.parameters.json

# To deploy without DB and FDSLoader:
az deployment group create \
  --resource-group "$RESOURCEGROUP" \
  --template-file azure-deploy.extract-only.json \
  --parameters azure-deploy.extract-only.rmi-pacta.parameters.json

For security, the RMI-PACTA parameters file makes heavy use of extracting secrets from an Azure Key vault, but an example file that passes parameters "in the clear" is available as azure-deploy.example.parameters.json

Non RMI-PACTA users can define their own parameters (examples in the azure-deploy.example.parameters.json file) and invoke the appropriate ARM Template

Local Development

Build

Note that the image supports amd64 platforms. If you are running on an arm64 machine (Apple Silicon), you may need to change the preferrred build platform with:

export DOCKER_DEFAULT_PLATFORM=linux/amd64

To build the image, you can use the standard mechanism of docker build .. To build and test in one step, you can alternately use docker-compose up --build

Testing

Partial (manual) local testing is possible via docker-compose. Currently get_issue_code_bridge() is the sole function with necessary testing infrastructure.

Testing Steps:

docker-compose up
# in another terminal:
docker attach workflowfactset-workflow.factset-1 # enters the workflow.factset container

This enters the container, which is running R in an interactive session

#in that container
library(workflow.factset)
conn <- connect_factset_db()
issue_code_bridge <- get_issue_code_bridge(conn = conn)
issue_code_bridge

From here, you can exit R as usual (q()), and then turn off the database container with:

docker-compose down --volumes

Exported Files

The files exported by workflow.factset::export_pacta_files() are:

factset_entity_financing_data.rds

Column Name Column Type Example Content Description
fsym_id chr "XXXXXX-R" FactSet identifier for security
date date 2022-12-31 date of balance sheet data
currency chr "USD" currency for balance sheet data
ff_mkt_val dbl 2000000 Market Value - based on latest closing price and monthly shares
ff_debt dbl 1000000 Total debt
fsym_company_id chr "XXXXXX-S" fsym_id connecting to FactSet Fundamentals dataset
factset_entity_id chr "XXXXXX-E" FactSet identifier for an entity

factset_entity_info.rds

Column Name Column Type Example Content Description
factset_entity_id chr "XXXXXX-E" FactSet identifier for an entity
entity_proper_name chr "FooBar, Inc." Entity common name, normalized and in proper case
iso_country chr "US" 2 letter country code for domicile
sector_code chr "6000" 4 digit code for FactSet sector classification
factset_sector_desc chr "Miscellaneous" FactSet description for sector
industry_code chr "6005" 4 digit code for FactSet industry classification
factset_industry_desc chr "Miscellaneous" FactSet description for industry
credit_parent_id chr "XXXXXX-E" FactSet entity ID for credit parent
ent_entity_affiliates_last_update chr "2023-12-21T22:35:27Z" Timestamp for last update of ent_entity_affiliates table

factset_financial_data.rds

Column Name Column Type Example Content Description
fsym_id chr "XXXXXX-S" FactSet identifier for financial instrument
isin chr "XX0000000001" ISIN for instrument
factset_entity_id chr XXXXXX-E FactSet identifier for an entity
adj_price dbl 100.5 Adjusted Share price
adj_shares_outstanding dbl NA Adjusted number of shares outstanding
issue_type chr NA Share Type
one_adr_eq dbl NA Number of shares equivilent to one ADR

factset_fund_data.rds

Column Name Column Type Example Content Description
factset_fund_id chr "FFFFFF-E" FactSet identifier for fund
fund_reported_mv dbl 100000000 Total reported Market Value
holding_isin chr "XX0000000002" ISIN held in fund
holding_reported_mv dbl 100000 Market value of ISIN held in fund
report_date date 2023-12-31 report date for holding

factset_isin_to_fund_table.rds

Column Name Column Type Example Content Description
isin chr "XX0000000001" ISIN
fsym_id chr "XXXXXX-S" FactSet identifier for financial instrument
factset_fund_id chr "FFFFFF-E" FactSet identifier for fund

factset_iss_emissions.rds

Column Name Column Type Example Content Description
factset_entity_id chr "XXXXXX-E" FactSet identifier for an entity
icc_total_emissions dbl 123.4 Total emissions for entity
icc_scope_3_emissions dbl 123.4 Scope 3 emissions for entity