finos / community

FINOS Community, Project and SIG wide collaboration space
http://community.finos.org

Software Project Contribution and Onboarding - Accenture TRAC #139

Closed: martin-traverse closed this issue 2 years ago

martin-traverse commented 2 years ago

Business Problem


Common challenges faced in risk, finance and regulatory reporting when dealing with the model landscape:

  1. Reliance on expensive vendor products, not open source tools
  2. Significant effort maintaining production and non-production environments
  3. Material effort on data sourcing, cleaning & reconciling data to production
  4. Technical constraints on infrastructure and tooling in pre-production
  5. Highly manual and document-heavy model approval process
  6. High cost/time to ‘deploy’ a model into production
  7. Significant manual interventions required in production reporting cycles
  8. Time consuming manual processes to evidence controls and lineage
  9. High cost/time required to make changes in production
  10. Constrained ability to run what-if analysis and challenger models
  11. Granularity/sophistication of analytics constrained by infrastructure
  12. Significant replication of effort and code to validate and monitor models

Proposed Solution


TRAC is built around a structured metadata model that records and catalogs every asset and traceable action known to the TRAC platform. It consists of two layers: objects, which are the structural descriptions of assets and actions (models, data, jobs and so on), and tags, which hold the informational and governance metadata attached to those objects.

Both objects and tags are versioned with an immutable, time-indexed version history; "updates" are performed by creating a new version of the object or tag with the required changes. Because of this, the TRAC metadata provides a fully consistent historical view of the platform for any previous point in time. It also provides a complete audit history that is both machine- and human-readable, with no manual effort.

Where objects refer to external resources such as models and data, those resources are also immutable. This is achieved using e.g. GitHub tags or Nexus binary versions for models, and data areas owned by TRAC with controlled write access for primary data. The combination of immutable metadata and immutable resources allows TRAC to recreate any previous calculation that has run on the platform. As a result, generated and intermediate data can often be discarded and recreated later if needed.
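The versioning scheme described above can be sketched in a few lines of Python. This is a simplified illustration of the idea, not TRAC's actual implementation: an "update" appends a new immutable version, and a point-in-time view selects the latest version created at or before a given timestamp.

```python
from __future__ import annotations
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    version: int
    timestamp: float   # time this version was created
    attrs: dict        # content of this version, never mutated


class VersionedObject:
    """Append-only, time-indexed version history: an 'update' is a new version."""

    def __init__(self):
        self._versions: list[ObjectVersion] = []

    def update(self, timestamp: float, attrs: dict) -> ObjectVersion:
        # Prior versions are never modified, so history is immutable
        v = ObjectVersion(len(self._versions) + 1, timestamp, dict(attrs))
        self._versions.append(v)
        return v

    def as_of(self, timestamp: float) -> ObjectVersion | None:
        """Point-in-time view: latest version at or before `timestamp`."""
        i = bisect_right([v.timestamp for v in self._versions], timestamp)
        return self._versions[i - 1] if i > 0 else None
```

Because every version carries its creation time, reading the metadata "as of" any past timestamp always returns the same answer, which is what makes the audit trail and the recreation of previous calculations possible.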

The other key element of TRAC is platform abstraction. Data is stored in common formats (ORC, Parquet, Avro etc.) and models are written using common frameworks (primarily Spark, also the Pandas stack in Python). TRAC abstracts the execution environment and provides models with their required resources at runtime, so the same model code can run on any of the big three cloud platforms, or Hadoop, or in a local sandbox. In the same way, a data API provides capabilities for aggregation and other common analytics by abstracting the data technologies of the underlying platform. A plugin architecture allows data and execution capabilities to be added for other platforms or to integrate with existing in-house solutions.
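The execution abstraction can be illustrated with a minimal sketch. The class and method names below are hypothetical, not the real trac-runtime API, and plain Python structures stand in for the Pandas or Spark DataFrames a real model would use: the model only ever talks to a runtime context, so the same code runs unchanged in a sandbox, on Hadoop, or on any cloud back end.

```python
class RunContext:
    """Hypothetical runtime context: the platform supplies inputs and collects
    outputs, so the model never touches storage or infrastructure directly."""

    def __init__(self, inputs):
        self._inputs = inputs          # dataset name -> list of row dicts
        self.outputs = {}              # collected by the platform after the run

    def get_dataset(self, name):
        return self._inputs[name]

    def put_dataset(self, name, rows):
        self.outputs[name] = rows


def run_model(ctx):
    """Example model: aggregate loan exposure by region. It sees only
    datasets handed to it by the context, nothing platform-specific."""
    totals = {}
    for row in ctx.get_dataset("loans"):
        totals[row["region"]] = totals.get(row["region"], 0) + row["exposure"]
    ctx.put_dataset("exposure_by_region", totals)
```

The key design point is inversion of control: the platform resolves storage, formats and compute, then calls the model with ready-to-use datasets, which is what allows a plugin architecture underneath without changing model code.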

Read more about TRAC here: https://trac-platform.readthedocs.io/en/stable/index.html

Tentative Roadmap

A high level development roadmap for the open source version of TRAC is available here: https://github.com/Accenture/trac/wiki/Development-Roadmap

Current State

Existing Materials

- GitHub: https://github.com/accenture/trac
- Read the Docs: https://trac-platform.readthedocs.io/
- PyPI (model runtime package): https://pypi.org/project/trac-runtime/
- npm (web API package): https://www.npmjs.com/package/trac-web-api

Development Team

Leadership

- Architect / lead developer: Martin Traverse
- Affiliation: Accenture
- Email: martin.traverse@accenture.com
- GitHub: martin-traverse

Confirmed contributors

The closed-source version of TRAC was built on a commercial basis. The engine was developed by a core team of three developers, with a broader team providing support. The applications that ran on the platform were developed by the client.

Target Contributors

The TRAC platform services are written in Java, built directly on top of Netty, gRPC/protobuf and Apache Arrow, as well as the underlying platform technologies for storage and execution. The runtime components and API packages are written in their respective languages. Communication between components is via protobuf. Code generation is done using protoc plugins to create domain objects, rest mappings, validation and documentation.

Our target contributors are people who think these technologies sound like fun! One model would be part-time involvement from a small number (possibly one) of people at organisations using the tool. The long-term support model is a key item to discuss as the project gains momentum.

Infrastructure needs

Describe the FINOS infrastructure you will need for this project, in addition to a GitHub repository. The FINOS team will connect with you before setting up any of this infrastructure

What's next?

Upon submission of this project proposal, the FINOS team will get in touch with you to discuss next steps.


Contribution process (v. 1.0, last updated on October 2, 2020)

Below is the list of tasks that the FINOS team and the contribution author go through to complete the FINOS contribution process. Please do not edit these contents at contribution time!

- FINOS Contrib POC
- Kick-off meeting
- Proposal (Lead Maintainer)
- Identify project meta (Lead: FINOS Contrib POC, Support: FINOS Marketing)
- Maintainers, contributors and CLAs (Lead: FINOS Contrib POC, Support: FINOS Infra)
- Project Communication Channel(s)
- Code validation (only if code is contributed) (Lead: FINOS Infra)
- Approval (Lead: FINOS Infra)
- Code transfer (Lead: FINOS Infra)
- Infra setup (Lead: FINOS Infra)
- Metadata update (Lead: FINOS Infra)
- Announcement (Lead: FINOS Contrib POC)

mcleo-d commented 2 years ago

Thanks so much for raising this contribution @martin-traverse 👏🏻 💯💯💯

I look forward to engaging and working with you as we socialise with the FINOS community.

James 🚀

mindthegab commented 2 years ago

Thanks for raising this @martin-traverse - great to see Accenture put forward its first contribution.

My immediate question would be if you had a chance to take a look at Legend and would be able to discuss if there’s any overlap between the two projects, in terms of scope and business goals, or whether you can see a potential for integrating the two technologies.

Tagging the @finos/legend-maintainers, as I’m sure that would start a good convo (likely over my head :P)

Thanks!

martin-traverse commented 2 years ago

Hello! Thanks for the question - I heard about Legend for the first time at the conference and went to some talks about it, because I had precisely the same question myself. Based on what I’ve understood I think they do different things, although they are in a similar space - I guess this is a chance to test that understanding! Re. integration yes I think there is potential although it requires features from both projects that are in the roadmap but not yet published. Let me outline some thoughts to see if this all makes sense.

Legend

So to check my understanding. Legend is a data modelling tool that allows you to create strict, business-friendly definitions of relational data models, including data validation rules, which are both human- and machine readable. Data models can be ‘pure’ or they can be mapped to underlying relational sources, which can be a heterogeneous collection, in which case Legend allows the data model to be queried and translates those queries to the underlying sources. It handles schema evolution for the data model and has a selection of user-friendly tools for creating, editing, querying and sharing data models. It also functions as a data catalog.

@finos/legend-maintainers - have I understood the core functionality, and what key bits have I missed? Hopefully I'm at least barking up the right ball park!

TRAC

TRAC is primarily a model management and execution platform. It knows about models, data, execution runs and sign-off. Core capabilities are:

- Repeatability - any model run can be repeated and will reproduce the same results; a point-in-time historical view is available across the whole platform.
- Audit - every asset and action on the platform is audited in a machine- and human-readable form (the audit log and platform instructions are built from the same metadata model, so they are guaranteed to be in sync).
- Control - a flexible policy model governs the sign-off process and controls who can do what; policy and sign-off are also audited and available as a point-in-time view.
- Flexibility - modellers and business users are free to upload and run new models and/or data at any time without change risk, due to the immutability guarantee.

It also functions as a model/data catalogue.

Our primary focus to date has been risk/finance models, which have high regulatory requirements around audit and lineage at the same time as requiring ad-hoc analysis and flexible judgement/overlays.

Overlaps

Both platforms are "facades" over existing data infrastructure. The functionality they expose is quite different though. Legend is about describing the data model and relations. TRAC has no concept of relations. It does have a simple concept of schema, but it's intentionally minimal and really focused on what is needed to run models. Legend can provide a real-time view of data in multiple external systems. TRAC can connect to external systems as sources, but it's really about what you run on the platform. Both systems provide cataloguing capabilities - I think they'll be cataloguing different things though.

Integration

The obvious one is that TRAC could use Legend as a data source/repository. We already have the framework to plug in data sources and push down queries. The Legend data model is obviously a lot stronger than most sources, which would be very helpful for integration. Organisations that implement Legend for data modelling/management would find it very easy to use TRAC for analytics, with one pre-packaged integration instead of several bespoke ones.

Generally we treat external data as mutable. Per my understanding, Legend has the concept of temporality and can express bi-temporal data, so we could treat Legend data as immutable where sources support it, and that would really help us. I believe APIs are available for this in open source Legend, but the supporting implementation is not published yet.
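To illustrate why bi-temporality gives us the immutability we need, here is a generic sketch (not the Legend or TRAC API; names are illustrative). Each row carries a valid time and a transaction time, and rows are append-only, so a query "as of" a fixed pair of times always returns the same answer, even after restatements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitemporalRow:
    key: str
    value: float
    valid_from: int    # business/valid time: when the fact applies
    recorded_at: int   # transaction time: when the fact was recorded


def as_of(rows, key, valid_time, record_time):
    """Latest value for `key` valid at `valid_time`, as it was known at
    `record_time`. Rows are append-only, so the answer for a fixed
    (valid_time, record_time) pair never changes - effectively immutable."""
    candidates = [r for r in rows
                  if r.key == key
                  and r.valid_from <= valid_time
                  and r.recorded_at <= record_time]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.valid_from, r.recorded_at)).value
```

A later restatement is just a new row with a newer transaction time: queries pinned to the earlier transaction time still see the original value, which is exactly the guarantee TRAC needs to repeat a past calculation.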

Ok apologies that got quite long - I guess there’s quite a lot to say. Very interested to hear from the legend team - does this make sense to you or am I wide of the mark? Either way it’s best to know early on!

maoo commented 2 years ago

@martin-traverse - thanks for the chat today. Here's what we agreed to be the first tasks:

Next meeting (and update) is on Friday 22 October. Thank you!

mindthegab commented 2 years ago

Given the extensive socialization process, happy to approve the contribution to move forward. Thanks @martin-traverse!

martin-traverse commented 2 years ago

@mindthegab - this is such great news, myself and the TRAC team are very excited and happy to be coming on board :-)

I will continue working with @TheJuanAndOnly99 to complete the process.

TheJuanAndOnly99 commented 2 years ago

@maoo - Code transfer is scheduled for Tuesday, March 8th 2022

TheJuanAndOnly99 commented 2 years ago

The contribution is now complete. Thank you @martin-traverse + team!