Thanks so much for raising this contribution @martin-traverse 👏🏻 💯💯💯
I look forward to engaging and working with you as we socialise this contribution with the FINOS community.
James 🚀
Thanks for raising this @martin-traverse - great to see Accenture put forward its first contribution.
My immediate question is whether you've had a chance to take a look at Legend, and whether you could discuss any overlap between the two projects, in terms of scope and business goals, or whether you see potential for integrating the two technologies.
Tagging the @finos/legend-maintainers, as I’m sure that would start a good convo (likely over my head :P)
Thanks!
Hello! Thanks for the question - I heard about Legend for the first time at the conference and went to some talks about it, because I had precisely the same question myself. Based on what I've understood, I think they do different things, although they are in a similar space - I guess this is a chance to test that understanding! Re. integration, yes, I think there is potential, although it requires features from both projects that are on the roadmap but not yet published. Let me outline some thoughts to see if this all makes sense.
So, to check my understanding: Legend is a data modelling tool that allows you to create strict, business-friendly definitions of relational data models, including data validation rules, which are both human- and machine-readable. Data models can be "pure" or they can be mapped to underlying relational sources, which can be a heterogeneous collection, in which case Legend allows the data model to be queried and translates those queries to the underlying sources. It handles schema evolution for the data model and has a selection of user-friendly tools for creating, editing, querying and sharing data models. It also functions as a data catalog.
@finos/legend-maintainers - have I understood the core functionality, and what key bits have I missed? Hopefully I'm at least barking up the right ball park!
TRAC is primarily a model management and execution platform. It knows about models, data, execution runs and sign-off. Core capabilities are:

- **Repeatability** - any model run can be repeated and will reproduce the same results; a point-in-time historical view is available across the whole platform.
- **Audit** - every asset and action on the platform is audited in a machine- and human-readable form (the audit log and platform instructions are built from the same metadata model, so they are guaranteed to be in sync).
- **Control** - a flexible policy model governs the sign-off process and controls who can do what; policy and sign-off are also audited and available as a point-in-time view.
- **Flexibility** - modellers and business users are free to upload and run new models and/or data at any time without change risk, due to the immutability guarantee.

It also functions as a model/data catalogue.
Our primary focus to date has been risk/finance models, which have high regulatory requirements around audit and lineage at the same time as requiring ad-hoc analysis and flexible judgement/overlays.
Both platforms are "facades" over existing data infrastructure. The functionality they expose is quite different though. Legend is about describing the data model and relations. TRAC has no concept of relations; it does have a simple concept of schema, but it's intentionally minimal and really focused on what is needed to run models. Legend can provide a real-time view of data in multiple external systems. TRAC can connect to external systems as sources, but it's really about what you run on the platform. Both systems provide cataloguing capabilities - I think they'll be cataloguing different things though.
The obvious integration is that TRAC could use Legend as a data source/repository. We already have the framework to plug in data sources and push down queries. The Legend data model is obviously a lot stronger than most sources, which would be very helpful for integration. Organisations that implement Legend for data modelling/management would find it very easy to use TRAC for analytics, with one pre-packaged integration instead of several bespoke ones.
Generally we treat external data as mutable. Per my understanding, Legend has the concept of temporality and can express bi-temporal data, so we could treat Legend data as immutable where sources support it, and that would really help us. I believe APIs are available for this in open source Legend, but the supporting implementation is not published yet.
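To make that concrete, here's a toy sketch (plain Python, nothing to do with either platform's real APIs) of why bi-temporal records let a consumer treat data as immutable: corrections are appended as new rows, so any past state can be reconstructed by filtering on the two time axes.

```python
from dataclasses import dataclass
from datetime import date

# A bi-temporal record tracks two time axes:
#   valid_from  - when the fact is true in the real world
#   recorded_on - when the fact was written to the system
# Rows are only ever appended, never updated or deleted, so every
# historical query is repeatable. (Illustrative only.)

@dataclass(frozen=True)
class BalanceRecord:
    account: str
    balance: float
    valid_from: date
    recorded_on: date

history = [
    BalanceRecord("ACC-1", 100.0, date(2021, 1, 1), date(2021, 1, 2)),
    # A back-dated correction: same valid time, later record time
    BalanceRecord("ACC-1", 105.0, date(2021, 1, 1), date(2021, 3, 15)),
]

def balance_as_of(records, account, valid, recorded):
    """The balance as the system saw it on 'recorded', for the
    real-world date 'valid' - a pure function of the two dates."""
    candidates = [r for r in records
                  if r.account == account
                  and r.valid_from <= valid
                  and r.recorded_on <= recorded]
    return max(candidates, key=lambda r: (r.valid_from, r.recorded_on)).balance

# Before the correction was recorded, the system reports 100.0 ...
assert balance_as_of(history, "ACC-1", date(2021, 1, 1), date(2021, 2, 1)) == 100.0
# ... and afterwards, 105.0 - both answers remain reproducible forever.
assert balance_as_of(history, "ACC-1", date(2021, 1, 1), date(2021, 4, 1)) == 105.0
```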
Ok, apologies, that got quite long - I guess there's quite a lot to say. Very interested to hear from the Legend team - does this make sense to you or am I wide of the mark? Either way it's best to know early on!
@martin-traverse - thanks for the chat today. Here's what we agreed to be the first tasks:

- Create some "Good First Issue" tickets, so that FINOS can share them with the community (Martin)

Next meeting (and update) is on Friday 22 October. Thank you!
Given the extensive socialization process, happy to approve the contribution to move forward. Thanks @martin-traverse!
@mindthegab - this is such great news, the TRAC team and I are very excited and happy to be coming on board :-)
I will continue working with @TheJuanAndOnly99 to complete the process.
@maoo - Code transfer scheduled for Tuesday, March 8th 2022.
The contribution is now complete. Thank you @martin-traverse + team!
Business Problem
In many institutions, the infrastructure used to manage complex and highly regulated models is inefficient, expensive to maintain and in need of modernisation.
- Rapid regulatory change has led to tactical solutions, manual processes and technical debt
- Cloud, big data and data science technologies offer a range of options for upgrading legacy modelling platforms, but using these solutions to manage highly governed models is not easy
In more detail...
Common challenges faced in risk, finance and regulatory reporting when dealing with the model landscape.
Proposed Solution
- TRAC is a new type of analytics solution, built for managing complex, highly governed models
- It simplifies and automates processes by providing total repeatability, audit and control (i.e. TRAC) over the full workflow (data, models, overlays, sign-off)
- The solution is designed to work with major cloud providers and Hadoop. It supports several model languages (e.g. Python, R, Scala, SQL) and developer toolkits
In more detail...
TRAC is built around a structured metadata model that records and catalogs every asset and traceable action known to the TRAC platform. It consists of two layers:
Objects are the structural elements of the model; they represent assets and actions. Data, models and jobs are all described by metadata objects. Each type of object has a metadata structure that is defined as part of the TRAC API.
Tags are used to index, describe and control objects; they are made up of key-value attributes. Some attributes are controlled by the platform, others can be set by client applications or edited by users.
Both objects and tags are versioned with an immutable, time-indexed version history; "updates" are performed by creating a new version of the object or tag with the required changes. Because of this, the TRAC metadata provides a fully consistent historical view of the platform for any previous point in time. It also provides a complete audit history that is both machine- and human-readable, with no manual effort.
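As a rough illustration of this versioning rule, here is a minimal sketch in Python (all names are hypothetical; the real TRAC metadata structures are protobuf messages defined in the platform API). An "update" never mutates a tag, it appends a new version, so every historical state remains addressable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical, simplified stand-ins for TRAC metadata structures.

@dataclass(frozen=True)
class TagVersion:
    object_id: str        # identity of the underlying object
    object_version: int   # version of the object this tag describes
    tag_version: int      # version of the tag itself
    attrs: dict           # key-value attributes for search / control
    timestamp: datetime   # when this version became current

class MetadataStore:
    """Append-only store: updates create new versions, nothing is mutated."""

    def __init__(self):
        self._history: list[TagVersion] = []

    def save(self, tag: TagVersion):
        self._history.append(tag)

    def update_attrs(self, object_id: str, new_attrs: dict) -> TagVersion:
        latest = self.as_of(object_id, datetime.now(timezone.utc))
        updated = TagVersion(
            object_id=object_id,
            object_version=latest.object_version,
            tag_version=latest.tag_version + 1,
            attrs={**latest.attrs, **new_attrs},
            timestamp=datetime.now(timezone.utc))
        self.save(updated)
        return updated

    def as_of(self, object_id: str, when: datetime) -> TagVersion:
        """Point-in-time view: the latest version at or before 'when'."""
        versions = [t for t in self._history
                    if t.object_id == object_id and t.timestamp <= when]
        return max(versions, key=lambda t: t.tag_version)

store = MetadataStore()
store.save(TagVersion("model-1", 1, 1, {"status": "draft"},
                      datetime.now(timezone.utc)))
store.update_attrs("model-1", {"status": "approved"})
# Both the draft and approved states remain queryable via as_of(...)
```

Because the history is append-only, the same point-in-time query serves both operational lookups and audit questions, which is the sense in which the audit trail and the platform instructions cannot drift apart.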
Where objects refer to external resources such as models and data, those resources are also immutable. This is achieved using e.g. GitHub tags or Nexus binary versions for models, and data areas owned by TRAC with controlled write access for primary data. The combination of immutable metadata and immutable resources allows TRAC to recreate any previous calculation that has run on the platform. As a result, generated and intermediate data can often be discarded and recreated later if needed.
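Continuing the sketch above, repeatability falls out of immutability: if a job's metadata pins exact versions of model code and input data, re-running it is a pure function of those pins. The structure below is illustrative only (the job definition and runner are hypothetical, not TRAC's API).

```python
from dataclasses import dataclass

# Hypothetical job description: every input is pinned to an immutable
# version, so generated outputs can be discarded and recreated at will.

@dataclass(frozen=True)
class JobDefinition:
    model_repo: str     # e.g. a Git URL
    model_tag: str      # immutable Git tag, never a branch
    input_data_id: str  # TRAC-owned data area (write-controlled)
    input_version: int  # immutable data version

def run_job(job: JobDefinition) -> str:
    # A real platform would fetch the tagged model code and the versioned
    # data snapshot, then execute. The point here is only that the result
    # is determined entirely by the pinned, immutable inputs.
    return f"output-of({job.model_tag}, {job.input_data_id}@v{job.input_version})"

job = JobDefinition("https://github.com/example/models", "v1.4.2", "loan-book", 7)
assert run_job(job) == run_job(job)  # same pins, same result
```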
The other key element of TRAC is platform abstraction. Data is stored in common formats (ORC, Parquet, Avro etc.) and models are written using common frameworks (primarily Spark, also the Pandas stack in Python). TRAC abstracts the execution environment and provides models with their required resources at runtime, so the same model code can run on any of the big three cloud platforms, or Hadoop, or in a local sandbox. In the same way, a data API provides capabilities for aggregation and other common analytics by abstracting the data technologies of the underlying platform. A plugin architecture allows data and execution capabilities to be added for other platforms or to integrate with existing in-house solutions.
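A sketch of what the plugin idea might look like (a hypothetical interface, not TRAC's actual plugin API): model code depends only on an abstract storage contract, and per-platform plugins supply the implementation.

```python
from abc import ABC, abstractmethod
import pandas as pd

# Hypothetical storage abstraction: model code sees only this interface,
# so the same model runs unchanged against cloud object stores, HDFS or
# a local sandbox. (Illustrative - not the real TRAC plugin API.)

class StoragePlugin(ABC):
    @abstractmethod
    def read_dataset(self, dataset_id: str) -> pd.DataFrame: ...
    @abstractmethod
    def write_dataset(self, dataset_id: str, df: pd.DataFrame) -> None: ...

class LocalSandboxStorage(StoragePlugin):
    """Local development implementation backed by Parquet files."""
    def __init__(self, root: str):
        self.root = root
    def read_dataset(self, dataset_id: str) -> pd.DataFrame:
        return pd.read_parquet(f"{self.root}/{dataset_id}.parquet")
    def write_dataset(self, dataset_id: str, df: pd.DataFrame) -> None:
        df.to_parquet(f"{self.root}/{dataset_id}.parquet")

def run_model(storage: StoragePlugin):
    """Model code is written once against the abstract interface."""
    loans = storage.read_dataset("loan_book")
    loans["provision"] = loans["balance"] * loans["pd"] * loans["lgd"]
    storage.write_dataset("provisions", loans)

# Swapping LocalSandboxStorage for, say, an S3- or HDFS-backed plugin
# requires no change to the model code above.
```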
Read more about TRAC here: https://trac-platform.readthedocs.io/en/stable/index.html
Tentative Roadmap
A high level development roadmap for the open source version of TRAC is available here: https://github.com/Accenture/trac/wiki/Development-Roadmap
Current State
- TRAC was conceived in 2019 and has been implemented at a major UK bank, initially for credit forecasting and now being rolled out across IFRS9 and prudential modelling
- A free-to-use, open source version of TRAC is now being developed and released
- Accenture Legal has approved the release of the open source version
Existing Materials
- GitHub: https://github.com/accenture/trac
- Read the Docs: https://trac-platform.readthedocs.io/
- PyPI (model runtime package): https://pypi.org/project/trac-runtime/
- NPM (web API package): https://www.npmjs.com/package/trac-web-api
Development Team
Leadership
- Architect / lead developer: Martin Traverse
- Affiliation: Accenture
- Email: martin.traverse@accenture.com
- GitHub: martin-traverse
Confirmed contributors
The closed-source version of TRAC was built on a commercial basis. The engine was developed by a core team of three developers, with a broader team providing support. The applications that ran on the platform were developed by the client.
Target Contributors
The TRAC platform services are written in Java, built directly on top of Netty, gRPC/protobuf and Apache Arrow, as well as the underlying platform technologies for storage and execution. The runtime components and API packages are written in their respective languages. Communication between components is via protobuf. Code generation is done using protoc plugins to create domain objects, REST mappings, validation and documentation.
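To illustrate the gRPC/protobuf pattern (with made-up service and message names, using the standard grpcio toolchain rather than TRAC's actual API):

```python
# Hypothetical illustration of the gRPC pattern described above.
# Assumes stubs generated by protoc from a (made-up) metadata.proto:
#
#   service MetadataService {
#     rpc ReadObject (ReadRequest) returns (Tag);
#   }
#
# python -m grpc_tools.protoc --python_out=. --grpc_python_out=. metadata.proto

import grpc
import metadata_pb2
import metadata_pb2_grpc

def read_object(object_id: str) -> "metadata_pb2.Tag":
    # One protobuf definition drives the wire format, the generated
    # domain objects and the documentation, keeping them in sync.
    with grpc.insecure_channel("localhost:9000") as channel:
        stub = metadata_pb2_grpc.MetadataServiceStub(channel)
        request = metadata_pb2.ReadRequest(object_id=object_id)
        return stub.ReadObject(request)
```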
Our target contributors are people who think these technologies sound like fun! One model would be some part-time involvement from a small number (possibly one) of people at organizations using the tool. A long-term support model is a key item to discuss as the project gains momentum.
Infrastructure needs
Describe the FINOS infrastructure you will need for this project, in addition to a GitHub repository. The FINOS team will connect with you before setting up any of this infrastructure.
What's next?
Upon submission of this project proposal, the FINOS team will get in touch with you to discuss next steps.
Contribution process (v. 1.0, last updated on October 2, 2020)
Below is the list of tasks that FINOS Team and the contribution author go through in order to complete the FINOS contribution process. Please do not edit these contents at contribution time!
FINOS Contrib POC: Juan Estrella, Jane Gavronsky

- Kick-off meeting
- Proposal (Lead Maintainer)
- Identify project meta (Lead: FINOS Contrib POC, Support: FINOS Marketing)
  - Project name: TRAC d.a.p. (@maoo and Martin, working with Megan on it)
  - Landscape category: Data & Business Logic - Frameworks - https://landscape.finos.org/card-mode?category=frameworks&grouping=category
- Maintainers, contributors and CLAs (Lead: FINOS Contrib POC, Support: FINOS Infra): Martin - new dev will onboard, focussed on tooling
- Project Communication Channel(s): trac
- Code validation (only if code is contributed) (Lead: FINOS Infra): `grep -Ri "all rights reserved" *`
- Approval (Lead: FINOS Infra)
- Code transfer (Lead: FINOS Infra)
  - `Admin` access to all repositories to transfer
  - `<project-name>-maintainers` GitHub team and invite users
  - `finos-admins` and `finos-staff` team permissions
- Infra setup (Lead: FINOS Infra)
  - `finos` - work in progress; will be ready prior to first release under FINOS
- Metadata update (Lead: FINOS Infra): 165
- Announcement (Lead: FINOS Contrib POC)