finos / community

FINOS Community, Project and SIG wide collaboration space
http://community.finos.org

High Throughput Compute Grid (HTC-Grid) Software Project Contribution and Onboarding #318

Closed kirillsc closed 1 month ago

kirillsc commented 7 months ago

Please note that only FINOS members can propose new projects. If you're interested in membership, see https://www.finos.org/membership-benefits#become-a-member.

Onboarding Process

Completing an onboarding of a project into FINOS requires following these 5 main steps:

  1. Describing the Contribution (led by contributor)
  2. Approval (led by FINOS TOC)
  3. Preparing for Onboarding (led by contributor)
  4. Onboarding (completed by FINOS Infra)
  5. Announcement (led by FINOS Marketing)

1. Describing The Contribution

This is a list of questions that need to be answered by the contributor in order to allow a new project to pass to the approval stage of onboarding.

Business Problem

Financial Services Institutions (FSIs) perform a large volume of computations as part of their daily operations, such as pricing their products and monitoring and controlling their risk exposure. These computations are typically executed on compute grids, which are critical components of FSIs' infrastructure today. Historically, compute grids ran on on-premises infrastructure, and these systems can be very large, complex, and expensive to operate.

Market volatility and new regulatory requirements, such as FRTB and Solvency II, are driving a significant increase in compute load today. Aging on-premises infrastructure is both expensive to retain and slow to scale up. To provide better service to their customers, FSIs increasingly want faster intra-day computational results, on-demand analytics, and room to innovate. All of these factors prompt FSIs to re-invent their computing grids and explore modernisation pathways in the cloud.

Furthermore, FSI workloads often involve a high number of short-running tasks. It is not uncommon to process more than 100 million tasks per day, with a large portion of these tasks taking less than 2 seconds to compute. These unique characteristics make many existing grid schedulers unsuitable for these workloads.
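To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the 100 million tasks arrive evenly over 24 hours and that each task runs for the full 2 seconds; real workloads are bursty, so peak numbers would be considerably higher. The function names are hypothetical and not part of HTC-Grid.

```python
# Back-of-the-envelope load estimate from the figures quoted above.
# Assumptions (not from the proposal): tasks arrive evenly over 24 hours
# and every task runs for the full 2 seconds.
TASKS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60
TASK_DURATION_S = 2.0


def average_arrival_rate(tasks_per_day: int) -> float:
    """Average number of tasks submitted per second."""
    return tasks_per_day / SECONDS_PER_DAY


def average_concurrency(arrival_rate: float, duration_s: float) -> float:
    """Little's law: tasks in flight = arrival rate * time each task spends running."""
    return arrival_rate * duration_s


rate = average_arrival_rate(TASKS_PER_DAY)
busy = average_concurrency(rate, TASK_DURATION_S)
print(f"{rate:,.0f} tasks/s on average")          # 1,157 tasks/s on average
print(f"{busy:,.0f} tasks in flight on average")  # 2,315 tasks in flight on average
```

Even under these flattering assumptions, a scheduler has to dispatch over a thousand tasks every second around the clock, so per-task scheduling overhead matters a great deal.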

Proposed Solution

The High Throughput Compute Grid project (HTC-Grid) is a container-based cloud native HPC/Grid solution. The project provides a reference architecture that can be used to build and adapt a modern high throughput compute solution using underlying AWS services, allowing users to submit high volumes of short and long running tasks.

HTC-Grid addresses these massive computational demands by combining very high throughput scheduling (over 30,000 submissions per second), low latency (under 300 ms), and seamless infrastructure orchestration. HTC-Grid provides a uniform architecture suitable for both large-scale nightly batch workloads and intra-day workloads that require near real-time response times.

By using a modular architecture built on managed services such as Amazon EKS, Amazon DynamoDB, and Amazon SQS (or similar services from other CSPs), together with Spot instances, HTC-Grid is able to dynamically scale computing resources and meet the demanding requirements of FSIs.

Tentative Roadmap

The initial plan is to move the existing HTC-Grid repository from AWS-Labs to FINOS.

Current State

HTC-Grid source code - https://github.com/awslabs/aws-htc-grid
HTC-Grid workshop & documentation - https://catalog.us-east-1.prod.workshops.aws/workshops/6b17d1f3-419d-4c05-821f-1dd2c3488d6f

Existing Materials

If materials already exist, provide a link to them that Foundation staff can access. If they are in private GitHub.com repositories, you should invite the finos-admin user with R/O permissions to those repositories.

Development Team

Maintainers

Who will be the project maintainer(s)? Provide full name, affiliation, work email address, and GitHub / GitLab username.

| Name | Affiliation | Work Email Address | GitHub / GitLab username |
| --- | --- | --- | --- |
| Clement Rey | Amazon | clemerey@amazon.fr | clementrey-dev |
| Flamur Gogolli | Amazon | flamurg@amazon.co.uk | fgogolli |
| Kirill Bogdanov | Amazon | kirillb@amazon.ch | kirillsc |

Confirmed contributors

If applicable, list all of the individuals that have expressed interest in and/or are committed to contributing to this project, including full name, affiliation, work email address, and GitHub.com username

| Name | Affiliation | Work Email Address | GitHub / GitLab username |
| --- | --- | --- | --- |
| Clement Rey | Amazon | clemerey@amazon.fr | clementrey-dev |
| Flamur Gogolli | Amazon | flamurg@amazon.co.uk | fgogolli |
| Kirill Bogdanov | Amazon | kirillb@amazon.ch | kirillsc |

Target Contributors

Describe the contributor profile (background, position, organization) you would like to get contributions from.

Project Communication Channel(s)

Understanding FINOS Onboarding Requirements

As a project onboarding into FINOS, you will need to familiarize yourself and your contributor team with the following materials:

Record The Contribution (FINOS Infra)

2. Approval

The FINOS Technical Oversight Committee (TOC) is responsible for approving FINOS project contributions; feel free to check their contribution principles.

If needed, the TOC will request a follow up either via GitHub Issue comments or by inviting project leads to one of their recurrent meetings.

Tasks (for FINOS Infra/TOC)

TOC Findings / Report

TOC to enter findings summary here.

3. Preparing For Onboarding

Before the FINOS team can onboard your project, there are a few housekeeping tasks that need to be taken care of. These must be completed by the contributor, with help from FINOS Infra if required.

Kick-off meeting

Logo / Trademarks

FINOS Project Blueprint

Add documentation here

4. FINOS Onboarding

This is performed by FINOS Infra once the three previous stages are complete, with support from the contributor.

Maintainers, Contributors and CLAs

Validation (only if code is contributed)

@kirillsc - All dependencies in requirements.txt are unpinned; is there any reason for that? If not, we'd recommend following Python best practices and pinning dependencies.
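As a rough illustration of what "unpinned" means here, the following minimal sketch flags requirement lines that lack an exact `==` version. The helper name `unpinned_requirements` and the sample package lines are hypothetical, chosen only for illustration; the check is deliberately simplistic and does not handle every form a requirements file allows (for example, direct URLs or environment markers).

```python
import re
from typing import List

# A line counts as pinned only if it uses an exact "==" version specifier,
# optionally after an extras list such as "package[extra]".
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r / -e
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Illustrative sample only; not the project's actual requirements.txt.
sample = """\
boto3
requests>=2.0
redis==4.5.4  # pinned
"""
print(unpinned_requirements(sample))  # ['boto3', 'requests>=2.0']
```

A check along these lines could be run against requirements.txt locally or in CI; an empty result would mean every dependency carries an exact version.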

python3 -m venv .
./bin/pip install pip-licenses
./bin/pip-licenses --allow-only="ISC License (ISCL); BSD License; Apache Software License; MIT License; Python Software Foundation License; Mozilla Public License 2.0 (MPL 2.0)"

No Category X licenses found

@kirillsc - it would be great to create a GitHub Action to run this continuously, every time the requirements.txt file gets updated. Is this something you and the team could work on?

Code transfer

Project Communication Channel(s)

Repository setup

5. Announcement

(Lead: Project Lead and FINOS Infra team)

maoo commented 7 months ago

@kirillsc - thanks for your contribution proposal! I'm going to guide you and team through the contribution process.

As a next step, we'll organize a meeting with you and the team to give you a high-level walkthrough of the process and refine the checklist mentioned above.

I'll also engage with our Technical Steering Committee to ask for their approval. Thank you!

brunodom commented 6 months ago

@kirillsc - thanks for the contribution proposal. I see merit in this approach, but what is not clear yet is the maturity / fit of this solution. Do you know of any FI that is actually using it in a production environment? Knowing this would help steer the roadmap.

Thank you!

eminty69 commented 5 months ago

Approved, but please make sure the roadmap is documented. I know there are plans for the project, including extending support to other providers, but these need to be incorporated into the roadmap.

kirillsc commented 5 months ago

Great news! I look forward to working together on this project!

TheJuanAndOnly99 commented 2 months ago

Announcement sent, see https://groups.google.com/a/finos.org/g/announce/c/b7tAlU5rmBI.

The contribution process is completed (there are a few leftovers still in progress, but they are not blockers).

Congratulations to the Amazon team, and thanks for contributing HTC Grid to FINOS!

mindthegab commented 1 month ago

Should this issue be closed @TheJuanAndOnly99, or is there anything left to do?

TheJuanAndOnly99 commented 1 month ago

@mindthegab Yes. Closing issue. There are still a few tasks that need to be done on our end that I'll keep track of internally.