algotradingsoc / data_infrastructure

Repository of the data infrastructure research team.
Apache License 2.0

data_infrastructure

Overview

Managing the data infrastructure and computing resources of Imperial Algosoc:
Database: price and fundamentals data of US Equities and other asset classes
Cloud computing: deploy machine learning models on cloud computing services to generate live predictions

Resources and skills involved

Python: Pandas, Dask
Database: MongoDB
Computing: Docker, Linux

Data Sources

Kaggle US Equities Data (1992-2019): https://www.kaggle.com/finnhub/end-of-day-us-market-data-survivorship-bias-free
Kaggle US Reported Financials Data (2010-2020): https://www.kaggle.com/finnhub/reported-financials
Quandl Financial Industry Regulatory Authority (FINRA) Data (2013-2020): https://www.quandl.com/data/FINRA-Financial-Industry-Regulatory-Authority

Installing the US Equities Trading Calendar

pip install pandas_market_calendars

https://github.com/rsheftel/pandas_market_calendars

Project stages

Stage 1: Historical database

Design database schema for storing daily US Equity and fundamentals data
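One possible layout, assuming one MongoDB document per ticker per trading day in a daily price collection, is sketched below. The field names, DAILY_PRICE_FIELDS, DAILY_PRICE_INDEX and to_price_document are hypothetical; fundamentals data would more likely live in a separate collection keyed on ticker and reporting period.

```python
from datetime import datetime, timezone

# Hypothetical price fields for a "daily_prices" collection
# (one document per ticker per trading day).
DAILY_PRICE_FIELDS = ("open", "high", "low", "close", "adj_close", "volume")

# Hypothetical compound key: (ticker ascending, date ascending).
DAILY_PRICE_INDEX = [("ticker", 1), ("date", 1)]


def to_price_document(ticker, date_str, bar):
    """Convert one raw daily bar (dict of strings or numbers) into a document."""
    doc = {
        "ticker": ticker.upper(),
        # Store the date as a UTC datetime so MongoDB keeps it as a native
        # BSON date, which supports $gte/$lte range queries.
        "date": datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc),
    }
    for field in DAILY_PRICE_FIELDS:
        value = bar.get(field)
        if value is None or value == "":
            doc[field] = None  # keep missing values explicit
        elif field == "volume":
            doc[field] = int(float(value))
        else:
            doc[field] = float(value)
    return doc
```

With pymongo, the compound key could be enforced with collection.create_index(DAILY_PRICE_INDEX, unique=True), so re-importing the same Kaggle file raises a duplicate-key error instead of silently creating duplicate rows.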

Build metadata for database
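One reading of "metadata" is a per-ticker summary of data coverage. The sketch below uses a hypothetical build_metadata helper that computes the first date, last date and row count from price documents shaped like {"ticker": ..., "date": ...}. For survivorship-bias-free data such as the Kaggle end-of-day set, an early last date can hint that a stock was delisted.

```python
from collections import defaultdict


def build_metadata(documents):
    """Summarise price documents into one metadata record per ticker (hypothetical helper)."""
    summary = defaultdict(lambda: {"first_date": None, "last_date": None, "n_rows": 0})
    for doc in documents:
        entry = summary[doc["ticker"]]
        d = doc["date"]
        entry["n_rows"] += 1
        # Track the earliest and latest date seen for this ticker
        if entry["first_date"] is None or d < entry["first_date"]:
            entry["first_date"] = d
        if entry["last_date"] is None or d > entry["last_date"]:
            entry["last_date"] = d
    # Return records sorted by ticker, ready to store in a metadata collection
    return [{"ticker": t, **info} for t, info in sorted(summary.items())]
```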

Build functions for feature engineering
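A minimal pandas sketch of per-ticker feature engineering follows. It assumes a long-format table with ticker, date and close columns; add_price_features, the 20-row default window and the chosen features (log return, simple moving average, rolling volatility) are illustrative assumptions, not the project's feature set.

```python
import numpy as np
import pandas as pd


def add_price_features(prices: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Add return and volatility features to a long-format daily price table.

    Expects columns "ticker", "date" and "close" (assumed names).
    """
    df = prices.sort_values(["ticker", "date"]).copy()
    close_by_ticker = df.groupby("ticker")["close"]
    # Daily log return, shifted within each ticker so a return never spans two stocks
    df["log_return"] = np.log(df["close"] / close_by_ticker.shift(1))
    # Simple moving average of the close over `window` rows, per ticker
    df["sma"] = close_by_ticker.transform(lambda s: s.rolling(window).mean())
    # Rolling standard deviation of log returns over `window` rows, per ticker
    df["volatility"] = df.groupby("ticker")["log_return"].transform(
        lambda s: s.rolling(window).std()
    )
    return df
```

Grouping by ticker before shifting or rolling is the key design choice: without it, the first rows of one stock would be computed from the last rows of the previous stock in the table.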

Build API for database access
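A thin access layer could sit on top of pymongo. In the sketch below, price_query and get_prices are hypothetical names, and the collection is assumed to use the ticker/date document layout with dates stored as UTC datetimes; Collection.find with a projection and Cursor.sort are standard pymongo calls.

```python
from datetime import datetime, timezone


def _to_utc(date_str):
    """Parse "YYYY-MM-DD" into a UTC datetime matching the stored date field."""
    return datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def price_query(tickers, start=None, end=None):
    """Build a MongoDB filter for `tickers` between start and end (inclusive)."""
    query = {"ticker": {"$in": [t.upper() for t in tickers]}}
    date_range = {}
    if start is not None:
        date_range["$gte"] = _to_utc(start)
    if end is not None:
        date_range["$lte"] = _to_utc(end)
    if date_range:
        query["date"] = date_range
    return query


def get_prices(collection, tickers, start=None, end=None):
    """Fetch matching documents sorted by ticker then date, dropping MongoDB's _id."""
    cursor = collection.find(price_query(tickers, start, end), {"_id": 0})
    return list(cursor.sort([("ticker", 1), ("date", 1)]))
```

Keeping query construction in price_query, separate from the database call, means the filter logic can be tested without a running MongoDB instance.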

Stage 2: Cloud services

Process live data
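The README does not say how live data arrives. Assuming a feed of individual trades delivered as (ticker, price, size) tuples, one common processing step is folding ticks into OHLCV bars that match the daily price schema. The Bar class and aggregate_ticks below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    """OHLCV bar built up incrementally from trade ticks (hypothetical)."""
    open: Optional[float] = None
    high: Optional[float] = None
    low: Optional[float] = None
    close: Optional[float] = None
    volume: float = 0.0

    def update(self, price: float, size: float) -> None:
        """Fold one trade (price, size) into the bar."""
        if self.open is None:
            # First trade sets open, high and low
            self.open = self.high = self.low = price
        else:
            self.high = max(self.high, price)
            self.low = min(self.low, price)
        self.close = price  # the latest trade is always the current close
        self.volume += size


def aggregate_ticks(ticks):
    """Build one Bar per ticker from an iterable of (ticker, price, size) tuples."""
    bars = {}
    for ticker, price, size in ticks:
        bars.setdefault(ticker, Bar()).update(price, size)
    return bars
```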

Deploy database on cloud services
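When the database moves to a cloud host, connection details are usually supplied through configuration rather than hard-coded. The sketch assumes environment variables named MONGODB_URI and MONGODB_DB and a default database name of "algosoc" (all hypothetical), falling back to a local MongoDB on its default port 27017, such as one running in a Docker container.

```python
import os

# Local fallback: MongoDB's default port, e.g. a Docker container on the same machine.
DEFAULT_URI = "mongodb://localhost:27017"
# Hypothetical default database name.
DEFAULT_DB = "algosoc"


def mongo_settings(env=None):
    """Read MongoDB connection settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        "uri": env.get("MONGODB_URI", DEFAULT_URI),
        "database": env.get("MONGODB_DB", DEFAULT_DB),
    }
```

The resulting URI can be passed directly to pymongo.MongoClient, so the same scripts run unchanged against a local container or a cloud-hosted cluster.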

Deploy machine learning models