Create a shared package that can be installed and reused by different projects (local research notebook, different instances of v2realbot, scripts, etc.) to serve as a single point for fetching data and sharing the cache.
Responsibilities of this package:
accessing and managing the local trade cache, with remote fetching when data is missing
accessing and managing the local agg cache, executing aggregation including resampling
support for stocks, later for crypto
Ideas:
trade store (file cache, one day per file); if a day is not present, it is loaded from Alpaca (see the trade-store sketch after this list)
agg data store (db or parquet daily files; start with parquet, as 5M rows of parquet load in ~3 s)
decide the time granularity for the agg file cache; one trading day of 1-second OHLCV cbars is ~700 kB, so optimize for this granularity (it must be fast). Two years of 1s data (~5.5M rows, ~440 trading days) load from parquet in ~3 s; if it were daily files, the overhead of opening 440 files would be immense. Optimize for speed.
support for various aggregation types
if an aggregate is not present, it is built from trades with vectorized aggregation and stored to the cache (see the aggregation sketch after this list)
supports resampling (probably only the highest resolution is stored and coarser granularities are resampled on demand)
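A minimal sketch of the trade store's fetch-or-cache flow, assuming the alpaca-py historical data client; the cache root and file layout (one parquet file per symbol and day) are hypothetical choices here, not the final v2trading structure:

```python
from datetime import date, datetime, time, timedelta
from pathlib import Path

import pandas as pd
from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockTradesRequest

CACHE_ROOT = Path("~/.market_cache/trades").expanduser()  # hypothetical location

def get_trades(client: StockHistoricalDataClient, symbol: str, day: date) -> pd.DataFrame:
    """Return one day of trades, served from the file cache when present."""
    path = CACHE_ROOT / symbol / f"{day.isoformat()}.parquet"
    if path.exists():
        return pd.read_parquet(path)  # cache hit: no remote call
    # Cache miss: fetch the whole day (incl. extended hours) from Alpaca.
    request = StockTradesRequest(
        symbol_or_symbols=symbol,
        start=datetime.combine(day, time.min),
        end=datetime.combine(day + timedelta(days=1), time.min),
    )
    df = client.get_stock_trades(request).df
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(path)  # persist so every project shares the same cache
    return df
```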
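And a sketch of the vectorized trade-to-bar aggregation plus on-demand resampling, assuming the trades DataFrame carries a DatetimeIndex and price/size columns (the column names are an assumption):

```python
import pandas as pd

def aggregate_ohlcv(trades: pd.DataFrame, freq: str = "1s") -> pd.DataFrame:
    """Build OHLCV bars from raw trades in one vectorized pass."""
    bars = trades["price"].resample(freq).ohlc()
    bars["volume"] = trades["size"].resample(freq).sum()
    return bars.dropna(subset=["open"])  # drop intervals with no trades

def resample_ohlcv(bars: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Resample stored highest-resolution bars to a coarser granularity."""
    agg = {"open": "first", "high": "max", "low": "min",
           "close": "last", "volume": "sum"}
    return bars.resample(freq).agg(agg).dropna(subset=["open"])
```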
Inspiration from this design (originally meant primarily for a database; whether DB will be supported in the first phase is still open, decide during implementation):
Exposed interface:
After installing the package you just configure the stores and access keys and use it within your app, reusing existing stores where available (see the usage sketch below).
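A hypothetical usage sketch; the package, class, and method names (market_data, DataStore, get_agg, get_trades) are illustrative placeholders for whatever interface gets exposed:

```python
from datetime import date

from market_data import DataStore  # hypothetical package and class name

store = DataStore(
    trade_cache="~/.market_cache/trades",  # point at an existing store to reuse it
    agg_cache="~/.market_cache/aggs",
    alpaca_key="...",
    alpaca_secret="...",
)

# One call either serves from cache or fetches/aggregates transparently.
bars = store.get_agg("AAPL", date(2023, 5, 2), date(2023, 5, 5), resolution="1s")
trades = store.get_trades("AAPL", date(2023, 5, 2))
```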
For stocks, daily files always contain extended hours as well; they can be filtered by the API or by the client (see the filtering sketch below).
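A sketch of the client-side filtering, assuming cached frames carry a tz-aware DatetimeIndex; 09:30-16:00 US/Eastern is the regular session:

```python
import pandas as pd

def regular_session_only(bars: pd.DataFrame) -> pd.DataFrame:
    """Drop pre-market and after-hours rows from a cached daily file."""
    eastern = bars.tz_convert("America/New_York")
    return eastern.between_time("09:30", "16:00", inclusive="left")
```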
Try to reuse the v2trading cache structure to avoid rework.
For speed, optimize remote fetching and loading as suggested in this conversation.
Tasks:
Open: DB support