QuantConnect / Lean

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
https://lean.io
Apache License 2.0

DatabaseDataFeed and DatabaseDataProvider #4693

Closed patmetabox closed 3 years ago

patmetabox commented 4 years ago

Question

I want to integrate a different data source, backed by a database, and have been reading through the code to determine the best way to fit this into the existing Engine.

I have found that there isn't much documentation on the MapFileProvider, CacheProvider, DataFeed, Resolvers, etc., so I am asking whether any in-depth engine documentation is available.

I would like to create a DatabaseDataFeed and DatabaseDataProvider which can return data from a local database which uses timescale queries to produce any requested resolution for data that I have available.

I am looking to see where there might be information regarding the internal plumbing that allows someone new to the platform to work on adding new features

I see there was a DatabaseDataFeed but this was only present for a few months many years ago

https://github.com/QuantConnect/Lean/commit/b97afdb09f37ec25960b0455402d8a71cc647f5b

I am happy to develop this and create a pull request to incorporate it into the platform, but I would need some help finding the relevant information about how the engine works. From my initial investigations, it appears to be set up only to read from disk.

Many thanks

pomeara commented 4 years ago

@Martin-Molinero do you know if there is any documentation which covers the IDataFeed? The page below appears to describe the old version, from before the engine was upgraded:

https://www.quantconnect.com/lean/documentation/topic30455.html

Any information about how to integrate a database data provider would be much appreciated.

Martin-Molinero commented 4 years ago

Hey! Sorry for the outdated docs; we try to keep up, and the code itself is quite well documented/commented. Lean allows custom implementations to be specified via config. For backtesting, I believe you could get it going pretty quickly with a custom IDataProvider that sources the raw data from a DB. Lean should use the same IDataProvider instance. Which DB do you want to connect to?

pomeara commented 4 years ago

Hi @Martin-Molinero,

Thanks for your reply.

I'm connecting to a Postgres database, and I've been finding a lot of places in the code that expect to look in .zip files, even down in the trading objects such as TradeBar and Tick. Rather than going through appropriate data readers/subscribers, this expectation of using .zip files is quite heavily embedded in the code, so I'm having some fun slotting in a database layer and working out which subscribers, cache providers, etc. are required.

I have my own data and am looking to wire the engine up to it, so that I can use my local data instead of having to download data into directories, zipped up by resolution (1m, 15m, 1h, etc.).

pomeara commented 4 years ago

@Martin-Molinero are there any examples of such an implementation? I've been spending quite some time on this and I keep going in circles, getting confused by the numerous layers and enumerators, and am not making much progress.

Any chance of adding a simple example for this use case, to help the community develop their own DB implementations?

Martin-Molinero commented 3 years ago

Hey!

> expectation of using .zip files is quite heavily embedded in the code

As a first step, I'd suggest accepting the file-based source path string as input and handling the translation to a DB query internally.

> are there any examples of such an implementation?

Not that I know of, really.
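A minimal sketch of that path-to-query translation, written in Python for brevity (inside Lean this logic would live in a C# class implementing IDataProvider). Everything here is an assumption for illustration: the path layout (`.../{market}/{resolution}/{symbol}/{yyyyMMdd}_{tickType}.zip` for second/minute data, `.../{market}/{resolution}/{symbol}_{tickType}.zip` for hour/daily), the hypothetical `trades` table with `ts`, `market`, `symbol`, `price` and `size` columns, and the psycopg-style `%s` placeholders. `time_bucket`, `first` and `last` are TimescaleDB functions; tick resolution is left out.

```python
import re
from datetime import date, datetime, timedelta
from typing import NamedTuple, Optional, Tuple

# Resolution folder name -> bucket width passed to TimescaleDB's time_bucket().
BUCKETS = {"second": "1 second", "minute": "1 minute", "hour": "1 hour", "daily": "1 day"}
LOW_RES = {"hour", "daily"}  # one entry per symbol, no date in the path


class DataRequest(NamedTuple):
    market: str
    resolution: str
    symbol: str
    tick_type: str
    day: Optional[date]  # None means "all available data"


def parse_source_path(path: str) -> DataRequest:
    """Split a file-style source path into the fields a DB query needs."""
    parts = [p for p in re.split(r"[\\/]+", path.lower()) if p]
    idx = next((i for i, p in enumerate(parts) if p in BUCKETS), None)
    if idx is None or idx == 0:
        raise ValueError(f"no market/resolution found in path: {path}")
    market, resolution = parts[idx - 1], parts[idx]
    stem = parts[-1][:-4] if parts[-1].endswith(".zip") else parts[-1]
    head, _, tick_type = stem.partition("_")
    tick_type = tick_type or "trade"  # assumed default when no suffix is present
    if resolution in LOW_RES:
        # e.g. {market}\hour\btcusdt_trade: the file name carries the symbol.
        return DataRequest(market, resolution, head, tick_type, None)
    # e.g. {market}/minute/spy/20140605_trade.zip: the file name carries the date.
    day = datetime.strptime(head, "%Y%m%d").date()
    return DataRequest(market, resolution, parts[idx + 1], tick_type, day)


def build_query(req: DataRequest) -> Tuple[str, tuple]:
    """Build a parameterized OHLCV query against the hypothetical trades table."""
    sql = (
        "SELECT time_bucket(%s::interval, ts) AS bucket, "
        "first(price, ts) AS open, max(price) AS high, "
        "min(price) AS low, last(price, ts) AS close, sum(size) AS volume "
        "FROM trades WHERE market = %s AND symbol = %s"
    )
    params = [BUCKETS[req.resolution], req.market, req.symbol]
    if req.day is not None:
        # Dated paths cover exactly one day: [day, day + 1).
        sql += " AND ts >= %s AND ts < %s"
        params += [req.day, req.day + timedelta(days=1)]
    return sql + " GROUP BY bucket ORDER BY bucket", tuple(params)
```

The rows returned by such a query would still have to be serialized into whatever format the downstream reader expects, which this sketch does not cover.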

Sadly, Lean isn't using a config setting to determine which IDataCacheProvider type to use. You will probably have to replace the ZipDataCacheProvider instance being used with a simple passthrough to your IDataProvider.
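The passthrough idea can be sketched as follows, again in Python and only to show the shape of the delegation. The class and method names (`DatabaseDataProvider`, `PassThroughCacheProvider`, `fetch`, `store`) are hypothetical and do not reproduce Lean's C# interfaces; the `run_query` callable stands in for whatever turns a source-path key into raw bytes from the database.

```python
import io
from typing import Callable, Optional


class DatabaseDataProvider:
    """Hypothetical provider: resolves a source-path key to a byte stream."""

    def __init__(self, run_query: Callable[[str], Optional[bytes]]):
        self._run_query = run_query

    def fetch(self, key: str) -> Optional[io.BytesIO]:
        data = self._run_query(key)
        # Return None when the database has nothing for this key.
        return io.BytesIO(data) if data is not None else None


class PassThroughCacheProvider:
    """Hypothetical cache layer that never unzips or caches anything."""

    def __init__(self, provider: DatabaseDataProvider):
        self._provider = provider

    def fetch(self, key: str) -> Optional[io.BytesIO]:
        # Delegate straight to the provider instead of opening a .zip entry.
        return self._provider.fetch(key)

    def store(self, key: str, data: bytes) -> None:
        # Intentionally a no-op: the database is the source of truth.
        pass
```

With this shape, the zip-specific behavior is isolated in one replaceable layer, and the provider alone decides how a key maps to data.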

borseno commented 3 years ago

Hi @Martin-Molinero

I did as you said, very nice suggestion, all works. But sometimes Lean generates this kind of path: {market}\hour\btcust_trade. Could you please advise what date range applies here, i.e. the start date and end date for the history request? It is not specified in the path. I assume today's date? Or it should be null?

Martin-Molinero commented 3 years ago

Hey! Low-resolution (hour/daily) data is organized in a single entry because there aren't many data points; that's why you see those kinds of paths. Lean expects to receive all the data for that entry. Note that Lean should cache these data points at the TextSubscriptionDataSourceReader.

> I assume today's date? Or it should be null?

It's expecting all data points; at a higher level in the stack, the start/end date will filter the data.

> I did as you said, very nice suggestion, all works.
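To illustrate that split of responsibilities, here is a small Python sketch under stated assumptions: the provider side returns every row for a dateless hour/daily path, and a later stage applies the requested window. The `filter_range` helper and the inclusive bounds are hypothetical and are not taken from Lean's code.

```python
from datetime import datetime
from typing import Iterable, Iterator, Tuple

Row = Tuple[datetime, float]  # (timestamp, value); a stand-in for a parsed bar


def filter_range(rows: Iterable[Row], start: datetime, end: datetime) -> Iterator[Row]:
    """Keep only rows whose timestamp falls within [start, end] (assumed inclusive)."""
    for row in rows:
        if start <= row[0] <= end:
            yield row


# The provider hands back everything it has for the symbol...
all_rows = [(datetime(2020, 1, d), float(d)) for d in range(1, 6)]
# ...and the consumer narrows it to the requested window afterwards.
window = list(filter_range(all_rows, datetime(2020, 1, 2), datetime(2020, 1, 4)))
```

So for a dateless path, the provider needs neither today's date nor a null date: it simply returns the full series.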

Glad it worked! 😃