QuantConnect / Lean

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
https://lean.io
Apache License 2.0

Add user-friendly tools for downloading stock price data #766

Closed: mushketyk closed this issue 7 years ago

mushketyk commented 7 years ago

By default, Lean provides only a limited amount of stock price data. I ran into this, and according to the discussion forum, downloading more of this data is not straightforward.

It would be beneficial to provide tools for bulk downloads of stock price data. For example, they could support downloading price data for all US stocks in bulk, or only for S&P 500 stocks.

In my opinion, the README should also be updated to describe how to download price data.

quietjoy commented 7 years ago

Where would the equities data come from? QuantConnect only provides CFD and forex data for download; it does not offer equities data that you can download. If you are running a backtest on quantconnect.com, the equities data is provided to your algorithm.

One option is to develop the algos locally and then submit them for backtesting via the QuantConnect Api. Your algos would have access to the equities data QuantConnect offers that way.

QuantConnect also offers a live websocket feed that provides live forex data and 15-minute delayed equities data, which you can use to simulate live trading locally.

There used to be a class called ApiDataFileProvider that would automatically download forex and CFD data you had purchased through QuantConnect whenever that data wasn't already available on the local file system. It has been removed because of recent changes to the lowest level of Lean, specifically IDataProvider and IDataCacheProvider. Re-implementing the ApiDataFileProvider functionality on top of the new QuantConnect data stack is definitely on my list. You would still only have access to forex and CFD data, but you wouldn't have to download it ahead of time; instead, Lean would download the data from QuantConnect at runtime. This would mean implementing a new IDataProvider that reaches out to QuantConnect and downloads the necessary data.
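A minimal Python sketch of the download-at-runtime idea, for illustration only. It is not Lean's IDataProvider interface (which is C#), and `DownloadOnMissProvider` and the `download` callable are hypothetical names standing in for whatever would actually talk to QuantConnect. The point is the control flow: look for the file locally first, and go to the network only on a miss.

```python
import os
import tempfile
from typing import Callable


class DownloadOnMissProvider:
    """Hypothetical sketch: serve data files from a local folder and
    download any file that is missing the first time it is requested."""

    def __init__(self, data_root: str, download: Callable[[str], bytes]):
        self.data_root = data_root
        # Hypothetical remote fetch, keyed by the file's relative path.
        self.download = download

    def fetch(self, key: str) -> bytes:
        path = os.path.join(self.data_root, key)
        if not os.path.exists(path):
            # Cache miss: download once and persist it for later runs.
            content = self.download(key)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(content)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    calls = []

    def fake_download(key: str) -> bytes:
        calls.append(key)
        return b"placeholder file contents"

    with tempfile.TemporaryDirectory() as root:
        provider = DownloadOnMissProvider(root, fake_download)
        key = os.path.join("forex", "eurusd.csv")
        provider.fetch(key)
        provider.fetch(key)  # second call is served from disk
        print(calls)  # the remote fetch ran only once
```

Passing the downloader in as a callable keeps the caching logic testable without any network access.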

jameschch commented 7 years ago

I'm not sure this requires a very high-tech solution. To download, say, the entire S&P 500, it wouldn't be a big stretch to pull the current tickers from a reliable source, run the Yahoo downloader in a loop, and then fire off the coarse universe executable. Did I forget something?
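A rough sketch of that loop in Python. The executable names (`YahooDownloader.exe`, `CoarseUniverseGenerator.exe`) and the argument order (symbol, resolution, start, end) are placeholders, not the ToolBox's documented usage; check the real downloader's usage message before relying on them. The ticker list stands in for whatever "reliable source" you pull from. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_download_commands(
    tickers: Sequence[str], downloader_cmd: str, start: str, end: str
) -> List[List[str]]:
    """One command line per ticker.

    The executable and the argument order (symbol, resolution, start, end)
    are placeholders, not the real downloader's documented usage.
    """
    base = shlex.split(downloader_cmd)
    commands = []
    for ticker in tickers:
        symbol = ticker.strip().upper()
        if symbol:  # skip blank lines in the ticker list
            commands.append(base + [symbol, "Daily", start, end])
    return commands


def run_pipeline(download_cmds: List[List[str]], coarse_cmd: str, dry_run: bool = True) -> None:
    """Run every download, then the coarse universe generator once at the end."""
    for cmd in download_cmds + [shlex.split(coarse_cmd)]:
        if dry_run:
            print(" ".join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    # Hypothetical ticker list and executable names.
    tickers = ["aapl", "msft", ""]
    cmds = build_download_commands(tickers, "YahooDownloader.exe", "20100101", "20170101")
    run_pipeline(cmds, "CoarseUniverseGenerator.exe")
```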

It would also be useful to have a one-button update that brings an existing data set up to date, and maybe even creates a scheduled task to keep it updated.
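For a one-button update, the core step is working out which dates an existing file is still missing. A hedged sketch, assuming each data line starts with a `yyyyMMdd` timestamp followed by a comma (the sample lines below are made up); the function names are illustrative. A scheduled task (cron, Windows Task Scheduler) could run this daily and hand the resulting range to the downloader.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, Optional, Tuple


def last_bar_date(lines: Iterable[str]) -> Optional[date]:
    """Latest date among CSV lines whose first field starts with yyyyMMdd."""
    latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp = line.split(",", 1)[0][:8]
        day = datetime.strptime(stamp, "%Y%m%d").date()
        if latest is None or day > latest:
            latest = day
    return latest


def missing_range(
    lines: Iterable[str], today: date, default_start: date
) -> Optional[Tuple[date, date]]:
    """Inclusive (start, end) range to download, or None if already current."""
    last = last_bar_date(lines)
    start = default_start if last is None else last + timedelta(days=1)
    return None if start > today else (start, today)


if __name__ == "__main__":
    existing = ["20170103 00:00,1,2,0.5,1.5,100", "20170104 00:00,1,2,0.5,1.5,100"]
    print(missing_range(existing, date(2017, 1, 10), date(2000, 1, 1)))
```

Empty files fall back to `default_start`, so the same call handles both a first download and an incremental update.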

I've thought seriously about throwing this together, but stopped short because I'm thinking "surely someone has done this before?".

mushketyk commented 7 years ago

Hi @andrewhart098, @jameschch

Thank you for your answers. I am very new to QuantConnect, so maybe I am asking very basic questions.

@andrewhart098

Where would the equities data come from?

There are a few downloaders implemented already, and I think they should be capable of downloading stock data. But there are a few issues with them: the build process does not generate .exe files for them, they cannot download stocks in bulk, you need to provide a list of stocks to use them in the first place, etc.

One option is to develop the algos locally and then submit them for backtesting via the QuantConnect Api. Your algos would have access to the equities data QuantConnect offers that way.

I think this significantly limits the usability of local backtesting. Would it be better if users could backtest locally, at least on coarse-grained data? I am not sure what your policy is regarding daily price data, but would it be possible to cache stock data as it is used by local algorithms?

In an algorithm I was working on, I was trying to access a universe of stocks:

 AddUniverse(Universe.DollarVolume.Percentile(98));

If I understand how it works, this should fetch both the fundamentals for the days in question and the price data for all the stocks in this universe. Could QuantConnect download this data as it is used (if it is free)?
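To make the data requirement concrete: a dollar-volume universe re-ranks every stock in each day's coarse data, so it needs prices and volumes for the whole market, not just for the symbols it ends up selecting. The sketch below assumes `Percentile(98)` means "keep the top 2% of the day's rows by dollar volume (close * volume)"; that reading and the `CoarseRow` type are assumptions for illustration, not Lean's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CoarseRow:
    symbol: str
    close: float
    volume: int

    @property
    def dollar_volume(self) -> float:
        return self.close * self.volume


def top_dollar_volume(rows: Sequence[CoarseRow], percentile: float) -> List[str]:
    """Keep the symbols in the top (100 - percentile)% by dollar volume.

    With percentile=98 this keeps roughly the most liquid 2% of the day's rows
    (always rounding up, so a non-empty day with percentile < 100 keeps at least one).
    """
    if not rows:
        return []
    keep = math.ceil(len(rows) * (100.0 - percentile) / 100.0)
    ranked = sorted(rows, key=lambda r: r.dollar_volume, reverse=True)
    return [r.symbol for r in ranked[:keep]]
```

Running this once per backtest day is what makes the universe depend on a full-market daily dataset rather than a fixed ticker list.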

@jameschch

Did I forget something?

What about fundamental data? If I understand correctly, one needs data for all stocks to calculate it properly, and I am not sure there is a downloader for it.

but stopped short because I'm thinking "surely someone has done this before?"

That's what I thought! If everybody who needs to backtest locally has to download datasets, then there should be dozens of implementations out there. But right now, the barrier to entry for local backtesting is quite high.

jaredbroad commented 7 years ago

The existing Google downloader will take a list of symbols and can download a range of daily data.

Downloading today's S&P 500 will have selection bias, as many symbols that were once in the S&P 500 have disappeared over time. The coarse universe was invented to avoid selection bias by selecting symbols based on criteria.

Selection bias is the main reason why even daily data is sold by vendors -- a complete set including historical / delisted / mapped / split adjusted prices is hard to achieve without years of curation. Most vendors charge $1,000+ for a full daily dataset.

A quick-and-dirty universe could be built by downloading today's symbols, but it should be done with the understanding that it has selection bias and that the backtest will probably perform better than live trading.
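Avoiding this bias means asking which symbols were in the index on each backtest date, not which are in it today. A minimal sketch with made-up tickers and dates (AAA, BBB and CCC are hypothetical); the hard part is building an accurate `history` like this for real, including delisted names, which is the curation work vendors charge for.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# (date added, date removed); removed=None means the symbol is still a member.
Interval = Tuple[date, Optional[date]]


def members_on(history: Dict[str, List[Interval]], day: date) -> List[str]:
    """Symbols that were index members on `day`, using only membership as of that date."""
    result = []
    for symbol, intervals in history.items():
        for added, removed in intervals:
            if added <= day and (removed is None or day < removed):
                result.append(symbol)
                break
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical history: BBB was dropped in 2009, CCC joined in 2012.
    history = {
        "AAA": [(date(2000, 1, 1), None)],
        "BBB": [(date(2000, 1, 1), date(2009, 3, 1))],
        "CCC": [(date(2012, 6, 1), None)],
    }
    print(members_on(history, date(2008, 1, 1)))  # what a 2008 backtest should trade
    print(members_on(history, date(2017, 1, 1)))  # today's list, which omits BBB
```

A backtest starting in 2008 that used today's list would hold CCC years before it joined and never hold BBB through its removal, which is the flattering bias described above.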

jameschch commented 7 years ago

Jared is absolutely right about survivorship bias with a limited set of equities. I can't imagine the set of delisted stocks from a major index is huge, but some research would be involved if you wished to account for it. Besides, I've come across bad reports about the free Yahoo data, and I think that if you're working on anything beyond academic interest, the investment in quality data is likely to pay off.

You might also consider that the availability of data is one of the reasons many retail traders restrict themselves to forex markets. In many cases, the exchanges will supply their own data to you directly.

mushketyk commented 7 years ago

Hi @jaredbroad, @jameschch

Thank you for your answers.

The existing google downloader will take a list of symbols and can download a range of daily data.

You are right. I managed to use it, but it seems to be more work than it should be (unless I am doing something wrong). To use it, I had to go to the QuantConnect.ToolBox project settings in Visual Studio, set the startup object to GoogleDownloader, rebuild the project, and then execute it in a shell for every stock name in a list that I built from another source.

I am not saying it is impossible, it is just a bit cumbersome :)

Downloading today's S&P 500 will have selection bias

Good point. But my idea was to piggyback on this selection bias :) I read about an algorithm in the book "Stocks on the Move: Beating the Market with Hedge Fund Momentum Strategies". The strategy is based on trading only S&P 500 stocks. Now I wonder if this is a good idea.

Selection bias is the main reason why even daily data is sold by vendors -- a complete set including historical / delisted / mapped / split adjusted prices is hard to achieve without years of curation. Most vendors charge $1,000+ for a full daily dataset.

Do I understand correctly that when I backtest on the QuantConnect platform, the algorithm is tested against accurate, curated stock data?

@jameschch

Besides, I've come across bad reports about the free yahoo data and think if you're working for anything above academic interest, the investment in quality data is likely to pay off

Would it make sense to backtest locally on free data and then verify that the algorithm still demonstrates the desired performance?

You might also consider that the availability of data is one of the reasons many retail traders restrict themselves to forex markets

Do I understand you correctly that free forex market data is more accurate?

jaredbroad commented 7 years ago

Do I understand correctly that when I backtest on the QuantConnect platform, the algorithm is tested against accurate, curated stock data?

Yes.

Would it make sense to backtest locally on free data and then verify that the algorithm still demonstrates the desired performance?

Sure! You can use whatever workflow works best for you.

Do I understand you correctly that free forex market data is more accurate?

No. Accuracy isn't related to the availability of data.

Closing the ticket, but if there's more LEAN-related discussion, feel free to reopen. For general strategy discussion, post your questions to the forums: quantconnect.com/forum