Closed qtbgo closed 1 year ago
Hello @qtbgo,
PyBroker offers two ways to support the requirement you mentioned:
If your DataFrame contains data only for current S&P 500 constituents, you can enable the feature of automatically selling any position in an S&P 500 company on the last bar of data, assuming it leaves the index after the last bar. This can be done by setting StrategyConfig#exit_on_last_bar to True.
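A minimal sketch of enabling this option (the date range and Yahoo Finance data source are placeholders for your own setup):

```python
from pybroker import Strategy, StrategyConfig, YFinance

# Automatically liquidate any open position on a symbol's final bar
# of data, e.g. when it drops out of your constituent DataFrame.
config = StrategyConfig(exit_on_last_bar=True)

strategy = Strategy(YFinance(), start_date='1/1/2022',
                    end_date='1/1/2023', config=config)
```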
Alternatively, you can maintain a mapping of S&P 500 constituents for every year. In your execution function, you can check the current year using ExecContext#dt and verify that ExecContext#symbol is in the set of current constituents. This will enable you to skip or manually sell any position in a company that leaves the S&P 500.
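The second approach might look like the sketch below. The `SP500_BY_YEAR` mapping and its contents are hypothetical placeholders for your own membership data, and the position-handling calls inside `exec_fn` assume PyBroker's `ExecContext` methods `long_pos()` and `sell_all_shares()`:

```python
from datetime import datetime

# Hypothetical mapping of year -> S&P 500 constituents for that year.
# You would build this from your own historical membership data.
SP500_BY_YEAR = {
    2022: {"AAPL", "MSFT", "TSLA"},
    2023: {"AAPL", "MSFT"},  # e.g. TSLA hypothetically leaves the index
}

def in_index(symbol: str, dt: datetime) -> bool:
    """Return True if symbol is an S&P 500 member in the year of dt."""
    return symbol in SP500_BY_YEAR.get(dt.year, set())

def exec_fn(ctx):
    # ctx.dt and ctx.symbol are provided by PyBroker's ExecContext.
    if not in_index(ctx.symbol, ctx.dt):
        if ctx.long_pos():
            # Manually sell any position in a company that left the index.
            ctx.sell_all_shares()
        return  # skip trading logic for non-constituents
    # ... normal trading logic for current constituents ...
```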
I hope this clears things up. Let me know if you have any further questions.
I see strategy.add_execution(exec_fn, ['AAPL', 'MSFT'], indicators=highest('high_10d', 'close', period=10)). Does this mean I need to add all symbols in the historical S&P 500s initially? If so, I'm afraid the memory consumption is too high.
Hi @quant2008,
You should include all symbols in your universe, but only load historical data into your DataFrame for each S&P 500 member for every year, not all historical data for all historical S&P constituents. Memory usage will depend on historical data resolution, time span, indicators computed, and available RAM. Please let me know if you encounter any issues and what you're trying to accomplish.
@quant2008
If you're unable to load all historical data into a DataFrame at once, consider using a library like Dask: https://pandas.pydata.org/docs/user_guide/scale.html#use-other-libraries
My recommendation is to then filter your universe before backtesting with PyBroker, and only include stocks that meet your trading criteria. Backtesting stocks that will never end up in your portfolio is computationally wasteful, especially if you plan to compute indicators and/or train models for every stock in your universe.
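One way to keep memory bounded without an extra library is to filter while loading. The sketch below reads a CSV in chunks with Pandas and keeps only the symbols in your filtered universe; the column names, symbols, and prices are made-up sample data:

```python
import io
import pandas as pd

# Hypothetical CSV of daily bars for all historical constituents
# (an in-memory stand-in for a large file on disk).
csv_data = io.StringIO(
    "symbol,date,close\n"
    "AAPL,2023-01-03,125.07\n"
    "TSLA,2023-01-03,108.10\n"
    "MSFT,2023-01-03,239.58\n"
)

# Only the symbols that meet your trading criteria.
universe = {"AAPL", "MSFT"}

# Read in chunks so the whole file never has to fit in memory,
# filtering each chunk down to the symbols you actually backtest.
chunks = pd.read_csv(csv_data, chunksize=2)
df = pd.concat(chunk[chunk["symbol"].isin(universe)] for chunk in chunks)
```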
You say, “You should include all symbols in your universe, but only load historical data into your DataFrame for each S&P 500 member for every year, not all historical data for all historical S&P constituents.” This is exactly what I want. Suppose I have a strategy rebalancing every year. I add all symbols in the full historical S&P 500 universe using strategy.add_execution. I had thought that this would load all of their data into the platform. But you say I can load historical data into the DataFrame only for each year's S&P 500 members. I wonder how. I checked the example and didn't find how.
To achieve this, you can create a custom DataFrame that includes only the S&P 500 constituents for each year of your historical data. For guidance on how to use such a DataFrame, please refer to the Creating a Custom DataSource notebook.
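As a sketch of building such a DataFrame, the snippet below trims a multi-year price table down to each year's constituents before it would be handed to a custom DataSource. The mapping, symbols, and prices are all hypothetical sample data:

```python
import pandas as pd

# Hypothetical year -> constituents mapping and sample price data.
sp500_by_year = {
    2022: {"AAPL", "TSLA"},
    2023: {"AAPL", "MSFT"},
}

df = pd.DataFrame({
    "symbol": ["AAPL", "TSLA", "MSFT", "AAPL", "TSLA", "MSFT"],
    "date": pd.to_datetime(
        ["2022-06-01", "2022-06-01", "2022-06-01",
         "2023-06-01", "2023-06-01", "2023-06-01"]),
    "close": [140.0, 240.0, 270.0, 180.0, 200.0, 330.0],
})

# Keep each row only if its symbol was a constituent in that row's year.
mask = [
    row.symbol in sp500_by_year.get(row.date.year, set())
    for row in df.itertuples()
]
filtered = df[mask]
```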
I checked the example in the Creating a Custom DataSource notebook, and it seems it still loads all data at once, not separately.
The example in that notebook demonstrates how to load data from a CSV file using Pandas. The process of loading that data is independent of PyBroker. If you are loading more data than can fit into memory, then refer to https://github.com/edtechre/pybroker/issues/11#issuecomment-1518921421. How you load your own data into a DataFrame has nothing to do with PyBroker; you need to address that problem yourself and filter only the relevant data before backtesting, as explained in that comment.
If you are instead using the included Yahoo Finance or Alpaca data providers, those services will limit the number of symbols you can request at once, which will prevent you from downloading data for all 500 stocks at the same time. That has nothing to do with memory usage.
Does PyBroker support a dynamic universe? For example, the S&P 500 universe varies across years; can this be supported in PyBroker?