Apologies in advance for the detail, but thought it would be important to give feedback, as a result of several days work on getting https://github.com/AI4Finance-Foundation/FinRL/blob/master/tutorials/3-Practical/FinRL_PaperTrading_Demo.ipynb integrated with my broker to test it out.

I was aiming to build a data_processor specifically for my UK based broker (ig.com). However, I found it only allows 10,000 data points per week(!) across all tickers (i.e. very restricted). So, I am have been working on using yahoofinance for data collection and my broker for trading operations. Eventually, if I get all this working, I will look to new implement a new data processor for IG, but I will need to cache data locally.

Therefore, I was hoping to use the data_processor as a template, use alpaca data processor as an example implementation (as it is utilised by the paper trading code), and modify yahoofinance to download UK stocks (and trading times).

Meta’s architecture is laudable as it aims to abstract into layers, but the current implementation I have found to be understandably brittle due to it being focussed on USA localization – there is a big opportunity to make it more generic for a global audience. The timezones, exchanges, trading hours etc are all fixed to NYSE, and therefore the code has to be carefully inspected to a) understand (the codes could benefit from peer review and commenting), and b) adapt to other country data provider and broker requirements. There is also an opportunity to refactor to support a different data provider whilst using a separate broker for trading operations.

The FinRL paper trading demo uses Alpaca data processor, so I have been comparing that with the yahoofinance data processor to ensure every requirement has been met.

In the template file https://github.com/AI4Finance-Foundation/FinRL/blob/master/finrl/meta/data_processor.py class DataProcessor has seven methods: def init(self, data_source, **kwargs): def download_data(self, ticker_list, start_date, end_date, time_interval) -> pd.DataFrame: def clean_data(self, df) -> pd.DataFrame: def add_technical_indicator(self, df, tech_indicator_list) -> pd.DataFrame: def add_turbulence(self, df) -> pd.DataFrame: def add_vix(self, df) -> pd.DataFrame: def df_to_array(self, df, if_vix) -> np.array:

However, each the alpaca data processor implementation adds three extra methods. In https://github.com/AI4Finance-Foundation/FinRL/blob/master/finrl/meta/data_processors/processor_alpaca.py class AlpacaProcessor has methods: def init(self, API_KEY=None, API_SECRET=None, API_BASE_URL=None, api=None): def download_data(self, ticker_list, start_date, end_date, time_interval) -> pd.DataFrame: def clean_data(self, df): def add_technical_indicator(self, df, tech_indicator_list=["macd", "boll_ub", "boll_lb", "rsi_30", "dx_30", "close_30_sma", "close_60_sma"]): def add_vix(self, data): def calculate_turbulence(self, data, time_period=252): def add_turbulence(self, data, time_period=252): def df_to_array(self, df, tech_indicator_list, if_vix): def get_trading_days(self, start, end): def fetch_latest_data(self, ticker_list, time_interval, tech_indicator_list, limit=100) -> pd.DataFrame: **** Note: returns three arrays not a dataframe

However, yahoofinance is different again: https://github.com/AI4Finance-Foundation/FinRL/blob/master/finrl/meta/data_processors/processor_yahoofinance.py class YahooFinanceProcessor: def init(self): def download_data(self, start_date: str, end_date: str, ticker_list: list, time_interval: str) -> pd.DataFrame: def clean_data(self, data) -> pd.DataFrame: def add_technical_indicator(self, data, tech_indicator_list): def add_turbulence(self, data): def calculate_turbulence(self, data, time_period=252): def add_vix(self, data): def df_to_array(self, df, tech_indicator_list, if_vix): def get_trading_days(self, start, end): This method is required by https://github.com/AI4Finance-Foundation/FinRL/blob/master/tutorials/3-Practical/FinRL_PaperTrading_Demo.ipynb but it implements its own rather than use this. Also it implements NYSE with no option to select other exchanges like the LSE. MISSING fetch_latest_data () This method is required by https://github.com/AI4Finance-Foundation/FinRL/blob/master/tutorials/3-Practical/FinRL_PaperTrading_Demo.ipynb in get_state():

Describe the solution you'd like 1) Review the methods exposed by class DataProcessor against each implementation for consistency e.g. processor_alpaca.py, processor_yahoofinance.py. 2) download_data() needs to be implemented differently due to the different data apis needed, but consider creating one implementation of the other methods as they perform the same function – easier to change and eliminates keeping variants in step. 3) Support for different exchanges other than NYSE in get_trading_days() for example. 4) Support for different trading times per exchange, rather than fix for NYSE in download_data() and clean_data() for example. 5) The use of VIXY is USA specific. What is the alternative for implementing other countries? 6) There are three versions of paper trading creating confusion and needing to compare them to see which version to use: https://github.com/AI4Finance-Foundation/FinRL/blob/master/tutorials/3-Practical/FinRL_PaperTrading_Demo.ipynb https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/3-Practical/FinRL_PaperTrading_Demo.ipynb https://github.com/AI4Finance-Foundation/FinRL-Tutorials/blob/master/3-Practical/FinRL_PaperTrading_Demo.ipynb 7) Include more comments and eliminate using literal values – example in calculate_turbulence() it is hard to understand if there are little comments present e.g. what does this mean? if temp > 0: count += 1 if count > 2: turbulence_temp = temp[0][0] else:

avoid large outlier because of the calculation just begins

                turbulence_temp = 0
        else:
            turbulence_temp = 0

Hope this feedback is constructive to help improve the FinRL offering.

AI4Finance-Foundation / FinRL

Meta - feedback so far on issues implementing a new data processor/trading operations #756

avoid large outlier because of the calculation just begins