AI4Finance-Foundation / FinRL

FinRL: Financial Reinforcement Learning. 🔥
https://ai4finance.org
MIT License

Minute-based model training issue (using minute data in FinRL) #584

Closed YanchengHe closed 2 years ago

YanchengHe commented 2 years ago

Instead of training the model on daily data, I'm now trying to use price and volume data sampled every ten minutes to train the strategy. However, when building the environment, the shape of the data doesn't match the size of the state space. Screenshots of the error message and the data preview are shown below.

[Screenshots: error message; data preview]

zhumingpassional commented 2 years ago

Which notebook did you run? Have you revised anything?

Missing values may cause this problem. Can you print the data and check for them?

YanchengHe commented 2 years ago

The notebook I'm running is called Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading. The only thing I have revised is the model input (for both training and trading). As shown in the last photo, I have one row of data every ten minutes instead of every day. Then the error occurs, as shown in the first photo.

rayrui312 commented 2 years ago

I think the problem is that your actual state input doesn't correspond with the shape of the state space given in the environment (env.state_space). It is quite normal if you have changed the data but not edited the state settings in the environment correspondingly.

I think you may add a breakpoint in the step function in the environment. Check what the actual state input is and what shape of the state space is given. Making them consistent should solve the problem.
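The check described above can be sketched as a small helper. This is only an illustration: the formula `1 + 2 * stock_dim + n_features * stock_dim` is what the multiple stock trading notebook uses for `state_space` as far as I recall, so treat it as an assumption and confirm it against your own environment. The function names `expected_state_dim` and `check_state` are hypothetical and not part of FinRL.

```python
import numpy as np


def expected_state_dim(stock_dim, n_features):
    # Assumed layout: [cash] + [price per stock] + [shares per stock]
    # + [each feature per stock]. Verify against your environment.
    return 1 + 2 * stock_dim + n_features * stock_dim


def check_state(state, stock_dim, n_features):
    # Compare the length of the actual state vector with the expected size
    # and fail loudly with both numbers if they differ.
    actual = np.asarray(state).shape[0]
    expected = expected_state_dim(stock_dim, n_features)
    if actual != expected:
        raise ValueError(
            f"state has {actual} entries, but state_space expects {expected}"
        )
    return actual
```

Calling `check_state` on the state built at a breakpoint in the step function shows immediately whether the mismatch comes from the number of tickers, the number of feature columns, or both.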

BruceYanghy commented 2 years ago

This is a typical data preprocessing problem. I got the original data from the issue owner @YanchengHe, who is trying to run 10-minute data with news sentiment features.

Table format: date,tic,time,price,headline_sentiment,situation_senti [image]

BruceYanghy commented 2 years ago

10-minute data: date,tic,time,price,headline_sentiment,situation_senti

Daily data by FinRL: OHLCV and technical indicators

@YanchengHe is left-outer-joining the 10-minute data onto the daily data produced by FinRL. This is not recommended, because it mixes two sampling frequencies in one table. Instead, @YanchengHe should either calculate 10-minute OHLCV and technical indicators and join those to the 10-minute data, or aggregate the 10-minute data to daily data and join it with the FinRL table.
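As a rough sketch of the second option, the 10-minute prices could be aggregated to one daily bar per ticker with pandas. The column names `date`, `tic`, `time`, and `price` follow the table format above; the helper name `to_daily_ohlc` is hypothetical, and the code assumes `date` and `time` are strings that `pd.to_datetime` can parse. There is no volume column in that table, so only OHLC is produced, and the sentiment columns would need their own aggregation rule.

```python
import pandas as pd


def to_daily_ohlc(df):
    # Work on a copy and order rows chronologically within each ticker,
    # so that "first" and "last" really are the open and the close.
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])
    df = df.sort_values(["tic", "timestamp"])

    # One row per (date, tic): open, high, low, close from the 10-minute prices.
    daily = (
        df.groupby(["date", "tic"], sort=True)
        .agg(
            open=("price", "first"),
            high=("price", "max"),
            low=("price", "min"),
            close=("price", "last"),
        )
        .reset_index()
    )
    return daily
```

The result has the same (date, tic) key as the daily FinRL table, so the two can be merged on those columns without repeating daily rows for every 10-minute bar.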

@YanchengHe told me they want to get it running first. I did a few checks:

  1. Run df.tic.value_counts() to make sure the row counts match between tickers. In this data they do not match: some tickers have a lot of missing data and some have duplicates. Neither missing nor duplicate rows are permitted in FinRL, so you need to drop the duplicates and fill the missing data until every ticker has the same number of rows. [images]

  2. Run df['date'] = df['date'] + ' ' + df['TIME']. The DRL environment steps through the data by date, so each date value must identify a unique time step. When you pick out a random date, it should look like this: [image]

  3. Run df['close'] = df['PRICE']. In FinRL, 'close' is the main column used to calculate shares and portfolio values, so make sure to set it.
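The three checks above can be combined into one preprocessing step, sketched below under a few assumptions. The helper name `prepare_intraday` is hypothetical. The sketch assumes `date` and `TIME` are zero-padded strings, so the combined value sorts chronologically. Keeping the last duplicate and forward/backward filling the gaps are only one possible policy, not necessarily the one used to fix this issue.

```python
import pandas as pd


def prepare_intraday(df):
    df = df.copy()

    # Check 2: make each time step unique by merging date and TIME.
    df["date"] = df["date"] + " " + df["TIME"]
    df = df.drop(columns=["TIME"])

    # Check 3: FinRL reads prices from the 'close' column.
    df["close"] = df["PRICE"]

    # Check 1a: drop duplicate (date, tic) rows, keeping the last one.
    df = df.drop_duplicates(subset=["date", "tic"], keep="last")

    # Check 1b: build the full (date, tic) grid so every ticker has a row
    # for every time step, then fill the gaps within each ticker.
    grid = pd.MultiIndex.from_product(
        [sorted(df["date"].unique()), sorted(df["tic"].unique())],
        names=["date", "tic"],
    )
    df = df.set_index(["date", "tic"]).reindex(grid)
    df = df.groupby(level="tic").transform(lambda s: s.ffill().bfill())

    return df.reset_index()
```

After this step, `df.tic.value_counts()` should report the same count for every ticker, and the rows are ordered by date and then ticker.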