Leci37 / TensorFlow-stocks-prediction-Machine-learning-RealTime

Predict stock operation points (buy/sell) from past technical patterns, using powerful machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow and Google TensorFlow LSTM. Real time Twitter:

Proposal and some questions #28

Open tphlru opened 9 months ago

tphlru commented 9 months ago

Hello. I really like your project, and I have gone through the README and all the tutorials in detail. I have a few questions and maybe some suggestions for improvement:

Thanks, I would be happy to get feedback.

Leci37 commented 6 months ago

Thanks for your questions.

What is the difference between raw, raw alpha and other folders in d_price?
RAW holds the data you get from the Yahoo API, and RAW_alpha holds the data you get from 0_API_alphavantage_get_old_history.py. min_max is deprecated and I have removed it. I have also created the alpaca folder for data you get from the Alpaca API with 0_API_alpaca_historical.py. Note: in my opinion 0_API_alpaca_historical.py works better (but you have to register with Alpaca).

In which of these folders should I put my OHLCV data, and what file names should I give them (or how can I change this in the program?)?
Just search the code for the string "d_price/". I recommend changing it in each file, since we don't know which OHLCV data provider you use (the Yahoo, Alpha Vantage and Alpaca APIs are offered by default).
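A minimal sketch of that search, using only the Python standard library. The function name `find_path_references` is hypothetical and not part of the repository; it simply lists every line in the `.py` files under a folder that mentions "d_price/", so you can see which paths would need changing.

```python
from pathlib import Path


def find_path_references(root, needle="d_price/"):
    """Return (file, line number, stripped line) for each .py line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        # errors="ignore" keeps the scan going if a file has odd encoding
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_path_references(".")` from the root of your local clone returns the list of matches, which you can print or review one by one.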

From what I understand, you get the news data from twitter (is that correct?).... Suggestion: add basic functionality to parse data from listed RSS feeds.
News is obtained from Yahoo and finviz.com (see the file https://github.com/Leci37/TensorFlow-stocks-prediction-Machine-learning-RealTime/blob/master/news_sentiment/news_get_data_NUTS.py ). There is NO NEWS FROM TWITTER in this version. Although the news is collected, the TensorFlow models do not evaluate it as of today (the providers offer little and poor data). Moreover, language processing is an engineering field of its own, and it would take a team of about 8 people to get something decent. Besides, there is not enough news, and each news provider requires its own extraction setup. If you are not a team, it is better to focus on technical patterns.

How do I calculate technical indicators with your script using my own OHLCV data (some cryptocurrencies are missing in Alpha Vantage and Yahoo)?
Download the data to d_price with your own code, and take a look at these 2 lines:

# df_raw: DataFrame with your raw OHLCV bars
from features_W3_old import extract_features_v3
df_bars = extract_features_v3(df_raw, extra_columns=False)  # works; technical indicators count: 292

https://github.com/Leci37/TensorFlow-stocks-prediction-Machine-learning-RealTime/blob/master/Tutorial/RUN_buy_sell_Tutorial_3W_5min_RT.py

From what I understand, you are using Yahoo data from the last 6-7 days for the forecast. If I use another data source, what frequency should the data have (1 minute, ticks, 10 minutes)? How often should this data be updated with respect to the present (offset)?
I have already tried 1 min, 5 min, 15 min and one hour, and they are all a nightmare. The change of frequency has to come from your OHLCV data provider (Yahoo, Alpaca). In my opinion, daily data works slightly better.
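If your provider only delivers bars finer than the frequency you want, one option outside the repository is to aggregate them yourself with pandas. The sketch below is an illustration, not the project's method: the function name `resample_ohlcv` is hypothetical, and it assumes a DataFrame with a `DatetimeIndex` and columns named `Open`, `High`, `Low`, `Close` and `Volume`.

```python
import pandas as pd


def resample_ohlcv(df, rule):
    """Aggregate OHLCV bars to a coarser frequency, e.g. rule="5min" or rule="1D"."""
    agg = {
        "Open": "first",   # first price in the window
        "High": "max",     # highest price in the window
        "Low": "min",      # lowest price in the window
        "Close": "last",   # last price in the window
        "Volume": "sum",   # total traded volume in the window
    }
    out = df.resample(rule).agg(agg)
    # Windows with no source bars (nights, weekends) have NaN prices; drop them
    return out.dropna(subset=["Open"])
```

Aggregation only goes from finer to coarser bars. Going the other way (for example from daily to 5-minute bars) still requires the provider to supply the finer data.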

The forecast does not work at all for the evening and afternoon data, as it was missing in the data sets. Do you have any idea or intention to fix this?
I don't understand what you mean. If you are referring to the aftermarket and premarket sessions, that data is hard to work with, because the most relevant field, the volume, disappears. How do you feed the model data in which the volume disappears and makes the technical indicators go crazy?
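One common way to sidestep the vanishing-volume problem is to drop premarket and aftermarket bars before computing any indicators. The sketch below is not taken from the repository: the function name `keep_regular_session` is hypothetical, and it assumes the DataFrame has a `DatetimeIndex` already expressed in the exchange's local time, with the regular US session running from 09:30 to 16:00.

```python
import pandas as pd


def keep_regular_session(df, start="09:30", end="16:00"):
    """Keep only bars whose timestamp falls inside the regular trading session.

    between_time includes both endpoints by default, so a bar stamped
    exactly at `end` is kept.
    """
    return df.between_time(start, end)
```

The trade-off is that the model then never sees extended-hours moves, so it cannot forecast them either.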

I recommend looking at my other project, which works with technical indicators and simply refines their values using a decision tree (for example: this stock works 89% of the time with the RSI below 11.34). https://github.com/Leci37/Strategy-stock-Random-Forest-ML-sklearn-TraderView
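To illustrate the idea of refining an indicator threshold with a decision tree, here is a minimal sketch using scikit-learn. It is not the code of the linked project: the function name `learn_threshold` is hypothetical, and the RSI values and labels below are synthetic and made up purely for the example (1 means the trade would have worked, 0 means it would not).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def learn_threshold(values, labels):
    """Fit a depth-1 decision tree on one indicator and return its split value."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    tree = DecisionTreeClassifier(max_depth=1, random_state=0)
    tree.fit(X, labels)
    # Node 0 is the root; its threshold is the learned cut-off.
    # If the tree could not split at all, scikit-learn stores -2.0 here.
    return float(tree.tree_.threshold[0])


# Synthetic example data, invented for illustration only
rsi = [5, 8, 10, 12, 30, 45, 60, 70]
worked = [1, 1, 1, 0, 0, 0, 0, 0]
threshold = learn_threshold(rsi, worked)
```

On this toy data the learned cut-off falls between 10 and 12, which is how a rule of the form "RSI below some value" can be read off the tree. On real data the threshold should be checked on bars the tree has not seen.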