Questions and Suggestions

GabrielDeza, I reviewed your code and have a few suggestions which might make things easier. I'm not sure if you are deliberately choosing not to use the following, but I'd consider using the yfinance repo, https://github.com/ranaroussi/yfinance, to pull daily stock data. For additional features, I'd suggest adding technical indicators via the ta repo, https://github.com/bukosabino/ta. The add_all_ta_features function makes this simple. Also, out of curiosity, why are you using NRCLex instead of regular sentiment analysis? Did you find NRCLex produced better results than knowing whether people were more bullish or more bearish? If not, I'd suggest using FinBert, which it looks like you may already be using, though there is now a Hugging Face transformers implementation, https://huggingface.co/ProsusAI/finbert. Hugging Face's transformers pipeline feature also makes this really easy, https://github.com/huggingface/transformers/blob/master/notebooks/03-pipelines.ipynb. I'd also suggest trying out additional time series models from gluonts, https://github.com/awslabs/gluon-ts. If you want to do hyperparameter tuning, optuna, https://github.com/optuna/optuna, or maybe dask_optuna, https://github.com/jrbourbeau/dask-optuna, is fairly easy to set up. If you go this route, I've found that the HyperbandPruner performs much better than the default pruner. You could use Ray's tune library instead, https://docs.ray.io/en/master/tune/index.html, but I've found setting up optuna to be much easier. Great work so far! I hope this helps!

Hi @CMobley7, thank you for the suggestions. I will answer each one below:

I originally found yfinance when I started the project in May 2020, but I believe it did not work for me at the time. Instead, I did the scraping myself in 2-StockScraper.py. Looking through the issues page for yfinance, it seems to work now, so I will definitely add it to the pipeline. When scraping from Yahoo Finance myself, I found that the sampling frequency (1m, 5m, 1h, 1d, etc.) limits how far into the past I can retrieve data (as I do not have authorization from Yahoo Finance), so if yfinance does not have that issue, I will definitely replace my current scraping method.
I never knew about the technical indicators repository; that's super cool! I am definitely going to incorporate it.
Regarding NRCLex and FinBert, in my original work (which I submitted to ICLR 2021, [available here](https://openreview.net/forum?id=ptbb7olhGHd)) I used FinBert from Dogu Araci's GitHub repo, as you mentioned. I will check out Hugging Face's pipeline as well. I have since implemented NRCLex to expand beyond purely binary sentiment. I have yet to do significant testing on which is better (bag-of-words with 8 emotions vs. BERT for 2 financial emotions), but I will include it in the updated paper.
I am working on implementing additional models from gluonts, especially some of the simpler models like ARIMA. I am also implementing the Temporal Fusion Transformer model, but working through all the code to figure out how to run autograd to perform an adversarial perturbation is not always easy. I did not know about any of these hyperparameter tuning methods; I will check them out.
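To make the adversarial perturbation step concrete, here is a minimal sketch of one gradient-based attack, the fast gradient sign method (FGSM). Everything in it is an assumption for illustration: the model is a toy linear forecaster rather than a gluonts model, the gradient with respect to the input is derived by hand where autograd would normally supply it, and FGSM is not necessarily the attack used in the paper.

```python
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    # Toy linear forecaster: y_hat = w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def squared_error(w: List[float], b: float, x: List[float], y: float) -> float:
    return (predict(w, b, x) - y) ** 2


def input_gradient(w: List[float], b: float, x: List[float], y: float) -> List[float]:
    # d/dx (w . x + b - y)^2 = 2 * (w . x + b - y) * w
    # With a deep model this is the quantity autograd would compute.
    residual = predict(w, b, x) - y
    return [2.0 * residual * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: float, eps: float) -> List[float]:
    # Move each input feature by eps in the direction that increases the loss.
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical weights, input features, and target.
w, b = [0.5, -0.2, 0.8], 0.1
x, y = [1.0, 2.0, 3.0], 2.0
eps = 0.1

x_adv = fgsm(w, b, x, y, eps)
loss_clean = squared_error(w, b, x, y)
loss_adv = squared_error(w, b, x_adv, y)
print("clean loss:", loss_clean)
print("adversarial loss:", loss_adv)
```

The perturbation is bounded by `eps` per feature, so the adversarial input stays close to the original while the forecasting error grows.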

I have not been actively updating the repo and have been working purely locally, but a big part of my next steps is simplifying the pipeline and keeping it active on this repo. Thank you for your interest and many suggestions; it helps a lot!

Thank you for such a thorough and prompt response.

Unfortunately, yfinance won't allow you to bypass Yahoo Finance's frequency limits. So, it might not be worth updating your repo unless you plan on extending your work into a journal paper. To the best of my knowledge, there is no free source of historical intraday stock data. If you find one, please let me know. FirstRate Data, https://firstratedata.com, is the cheapest source I've found, but I don't own a license, so I can't recommend them as a customer yet. There are several sources of free historical intraday cryptocurrency data, but I don't know if that would be helpful to you.
Regarding technical indicators, I might also take a look at https://github.com/mrjbq7/ta-lib. In addition to technical indicators, it also has recognition algorithms for the most common patterns seen in the stock market.
Understood, regarding NRCLex. If you don't mind, please let me know when you've updated your conference paper. I will hopefully have time later today or tomorrow to read through the original version.
Regarding using autograd to perform adversarial perturbations on gluonts models, it will take me some time to do a more thorough review of both your codebase and the models in gluonts' codebase before I can provide suggestions here. I've only ever worked on adversarial attacks against image classifiers and object detectors, but hopefully there is enough overlap that I might be able to provide some help.

I actually found your codebase when I was looking through gluonts's issues. I was planning on using gluonts to write a similar forecasting pipeline but wanted to see what code I could find in the space. I hadn't found a reliable, free way to get historical tweets so that I could utilize NLP. So, finding snscrape through your repo was a blessing, and I felt led to provide any help I could. Hopefully, my suggestions thus far will prove helpful. If you know of any historical sources of news, please let me know. Have a great rest of the day. God bless.

@CMobley7 Hey, I was looking at some of your messages on the gluon-TS issues page and noticed that your case was M:1 forecasting (i.e., you have 1 target and M-1 features and want to predict the target in the future without knowing the future values of the M-1 features). It seems that Gluon-TS does not really support this. Have you found other libraries or models that do perform M:1 forecasting and are actually strong?
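To pin down what I mean by M:1, here is a minimal sketch of how the training samples would be built: each sample pairs a context window containing all M series with the future values of the target only. The function name `make_windows` and its parameters are hypothetical, and this only shows the data layout, not a model.

```python
from typing import List, Tuple

Row = List[float]  # one time step holding all M series values


def make_windows(
    rows: List[Row], target_index: int, context_length: int, horizon: int
) -> List[Tuple[List[Row], List[float]]]:
    """Slide over the series and pair past M-variate context with future targets."""
    samples = []
    last_start = len(rows) - context_length - horizon
    for start in range(last_start + 1):
        split = start + context_length
        context = rows[start:split]  # all M series, past only
        future = rows[split:split + horizon]
        # Only the target column is kept for the future; the other
        # M-1 features are treated as unknown beyond the context window.
        targets = [row[target_index] for row in future]
        samples.append((context, targets))
    return samples


# Toy data: 6 time steps, M = 3 series, with column 0 as the target.
rows = [[float(t), float(10 * t), float(100 * t)] for t in range(6)]
samples = make_windows(rows, target_index=0, context_length=3, horizon=2)

for context, targets in samples:
    print(context, "->", targets)
```

Any model that maps the M-variate context to the target-only horizon fits this layout; the open question is which libraries support it well out of the box.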

GabrielDeza / Twitter-Adversarial-Finance

Questions and Suggestions #1