AIStream-Peelout / flow-forecast

Deep learning PyTorch library for time series forecasting, classification, and anomaly detection (originally for flood forecasting).
https://flow-forecast.atlassian.net/wiki/spaces/FF/overview
GNU General Public License v3.0
2.02k stars 290 forks

incredible work, but not pythonic #272

navidivan closed 3 years ago

navidivan commented 3 years ago

Thank you very much for this incredible library. Here are some comments:

  1. When you say wnb integration, it seems you mean wnb dependent! I have never used wnb and am not interested in it at the moment. Not everyone needs to do full parameter sweeps. We usually start with quick and dirty experiments to see whether the attention mechanism even works for our data.
  2. The code is highly specific to your own applications, flood or covid; if I want to change it to something else, it seems that I need to change lots of settings and parameters in JSON files, specify column names in weird places (the training file), etc., all of which seems redundant and not user-friendly. In your interview you said you are working on a library for the general public, for people who are stuck with sklearn. But the sklearn syntax and pipeline are kind of the gold standard at the moment. If you cannot go that far, having just simple PyTorch training loops, specifying variables with "X" and "y" names or time series syntax, is much more approachable than what you have done so far.
  3. Here is a challenge for you. Pick some very dumb energy consumption data or financial data, and try to forecast it with your model without wnb. It's a nightmare! If I am mistaken, please provide a tutorial video or notebook.
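
To make the "X" and "y" style from point 2 concrete, here is a minimal sketch of turning a univariate series into supervised (X, y) pairs with a sliding window. It is plain Python and not part of Flow Forecast; the function name `make_windows` and the parameters `history` and `horizon` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], history: int, horizon: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a series into (X, y) pairs.

    Each row of X holds `history` consecutive past values; the matching
    row of y holds the `horizon` values that immediately follow them.
    """
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be positive")

    X: List[List[float]] = []
    y: List[List[float]] = []
    # Last index at which a full input window plus target still fits.
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        X.append(list(series[start:start + history]))
        y.append(list(series[start + history:start + history + horizon]))
    return X, y


# Toy energy-consumption readings (made-up numbers, for illustration only).
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
X, y = make_windows(series, history=3, horizon=1)
```

The resulting `X` and `y` can be handed to any estimator with a `fit(X, y)` interface, or wrapped in tensors for a hand-written PyTorch loop. A series shorter than `history + horizon` simply yields empty lists.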

isaacmg commented 3 years ago

  1. It is called Weights and Biases, abbreviated Wandb. We recommend using Weights and Biases since it allows for easy hyperparameter sweeps. Most serious research requires sweeping over many different parameter combinations. However, there is a way to specify false in the configuration file for parameter sweeps.

  2. Our framework is generalizable to any time series forecasting problem. All model repositories require a certain amount of overhead and pre-processing to use on new datasets. Our repository generally has less overhead because of how easy it is to swap parameters in and out of config files. sklearn is neither a standard nor a goal that we strive to emulate: even for relatively simple experiments, it becomes hard to track and manage results, and it often requires a lot of untracked spaghetti code. Flow Forecast is built with reproducibility and production in mind. Hence everything is controlled by a JSON file where everything about the run is logged. While this might be difficult initially, particularly for newer data scientists, in the long run it greatly eases issues with reproducing research, deploying models to production, and re-training models on new data. We are proud to follow in the path of other repositories like AllenNLP in this regard.

  3. We already have several tutorials that address different forecasting areas. Also, as I said above, you can specify Wandb as false in your configuration if you do not wish to use it. However, this will likely make it more difficult to find a set of parameters that forecasts well.
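
As a rough illustration of the JSON-driven approach described above, the sketch below builds a run configuration with Wandb set to false, writes it to a JSON file, and reads it back. Only the idea of a `wandb` entry set to false comes from this thread; every other key name, value, and file name is an assumption for illustration and should be checked against the Flow Forecast tutorials before use.

```python
import json
import tempfile
from pathlib import Path


def build_config(use_wandb: bool = False) -> dict:
    """Return a run configuration as a plain dict.

    Key names other than "wandb" are hypothetical, not a confirmed schema.
    """
    return {
        "model_name": "MyModel",  # hypothetical model identifier
        "dataset_params": {
            "training_path": "energy.csv",  # hypothetical data file
            "target_col": ["consumption"],
            "relevant_cols": ["consumption", "temperature"],
            "forecast_history": 24,
            "forecast_length": 1,
        },
        "training_params": {"epochs": 5, "lr": 0.001, "batch_size": 32},
        "wandb": use_wandb,  # False disables Weights and Biases logging
    }


def save_config(config: dict, path: Path) -> None:
    """Write the configuration to disk so the run can be reproduced later."""
    path.write_text(json.dumps(config, indent=2))


def load_config(path: Path) -> dict:
    """Read a configuration back from a JSON file."""
    return json.loads(path.read_text())


# Round-trip the config through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "run_config.json"
    save_config(build_config(use_wandb=False), config_path)
    reloaded = load_config(config_path)
```

Keeping the whole run in one serializable dict is what makes it cheap to re-run or re-train later: changing the dataset means editing the column names and path in one place and saving a new file.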