Closed bjakobson closed 1 year ago
Hi @bjakobson
First, thanks a lot for using PyAF. The script and the environment variables are simply perfect and help reproduce the problem.
I will answer your questions in two different comments. The first for functional aspects, the second for some technical points.
Here, you are trying to predict weights, given the 10 weekly values of Nov and Dec in 2021. The horizon is 50, which means that you want to predict the values for the next 50 weeks.
As you mentioned, your use case has a limited timeline (not enough data). PyAF uses past values to predict future values. There is no miracle: we will not be able to provide a meaningful forecast of summer (July) values when we only have Nov and Dec values.
PyAF does its best by providing the mean of the 10 available values as a constant (even more linear ;) forecast for the 50 coming weeks.
There is simply no way to get something acceptable in this case. Solution: get more data, which may mean waiting for the data-generating process to produce more of it.
Strictly speaking, a forecast is never wrong. It has an error, because we are estimating the values of an unknown/future phenomenon. The quality of the forecast is measured using the error on a part of the dataset (10 points here).
Whether a forecast is "obviously wrong" or "visually wrong" depends on the problem in question. I would appreciate it if you could elaborate on this.
The main technical limit here is that 10 points are not enough to compute a statistically reliable prediction, mean, etc.
Hi @antoinecarme,
Appreciate the response! I totally understand that the lack of data will negatively affect the prediction of 50 values -- would lowering the number of predicted values be a good solution (other than feeding in more data, since this is not possible yet; the dataset will grow over time), or would a different algorithm make more sense? I'd like to point out that our data will generally be linear-ish, as shown in the data. This makes me believe linear regression is doable, but I am curious to know what you think will give decently accurate results.
Hi @bjakobson
Once you have a "decent dataset", what you said will be OK.
PyAF does not make any assumption. It tests different models, including linear regression, and outputs the best model, the one with the lowest error (MAPE).
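To make the idea of "test several models and keep the one with the lowest MAPE" concrete, here is a minimal pure-Python sketch. It is not PyAF's implementation: the two candidate models, the function names (`mape`, `fit_constant`, `fit_linear`, `select_model`) and the sample weights are all assumptions made up for illustration.

```python
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.05 means 5%)."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

def fit_constant(train):
    """Candidate 1: always predict the mean of the training values."""
    level = mean(train)
    return lambda t: level

def fit_linear(train):
    """Candidate 2: ordinary least-squares line over time steps 0..n-1."""
    n = len(train)
    t_mean = (n - 1) / 2
    y_mean = mean(train)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return lambda t: intercept + slope * t

def select_model(series, holdout):
    """Fit each candidate on the head of the series, score it on the tail."""
    train, test = series[:-holdout], series[-holdout:]
    candidates = {"constant": fit_constant, "linear": fit_linear}
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        preds = [model(len(train) + i) for i in range(len(test))]
        scores[name] = mape(test, preds)
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical weekly bench-press weights (not the data from this issue).
weights = [100, 100, 102, 105, 105, 107, 110, 110, 112, 115]
best, scores = select_model(weights, holdout=3)
print(best, scores)
```

With so few points, the held-out error rests on only three values, so the ranking of the candidates can easily flip when one more observation arrives.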
@antoinecarme, and what qualifies as "enough data"? Would, say, 31 days give a decent enough projection? This doesn't need to be precise, but the general outcome should at least look good -- if there was a slow increase (say 100 for 5 days, 105 for the next 5..), the outcome should look like a gradual increase, not linear or descending. Thanks again!
@bjakobson
You can always increase your dataset artificially (different increasing sizes) and give your feedback.
A classical rule of thumb is to use at least 30 points to compute a statistical indicator (such as a mean). 100 points should be enough.
@antoinecarme, could you please elaborate on what you mean by "different increasing sizes", and "artificially increase your dataset"? I will definitely keep 30 points in mind - the problem is, I am looking to predict the weight a user could bench press, and the data I am getting comes from them actually logging each workout. I am fine with waiting 1 month to display the results, but I cannot justify this model if a user has to wait 100 bench press sessions to see their results, if that makes sense. I am more than open to any feedback on this!
different increasing sizes = 10, 20, 30, ..., 100
Is there a place where I can see how that is implemented? Is it just manually changing the data, or is there code needed?
I apologize for my constant questions, as you can likely tell, this is not my strong suit :)
Forecasting problems are not always easy, nor always feasible. There is a kind of tradeoff between the available dataset size and the horizon that is usable. It is a functional aspect of your problem. I cannot help with that, sorry.
You have to change your dataset manually in your Python code.
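As a rough illustration of the "different increasing sizes" experiment (10, 20, ..., 100 points), the sketch below uses a synthetic signal rather than real workout data, and a plain least-squares line rather than PyAF. The signal shape, the noise level, the 5-step horizon and the helper names are all assumptions chosen only to show the mechanics.

```python
import random
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

def linear_forecast(train, steps):
    """Fit a least-squares line to `train` and extrapolate `steps` points."""
    n = len(train)
    t_mean = (n - 1) / 2
    y_mean = mean(train)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + i) for i in range(steps)]

# Synthetic, made-up signal: slow upward trend plus session-to-session noise.
rng = random.Random(0)
full_series = [100 + 0.5 * t + rng.gauss(0, 3) for t in range(105)]

horizon = 5  # number of future points used to score each forecast
results = {}
for size in range(10, 101, 10):
    train = full_series[:size]                # pretend only `size` points exist
    future = full_series[size:size + horizon] # what actually came next
    results[size] = mape(future, linear_forecast(train, horizon))

for size, err in results.items():
    print(f"{size:>3} points -> MAPE {err:.2%}")
```

On a noisy signal the error will not necessarily shrink at every step, but comparing the small and large sizes gives a feel for how much history a given horizon needs.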
Right, that makes sense. I am just confused about why the line here is linear. I get that there is not a lot of data, but I would assume there would be enough to at least form a trend - ~2 months of data.
And shortening the projections from 50 to 20 doesn't help.
Your dataset is still too short. I will not comment again on that.
Try copy-pasting the same data (weight_dataframe) 20 times and update the time column.
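For a throwaway experiment only, the copy-and-shift idea could look like the sketch below. It works on a plain list of `(date, weight)` pairs instead of the `weight_dataframe` from the issue, assumes evenly spaced observations, and uses made-up values; `tile_series` is a hypothetical helper, not part of PyAF.

```python
from datetime import date, timedelta

# Hypothetical weekly log of (date, weight) pairs; not the data from this issue.
values = [100, 100, 102, 105, 105, 107, 110, 110, 112, 115]
observations = [
    (date(2021, 11, 1) + timedelta(weeks=i), w) for i, w in enumerate(values)
]

def tile_series(rows, copies, step):
    """Repeat the values `copies` times and rebuild the time column so that
    every row gets a fresh, evenly spaced timestamp (no duplicate dates)."""
    start = rows[0][0]
    tiled = []
    for k in range(copies):
        offset = k * len(rows)
        for i, (_, value) in enumerate(rows):
            tiled.append((start + (offset + i) * step, value))
    return tiled

# EXPERIMENT ONLY: the result is artificial data, useful for seeing how a
# forecasting tool behaves with a longer history, not for real predictions.
experiment = tile_series(observations, copies=20, step=timedelta(weeks=1))
print(len(experiment), experiment[0], experiment[-1])
```

Rebuilding the time column matters because repeated timestamps would otherwise make the series ambiguous for any time-based model.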
So that generally worked. Is that a real solution - paste the same values 20 times?
This is without updating the time column, too.
No, artificial data is bad. It is just one way for you to see that PyAF will generate better models if you increase the size of your dataset.
DO NOT USE FAKE DATA IN PRODUCTION.
Right, so taking a user's data and cloning it 3 times is not a good strategy?
Not only is it not a good strategy, it is also not ethically correct.
Not enough data is a real problem everyone has. It is normal.
I am training a basic model that is comparing weight lifted vs. time.
As you will notice, the timeline is pretty limited, but this will likely be the case in most of my uses. The visual (shown below) is linear, which is obviously incorrect.
I am not too advanced in Python or forecasting, but visually, something looks wrong. Here is my full code, which includes data:
Here is a visual output:
Here is my system info as requested:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('Cython_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('dill_version', '0.3.6')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('keras_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('lightgbm_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('matplotlib_version', '3.6.2')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('numpy_version', '1.23.5')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('pandas_version', '1.5.2')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('pathos_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('pip_version', '22.3')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('pyaf_version', '4.0')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('pydot_version', '1.4.2')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('python_implementation', 'CPython')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('python_version', '3.11.0')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('scipy_version', '1.9.3')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('setuptools_version', '65.5.0')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('sklearn_version', '1.1.3')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('skorch_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('sqlalchemy_version', '1.4.44')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('system_platform', 'macOS-12.5-arm64-arm-64bit')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('system_processor', 'arm')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('system_uname', uname_result(system='Darwin', node='MacBook-Pro.local', release='21.6.0', version='Darwin Kernel Version 21.6.0: Sat Jun 18 17:07:22 PDT 2022; root:xnu-8020.140.41~1/RELEASE_ARM64_T6000', machine='arm64'))
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('torch_version', 'NOT_INSTALLED')
PYAF_SYSTEM_DEPENDENT_VERSION_INFO ('xgboost_version', '1.7.1')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('COLORTERM', 'truecolor')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('COMMAND_MODE', 'unix2003')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('GIT_ASKPASS', '/private/var/folders/_v/tdvwxstj3ljd7x9hdh16s8kc0000gn/T/AppTranslocation/98905D2F-13A3-4069-B8FB-27DEDF170F99/d/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass.sh')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('HOME', '/Users/brandonjakobson')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('KMP_DUPLICATE_LIB_OK', 'True')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('KMP_INIT_AT_FORK', 'FALSE')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('LANG', 'en_US.UTF-8')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('LOGNAME', 'brandonjakobson')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('MallocNanoZone', '0')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('OLDPWD', '/Users/brandonjakobson/Downloads/WorkoutProjections')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('ORIGINAL_XDG_CURRENT_DESKTOP', 'undefined')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('PATH', '/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VMware')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('PWD', '/Users/brandonjakobson/Downloads/WorkoutProjections')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('SHELL', '/bin/zsh')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('SHLVL', '1')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('SSH_AUTH_SOCK', '/private/tmp/com.apple.launchd.vZZcYkY6Qx/Listeners')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('TERM', 'xterm-256color')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('TERM_PROGRAM', 'vscode')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('TERM_PROGRAM_VERSION', '1.73.0')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('TMPDIR', '/var/folders/_v/tdvwxstj3ljd7x9hdh16s8kc0000gn/T/')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('USER', 'brandonjakobson')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('USER_ZDOTDIR', '/Users/brandonjakobson')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('VSCODE_GIT_ASKPASS_EXTRA_ARGS', '--ms-enable-electron-run-as-node')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('VSCODE_GIT_ASKPASS_MAIN', '/private/var/folders/_v/tdvwxstj3ljd7x9hdh16s8kc0000gn/T/AppTranslocation/98905D2F-13A3-4069-B8FB-27DEDF170F99/d/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass-main.js')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('VSCODE_GIT_ASKPASS_NODE', '/private/var/folders/_v/tdvwxstj3ljd7x9hdh16s8kc0000gn/T/AppTranslocation/98905D2F-13A3-4069-B8FB-27DEDF170F99/d/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('VSCODE_GIT_IPC_HANDLE', '/var/folders/_v/tdvwxstj3ljd7x9hdh16s8kc0000gn/T/vscode-git-810feb144a.sock')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('VSCODE_INJECTION', '1')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('XPC_FLAGS', '0x0')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('XPC_SERVICE_NAME', '0')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('ZDOTDIR', '/Users/brandonjakobson')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('_', '/usr/local/bin/python3')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('__CFBundleIdentifier', 'com.microsoft.VSCode')
PYAF_SYSTEM_DEPENDENT_ENVIRONMENT_VARIABLE ('__CF_USER_TEXT_ENCODING', '0x1F5:0x0:0x0')