bashtage / arch

ARCH models in Python

Issue when using arch.bootstrap.MCS #654

Closed loooomii closed 1 year ago

loooomii commented 1 year ago

Hello, when I use MCS to compare models, some of my error functions raise

  File "/Users/zhuoyue/Documents/PycharmProjects/Project/test_MCS.py", line 152, in main
    mcs_mape.compute()
  File "/Users/zhuoyue/Documents/PycharmProjects/Project/venv/lib/python3.10/site-packages/arch/bootstrap/multiple_comparison.py", line 204, in compute
    self._compute_max()
  File "/Users/zhuoyue/Documents/PycharmProjects/Project/venv/lib/python3.10/site-packages/arch/bootstrap/multiple_comparison.py", line 300, in _compute_max
    eliminated.append((int(indices.flat[locs.squeeze()]), pval))
TypeError: only size-1 arrays can be converted to Python scalars

for some values. I'm not sure of the exact conditions under which this error appears; it seems to occur randomly.

bashtage commented 1 year ago

Probably something to do with the latest NumPy changes. Can you provide some more information about the line you use to call MCS, including the data types and shape?

loooomii commented 1 year ago

I used

    mcs_mse = MCS(error_mse, size=0.05, method='max')
    mcs_mae = MCS(error_mae, size=0.05, method='max')
    mcs_mape = MCS(error_mape, size=0.05, method='max')
    mcs_smape = MCS(error_smape, size=0.05, method='max')
    mcs_qlike = MCS(error_qlike, size=0.05, method='max')

    mcs_mae.compute()
    mcs_mse.compute()
    mcs_qlike.compute()
    mcs_mape.compute()
    mcs_smape.compute()

to call MCS

Here the information of my error functions is

qlike :[[0.42840797 0.39531405 0.49885183]] type :<class 'numpy.ndarray'> shape :(1, 3)
mse :[[0.09699626 0.08983257 0.52129932]] type :<class 'numpy.ndarray'> shape :(1, 3)
mae :[[0.27644839 0.26637054 0.63971759]] type :<class 'numpy.ndarray'> shape :(1, 3)
mape :[[0.80116911 0.6866475  0.54970106]] type :<class 'numpy.ndarray'> shape :(1, 3)
smape :[[65.99426035 62.52946571 82.58094636]] type :<class 'numpy.ndarray'> shape :(1, 3)

loooomii commented 1 year ago

Not all of them will go wrong every time, and the error function that goes wrong when the same program is run twice is not necessarily the same.

bashtage commented 1 year ago

Are you running on the same computer every time, or possibly in a cloud install? This is a bit strange. Could you paste the result of

    import pandas as pd
    pd.show_versions()

from a run that failed?

bashtage commented 1 year ago

Something may be wrong in your understanding of MCS. If you want to compare loss functions, your input should be T by m, where m is the number of models and T is the sample size. For example, to use MCS with QLIKE losses, you compute the loss for each time period for each model.
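
A minimal sketch of what a T by m loss array could look like, using synthetic data. The helper name `qlike_losses`, the simulated series, and the particular QLIKE form (ratio minus log ratio minus one, one common variant) are assumptions for illustration and do not come from the thread.

```python
import numpy as np


def qlike_losses(realized, forecasts):
    """Per-period QLIKE loss for each model (hypothetical helper).

    realized:  shape (T,), positive realized variances
    forecasts: shape (T, m), positive variance forecasts, one column per model
    returns:   shape (T, m), one loss per time period and per model
    """
    realized = np.asarray(realized, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    ratio = realized[:, None] / forecasts
    return ratio - np.log(ratio) - 1.0


rng = np.random.default_rng(0)
T, m = 250, 3

# Synthetic "realized variance" series and three noisy forecasts of it.
realized = rng.chisquare(df=4, size=T) / 4
noise = rng.normal(scale=[0.1, 0.2, 0.4], size=(T, m))
forecasts = realized[:, None] * np.exp(noise)

losses = qlike_losses(realized, forecasts)      # shape (T, m): one row per period
collapsed = losses.mean(axis=0, keepdims=True)  # shape (1, m): averaged over time

print(losses.shape, collapsed.shape)
```

Here `losses`, with one row per time period, has the T by m layout described above, so it is the array that would go into a call such as `MCS(losses, size=0.05, method='max')`. The `collapsed` array has the (1, 3) shape reported earlier in the thread, where the time dimension has already been averaged away.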

loooomii commented 1 year ago

> Are you running on the same computer every time, or possibly in a cloud install? This is a bit strange. Could you paste the result of
>
>     import pandas as pd
>     pd.show_versions()
>
> from a run that failed?

Yes, I run it on the same computer all times. Here is the information of pandas.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.10.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 22.4.0
Version          : Darwin Kernel Version 22.4.0: Mon Mar  6 21:00:41 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8103
machine          : arm64
processor        : arm
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8
pandas           : 1.5.3
numpy            : 1.24.3
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 57.0.0
pip              : 23.1.2
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : 4.11.2
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.6.3
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.0
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

> Something may be wrong in your understanding of MCS. If you want to compare loss functions, your input should be T by m, where m is the number of models and T is the sample size. For example, to use MCS with QLIKE losses, you compute the loss for each time period for each model.

I use MCS to compare three time series models. I compute each model's loss functions over a 22-day window and treat that as one sample; over a longer interval this gives me several samples, which I then use for the comparison. The problem also occurs with other input shapes, such as (24, 3).

bashtage commented 1 year ago

When it fails, will it always fail with the same data?

loooomii commented 1 year ago

I tested my models with the same data, but because two of the models have a random component, the computed loss values were not the same each time.

I ran the case twice more where the shape of the input loss was (1,3). Even though the values of the input loss are not exactly the same both times, the error is in the same place.

Here is the data information of where the error occurred this time.

qlike :[[0.38443391 0.56317765 0.49531343]] type :<class 'numpy.ndarray'> shape :(1, 3)
mse :[[0.10918251 0.14925831 0.53674966]] type :<class 'numpy.ndarray'> shape :(1, 3)
mae :[[0.27002907 0.31174622 0.49793814]] type :<class 'numpy.ndarray'> shape :(1, 3)
mape :[[0.45228915 0.71662962 0.4635835 ]] type :<class 'numpy.ndarray'> shape :(1, 3)
smape :[[56.47898975 64.50894961 65.16465342]] type :<class 'numpy.ndarray'> shape :(1, 3)

qlike :[[0.38443391 0.39939706 0.2619653 ]] type :<class 'numpy.ndarray'> shape :(1, 3)
mse :[[0.10918251 0.12636466 0.35217552]] type :<class 'numpy.ndarray'> shape :(1, 3)
mae :[[0.27002907 0.27632961 0.35660937]] type :<class 'numpy.ndarray'> shape :(1, 3)
mape :[[ 0.45228915  0.43890637 16.10352968]] type :<class 'numpy.ndarray'> shape :(1, 3)
smape :[[56.47898975 56.12104852 59.04854384]] type :<class 'numpy.ndarray'> shape :(1, 3)

They succeeded with mse and mae, but there was an error with qlike.

bashtage commented 1 year ago

I found the issue. This was happening because there were ties when removing models. For what it is worth, it is never valid to use MCS with a single loss as you did above. MCS will now warn in this case and related cases.
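
A simplified illustration of how a tie can lead to the `TypeError` in the traceback above. It reuses the expression from the traceback line, but the surrounding setup (`t_stats`, the way `locs` is built) is hypothetical and is not taken from the arch source. The tie-safe variant at the end is only one possible approach, not necessarily the fix that was applied.

```python
import numpy as np

# Hypothetical statistics for three models; models 1 and 2 tie for the maximum.
t_stats = np.array([0.5, 2.0, 2.0])
indices = np.arange(t_stats.shape[0])

# With a tie, more than one location matches the maximum.
locs = np.argwhere(t_stats == t_stats.max())  # shape (2, 1)

try:
    # Same expression as in the traceback: a size-2 array cannot become an int.
    worst = int(indices.flat[locs.squeeze()])
    raised = False
except TypeError:
    raised = True

# One tie-safe choice (an assumption): always take the first matching location.
worst_safe = int(indices[locs[0, 0]])

print(raised, worst_safe)
```

Without a tie, `locs` holds a single location and the conversion to `int` succeeds, which would be consistent with the error only appearing for some inputs.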