MeteoSwiss / mlpp-lib

Collection of methods for ML-based postprocessing of weather forecasts.
BSD 3-Clause "New" or "Revised" License

Val_loss NaN for any training #58

louisPoulain closed this issue 1 week ago

louisPoulain commented 2 weeks ago
sum_vobs_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=0>/500000
tf.reduce_sum(tf.cast(tf.math.is_nan(y_pred.distribution.normal.mean()), tf.int32))=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000
tf.reduce_sum(tf.cast(tf.math.is_nan(y_pred.distribution.normal.stddev()), tf.int32))=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000
tf.reduce_sum(tf.cast(tf.math.is_nan(y_pred.mean()), tf.int32))=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000
sum_samp1_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=54400>/50000000
sum_samp2_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=54400>/50000000
sum_e1_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000
sum_e2_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000
sum_twcrps_nan.ref().deref()=<tf.Tensor: shape=(), dtype=int32, numpy=544>/500000

In every training run there is exactly one batch of data (size 500'000) that produces exactly 544 NaNs in the predicted distribution. The distribution is a doubly-censored normal. The actual loss, on the other hand, is never NaN.
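One way to narrow this down is to map the NaN predictions back to the input rows that produced them and check whether those rows contain non-finite features. The following is a minimal sketch, assuming the batch features and the predicted means have been exported as NumPy arrays; `find_suspect_rows`, `x` and `pred_mean` are hypothetical names for illustration and are not part of mlpp-lib.

```python
import numpy as np


def find_suspect_rows(x, pred_mean):
    """Locate samples with NaN predictions and flag their non-finite features.

    Hypothetical helper (not part of mlpp-lib).
    x:         array of shape (n_samples, n_features)
    pred_mean: array of shape (n_samples,) or (n_samples, n_outputs)
    """
    x = np.asarray(x, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    # Flatten any output dimensions so there is one NaN flag per sample.
    per_sample = np.isnan(pred_mean.reshape(len(pred_mean), -1)).any(axis=1)
    bad_rows = np.flatnonzero(per_sample)
    # For the offending samples, mark features that are NaN, +Inf or -Inf.
    nonfinite = ~np.isfinite(x[bad_rows])
    return bad_rows, nonfinite


# Toy data: rows 1 and 2 contain infinite features and yield NaN predictions.
x = np.array([[0.1, 2.0], [np.inf, 1.0], [0.3, -np.inf], [0.5, 0.7]])
pred_mean = np.array([0.2, np.nan, np.nan, 0.4])

bad_rows, nonfinite = find_suspect_rows(x, pred_mean)
print("samples with NaN predictions:", bad_rows)
print("features that are non-finite in those samples:", nonfinite.any(axis=0))
```

If every NaN prediction lines up with at least one non-finite input feature, the inputs rather than the model or the loss are the likely culprit.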

Other info

normalizer:
  fillvalue: -5
  default: Standardizer

targets:

sample_weights:

data_partitioning:
  time_split:
    train:

model:
  fully_connected_network:
    hidden_layers: [512, 512, 512]
    activations: relu
    dropout: [0.3, 0.2, 0.1]
    probabilistic_layer: IndependentDoublyCensoredNormal
    mc_dropout: True

optimizer:
  Adam:
    learning_rate:
      CosineDecayRestarts:
        initial_learning_rate: 0.001
        first_decay_steps: 20
        t_mul: 2.0
        m_mul: 1.025
        alpha: 0.0
    beta_1: 0.94
loss:
  WeightedCRPSEnergy:
    threshold: 0.0
    n_samples: 100
epochs: 4
steps_per_epoch: 1

metrics:

callbacks:

Env

Environment absl-py==2.1.0 aiohappyeyeballs @ file:///home/conda/feedstock_root/build_artifacts/aiohappyeyeballs_1724167852130/work aiohttp @ file:///home/conda/feedstock_root/build_artifacts/aiohttp_1727281375658/work aiosignal @ file:///home/conda/feedstock_root/build_artifacts/aiosignal_1667935791922/work alembic @ file:///home/conda/feedstock_root/build_artifacts/alembic_1727122811080/work aniso8601 @ file:///home/conda/feedstock_root/build_artifacts/aniso8601_1618789466884/work asciitree==0.3.3 asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work astunparse==1.6.3 async-timeout @ file:///home/conda/feedstock_root/build_artifacts/async-timeout_1691763562544/work attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1722977137225/work bcrypt @ file:///home/conda/feedstock_root/build_artifacts/bcrypt_1724960420580/work blinker @ file:///home/conda/feedstock_root/build_artifacts/blinker_1715091184126/work bokeh @ file:///home/conda/feedstock_root/build_artifacts/bokeh_1719324651922/work boto3 @ file:///home/conda/feedstock_root/build_artifacts/boto3_1727422381215/work botocore @ file:///home/conda/feedstock_root/build_artifacts/botocore_1727397771150/work Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1725267488082/work cachetools @ file:///home/conda/feedstock_root/build_artifacts/cachetools_1724028158384/work certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1725278078093/work/certifi cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1725571112467/work cftime @ file:///home/conda/feedstock_root/build_artifacts/cftime_1725400455427/work charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1698833585322/work click @ file:///home/conda/feedstock_root/build_artifacts/click_1692311806742/work cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1697464713350/work contourpy @ 
file:///home/conda/feedstock_root/build_artifacts/contourpy_1727293517607/work cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography-split_1725443044072/work cycler @ file:///home/conda/feedstock_root/build_artifacts/cycler_1696677705766/work cytoolz @ file:///home/conda/feedstock_root/build_artifacts/cytoolz_1706897086113/work dask==2022.12.1 dask-expr @ file:///home/conda/feedstock_root/build_artifacts/dask-expr_1722982607046/work databricks-sdk @ file:///home/conda/feedstock_root/build_artifacts/databricks-sdk_1726835227694/work decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work Deprecated @ file:///home/conda/feedstock_root/build_artifacts/deprecated_1685233314779/work distributed @ file:///home/conda/feedstock_root/build_artifacts/distributed_1722982528621/work dm-tree==0.1.8 docker @ file:///home/conda/feedstock_root/build_artifacts/docker-py_1716508870406/work entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work fasteners @ file:///home/conda/feedstock_root/build_artifacts/fasteners_1643971550063/work Flask @ file:///home/conda/feedstock_root/build_artifacts/flask_1712667726126/work flatbuffers==24.3.25 fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1727206408738/work frozenlist @ file:///home/conda/feedstock_root/build_artifacts/frozenlist_1725395644230/work fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1725543257300/work gast==0.4.0 gitdb @ file:///home/conda/feedstock_root/build_artifacts/gitdb_1697791558612/work GitPython @ file:///home/conda/feedstock_root/build_artifacts/gitpython_1711991025291/work google-auth @ file:///home/conda/feedstock_root/build_artifacts/google-auth_1726832896641/work 
google-auth-oauthlib==1.0.0 google-pasta==0.2.0 graphene @ file:///home/conda/feedstock_root/build_artifacts/graphene_1690379572063/work graphql-core @ file:///home/conda/feedstock_root/build_artifacts/graphql-core_1725549136655/work graphql-relay @ file:///home/conda/feedstock_root/build_artifacts/graphql-relay_1650134628625/work greenlet @ file:///home/conda/feedstock_root/build_artifacts/greenlet_1726922189413/work grpcio==1.66.1 gunicorn @ file:///home/conda/feedstock_root/build_artifacts/gunicorn_1713358040599/work h5py==3.12.1 idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1726459485162/work importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work importlib_resources @ file:///home/conda/feedstock_root/build_artifacts/importlib_resources_1725921340658/work ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1701831663892/work itsdangerous @ file:///home/conda/feedstock_root/build_artifacts/itsdangerous_1713372668944/work jax==0.4.30 jaxlib==0.4.30 jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1715127149914/work jmespath @ file:///home/conda/feedstock_root/build_artifacts/jmespath_1655568249366/work joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1714665484399/work keras==2.12.0 kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/kiwisolver_1725459266648/work libclang==18.1.1 llvmlite==0.43.0 locket @ file:///home/conda/feedstock_root/build_artifacts/locket_1650660393415/work lz4 @ file:///home/conda/feedstock_root/build_artifacts/lz4_1725089417274/work Mako @ file:///home/conda/feedstock_root/build_artifacts/mako_1715711344987/work Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1710435156458/work MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1724959465445/work matplotlib==3.9.2 matplotlib-inline @ 
file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work ml_dtypes==0.5.0 mlflow @ file:///home/conda/feedstock_root/build_artifacts/mlflow-split_1726566280547/work mlflow-skinny @ file:///home/conda/feedstock_root/build_artifacts/mlflow-split_1726566280547/work mlpp-lib==0.12.2 msgpack @ file:///home/conda/feedstock_root/build_artifacts/msgpack-python_1725975012026/work multidict @ file:///home/conda/feedstock_root/build_artifacts/multidict_1725953652790/work munkres==1.1.4 netCDF4 @ file:///home/conda/feedstock_root/build_artifacts/netcdf4_1725449927647/work numba @ file:///home/conda/feedstock_root/build_artifacts/numba_1718888028049/work numcodecs @ file:///home/conda/feedstock_root/build_artifacts/numcodecs_1715218778254/work numpy==1.24.3 oauthlib==3.2.2 opentelemetry-api @ file:///home/conda/feedstock_root/build_artifacts/opentelemetry-api_1676680662101/work opentelemetry-sdk @ file:///home/conda/feedstock_root/build_artifacts/opentelemetry-sdk_1676709164054/work opentelemetry-semantic-conventions @ file:///home/conda/feedstock_root/build_artifacts/opentelemetry-semantic-conventions_1676680479396/work opt_einsum==3.4.0 packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work pandas==1.5.3 paramiko @ file:///home/conda/feedstock_root/build_artifacts/paramiko_1726748051454/work parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work partd @ file:///home/conda/feedstock_root/build_artifacts/partd_1715026491486/work pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1726075067949/work prometheus_client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1726901976720/work prometheus_flask_exporter @ 
file:///home/conda/feedstock_root/build_artifacts/prometheus_flask_exporter_1720670279306/work prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work properscoring==0.1 protobuf==4.25.3 psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1725737916340/work ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work pyarrow==17.0.0 pyarrow-hotfix @ file:///home/conda/feedstock_root/build_artifacts/pyarrow-hotfix_1700596371886/work pyasn1 @ file:///home/conda/feedstock_root/build_artifacts/pyasn1_1726839225972/work pyasn1_modules @ file:///home/conda/feedstock_root/build_artifacts/pyasn1-modules_1726029546107/work pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1711811537435/work Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work PyNaCl @ file:///home/conda/feedstock_root/build_artifacts/pynacl_1725739244417/work pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1722587090966/work pyparsing @ file:///home/conda/feedstock_root/build_artifacts/pyparsing_1724616129934/work PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1706886791323/work pyu2f @ file:///home/conda/feedstock_root/build_artifacts/pyu2f_1604248910016/work PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1725456176299/work querystring_parser @ file:///home/conda/feedstock_root/build_artifacts/querystring_parser_1723625595981/work requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1717057054362/work requests-oauthlib==2.0.0 rsa @ 
file:///home/conda/feedstock_root/build_artifacts/rsa_1658328885051/work s3transfer @ file:///home/conda/feedstock_root/build_artifacts/s3transfer_1719300139436/work scikit-learn @ file:///home/conda/feedstock_root/build_artifacts/scikit-learn_1726082655509/work/dist/scikit_learn-1.5.2-cp39-cp39-linux_x86_64.whl#sha256=9bdea44be238844ca955b35fde2df3049752a843e3eb223cf91e68e25efefa5c scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy-split_1716470218293/work/dist/scipy-1.13.1-cp39-cp39-linux_x86_64.whl#sha256=e6696cb8683d94467891b7648e068a3970f6bc0a1b3c1aa7f9bc89458eafd2f0 six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work smmap @ file:///home/conda/feedstock_root/build_artifacts/smmap_1634310307496/work sortedcontainers @ file:///home/conda/feedstock_root/build_artifacts/sortedcontainers_1621217038088/work SQLAlchemy @ file:///home/conda/feedstock_root/build_artifacts/sqlalchemy_1726596200000/work sqlparse @ file:///home/conda/feedstock_root/build_artifacts/sqlparse_1721304206023/work stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work tblib @ file:///home/conda/feedstock_root/build_artifacts/tblib_1702066284995/work tensorboard==2.12.3 tensorboard-data-server==0.7.2 tensorflow==2.12.1 tensorflow-estimator==2.12.0 tensorflow-io-gcs-filesystem==0.37.1 tensorflow-probability==0.20.1 termcolor==2.4.0 threadpoolctl @ file:///home/conda/feedstock_root/build_artifacts/threadpoolctl_1714400101435/work toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1706112571092/work tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1724955920300/work traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work typing_extensions==4.5.0 tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1727140567071/work unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1695847984941/work urllib3 @ 
file:///home/conda/feedstock_root/build_artifacts/urllib3_1718728347128/work wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work websocket-client @ file:///home/conda/feedstock_root/build_artifacts/websocket-client_1713923384721/work Werkzeug @ file:///home/conda/feedstock_root/build_artifacts/werkzeug_1724330738730/work wrapt==1.14.1 xarray==2022.12.0 xyzservices @ file:///home/conda/feedstock_root/build_artifacts/xyzservices_1725366347586/work yarl @ file:///home/conda/feedstock_root/build_artifacts/yarl_1727422848961/work zarr @ file:///home/conda/feedstock_root/build_artifacts/zarr_1716779724722/work zict @ file:///home/conda/feedstock_root/build_artifacts/zict_1681770155528/work zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1726248574750/work
louisPoulain commented 1 week ago

After reviewing our feature set, it seems that some of our variables contain Inf or -Inf values. The cause could be that Dataset.drop_nans only checks for NaNs and not for infinite values. I'll propose a fix.
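To illustrate the difference, here is a minimal sketch of a sample mask that rejects Inf and -Inf as well as NaN. It is not the actual `Dataset.drop_nans` implementation, which is not shown in this thread; `finite_sample_mask` is a hypothetical helper and the arrays are toy data.

```python
import numpy as np


def finite_sample_mask(*arrays):
    """Return a boolean mask over axis 0 that is True only for samples
    whose values are finite (no NaN, +Inf or -Inf) in every given array.

    Hypothetical helper for illustration, not part of mlpp-lib.
    """
    mask = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=float)
        # Collapse all non-sample dimensions into one.
        flat = arr.reshape(arr.shape[0], -1)
        ok = np.isfinite(flat).all(axis=1)
        mask = ok if mask is None else mask & ok
    return mask


# Toy features and targets; rows 2 and 3 of x contain infinite values.
x = np.array([[1.0, 2.0], [np.nan, 0.0], [np.inf, 3.0], [4.0, -np.inf], [5.0, 6.0]])
y = np.array([0.5, 0.1, 0.2, 0.3, np.nan])

# A NaN-only check keeps the rows that contain Inf or -Inf.
nan_only = ~np.isnan(x).any(axis=1)
print("kept by NaN-only check on x:", nan_only)

# A finiteness check over features and targets drops them as well.
mask = finite_sample_mask(x, y)
print("kept by finiteness check:  ", mask)
x_clean, y_clean = x[mask], y[mask]
```

Using `np.isfinite` instead of `np.isnan` covers all three cases in one pass, since it returns False for NaN, Inf and -Inf alike.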