ericmjl / bayesian-stats-modelling-tutorial

How to do Bayesian statistical modelling using numpy and PyMC3
MIT License

02-Instructor-Parameter_estimation_hypothesis_testing notebook fails to run on ONE of two computers #95

Closed mdtdev closed 4 years ago

mdtdev commented 4 years ago

I have set up the environment according to the instructions given in the GitHub repository on three computers. On two of them the tutorials work; on the third there is a fatal error importing pymc3. All three computers are up-to-date, fully patched Windows 10 boxes (I can try to provide more info if needed!). The problem is in the 02-Instructor-Parameter_estimation_hypothesis_testing notebook and happens when running the import block at the top.

On two of the computers, running this first block gives the warning message:

WARNING (theano.configdefaults): g++ not available, if using conda:
`conda install m2w64-toolchain`
C:\......\anaconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\configdefaults.py:560:
UserWarning: DeprecationWarning: there is no c++ compiler.This is
deprecated and with Theano 0.11 a c++ compiler will be mandatory
  warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be
unable to execute optimized C-implementations (for both CPU and GPU)
and will default to Python implementations. Performance will be
severely degraded. To remove this warning, set Theano flags cxx to an
empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation
for BLAS functions.

But things still work! All of the code blocks in the notebook execute, if slowly (for NUTS).
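The last sentence of the warning refers to Theano's `cxx` flag. A minimal sketch of setting it to an empty string through the `THEANO_FLAGS` environment variable is shown here. The helper name `merge_theano_flags` is hypothetical (it is not part of Theano), and the variable has to be set before Theano is first imported in the process.

```python
import os


def merge_theano_flags(existing: str, **overrides: str) -> str:
    """Return a THEANO_FLAGS string with `overrides` applied on top of `existing`.

    Hypothetical helper, not part of Theano. THEANO_FLAGS is a comma-separated
    list of key=value pairs; values containing commas are not handled here.
    """
    flags = {}
    for item in existing.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            flags[key.strip()] = value.strip()
    flags.update(overrides)
    return ",".join(f"{key}={value}" for key, value in flags.items())


# Run this before the first `import theano` or `import pymc3`.
os.environ["THEANO_FLAGS"] = merge_theano_flags(
    os.environ.get("THEANO_FLAGS", ""), cxx=""
)
```

With `cxx` empty, Theano should not look for a C++ compiler at all, so this would be expected to silence the warning on machines that are already running the Python implementations. Whether it also helps on a machine where compilation fails has not been verified here.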

On the third computer, which was set up using the exact same steps, this is the error message that results (and the error is FATAL for using pymc3):

You can find the C code in this temporary file: C:\......\AppData\Local\Temp\theano_compilation_error__knt3a6r
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
     80                     version,
---> 81                     actual_version, force_compile, _need_reload))
     82 except ImportError:

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
    104                         version,
--> 105                         actual_version, force_compile, _need_reload))
    106         except ImportError:

ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-1-3441c4f46c01> in <module>
      4 import seaborn as sns
      5 import matplotlib.pyplot as plt
----> 6 import pymc3 as pm
      7 from ipywidgets import interact
      8 import arviz as az

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\__init__.py in <module>
      3 
      4 from .blocking import *
----> 5 from .distributions import *
      6 from .glm import *
      7 from . import gp

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\distributions\__init__.py in <module>
----> 1 from . import timeseries
      2 from . import transforms
      3 
      4 from .continuous import Uniform
      5 from .continuous import Flat

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\pymc3\distributions\timeseries.py in <module>
----> 1 import theano.tensor as tt
      2 from theano import scan
      3 
      4 from pymc3.util import get_variable_name
      5 from .continuous import get_tau_sigma, Normal, Flat

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\__init__.py in <module>
    108     object2, utils)
    109 
--> 110 from theano.compile import (
    111     SymbolicInput, In,
    112     SymbolicOutput, Out,

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\compile\__init__.py in <module>
     10 from theano.compile.function_module import *
     11 
---> 12 from theano.compile.mode import *
     13 
     14 from theano.compile.io import *

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\compile\mode.py in <module>
      9 import theano
     10 from theano import gof
---> 11 import theano.gof.vm
     12 from theano import config
     13 from six import string_types

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\vm.py in <module>
    672     if not theano.config.cxx:
    673         raise theano.gof.cmodule.MissingGXX('lazylinker will not be imported if theano.config.cxx is not set.')
--> 674     from . import lazylinker_c
    675 
    676     class CVM(lazylinker_c.CLazyLinker, VM):

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\lazylinker_c.py in <module>
    138             args = cmodule.GCC_compiler.compile_args()
    139             cmodule.GCC_compiler.compile_str(dirname, code, location=loc,
--> 140                                              preargs=args)
    141             # Save version into the __init__.py file.
    142             init_py = os.path.join(loc, '__init__.py')

D:\......\miniconda3\envs\bayesian-modelling-tutorial\lib\site-packages\theano\gof\cmodule.py in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
   2409             # difficult to read.
   2410             raise Exception('Compilation failed (return status=%s): %s' %
-> 2411                             (status, compile_stderr.replace('\n', '. ')))
   2412         elif config.cmodule.compilation_warning and compile_stderr:
   2413             # Print errors just below the command line.

. collect2.exe: error: ld returned 1 exit statusindows-10-10.0.18362-SP0-Intel64_Family_6_Model_94_Stepping_3_GenuineIntel-3.7.6-64/lazylinker_ext/mod.cpp:976: undefined reference to `__imp__Py_TrueStruct'Error'efined references to `__imp__Py_NoneStruct' followow

At this point, code not using pymc3 works but all pymc3 blocks crash.

Additionally, all three of the computers have trouble starting the environment for this tutorial (at Scipy) and print the following at the shell:

(base) D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>conda activate bayesian-modelling-tutorial
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET DISTUTILS_USE_SDK=1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET MSSdk=1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_VERSION=15.0"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_MAJOR=15"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>SET "VS_YEAR=2017"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "MSYS2_ARG_CONV_EXCL=/AI;/AL;/OUT;/out"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "MSYS2_ENV_CONV_EXCL=CL"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "PY_VCRUNTIME_REDIST=\bin\vcruntime140.dll"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "CXX=cl.exe"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "CC=cl.exe"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>set "VSINSTALLDIR="
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>for /F "usebackq tokens=*" %i in (`vswhere.exe -nologo -products * -version [15.0,16.0) -property installationPath`) do (set "VSINSTALLDIR=%i\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "" (for /F "usebackq tokens=*" %i in (`vswhere.exe -nologo -products * -requires Microsoft.VisualStudio.Component.VC.v141.x86.x64 -property installationPath`) do (set "VSINSTALLDIR=%i\" ) )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" )
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF NOT "" == "" (
set "INCLUDE=;"
 set "LIB=;"
 set "CMAKE_PREFIX_PATH=;"
)

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDir
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE\Wow6432Node  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKLM\SOFTWARE  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 exit /B 1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>exit /B 0
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>for /F %i in ('dir /ON /B "\include\10.*"') DO (SET WindowsSDKVer=%~i )
The system cannot find the file specified.

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 (echo "Didn't find any windows 10 SDK. I'm not sure if things will work, but let's try..." )  else (echo Windows SDK version found as: "" )
Windows SDK version found as: ""

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF "win-64" == "win-64" (
set "CMAKE_GEN=Visual Studio 15 2017 Win64"
 set "BITS=64"
)  else (
set "CMAKE_GEN=Visual Studio 15 2017"
 set "BITS=32"
)

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>pushd C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\
The system cannot find the path specified.

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>CALL "VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.16
The system cannot find the path specified.

D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>popd
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>IF "" == "" SET "CMAKE_GENERATOR=Visual Studio 15 2017 Win64"
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE\Wow6432Node  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKLM\SOFTWARE  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE  1>nul 2>&1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>if errorlevel 1 exit /B 1
D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>exit /B 0
(bayesian-modelling-tutorial) D:\......\LOCAL_GITHUB_CLONES\bayesian-stats-modelling-tutorial>

But for emphasis: this shell dump occurs on all 3 computers, and two of those computers work, showing only the warning at the start! Based on this output, it seems there might be a mandatory dependency on MS Visual Studio being installed in some particular location (on one of these machines, VS is around but not found!). On the working machines pymc3 appears to use the Python code fallback. For some reason it will not do this on the third, failing computer.

I hope that these hidden dependencies can be fixed or spelled out, as we were unable to continue with the tutorial as this killed us completely. We can try to watch the replay later if we can get things working.

I am happy to provide more information if I can!

ericmjl commented 4 years ago

Hi @mdtdev, thanks for pinging in! I appreciate that you took the time to paste in the error trace. This helps a lot.

I have a suspicion that Theano, which PyMC3 is built on top of, needs some sort of C++ compiler behind the scenes. From prior exposure elsewhere, I keep hearing about a Visual Studio C++ compiler thingy. On macOS and Linux, there's clang and gcc, respectively. But I'm not too well-versed in the Windows world. Do you know if there's a way to check whether a C++ compiler is available on those systems?
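One rough way to check, sketched with only the Python standard library, is to ask which compiler executables are visible on the `PATH` of the running kernel. The function name `find_compilers` and the list of executable names are assumptions made for illustration. A compiler being found on `PATH` does not guarantee that Theano can use it.

```python
import shutil


def find_compilers(names=("g++", "gcc", "cl", "clang++")):
    """Map each candidate compiler name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Run this from the same environment (ideally the same notebook kernel)
# that imports pymc3, so that PATH matches what Theano would see.
for name, path in find_compilers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

Comparing this output between a working and a failing machine would show whether one of them picks up a `g++` that the others do not have.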

mdtdev commented 4 years ago

Thanks for the reply!

That is the weird part of this; the conda activate step tosses out a lot of text that looks like it is searching for VS/Visual Studio, and MSYS, both of which are build environments in Windows 10 (I am not an expert on Windows, I know more about building on Linux or lower level architectures). I know that on the systems that work these tools are not installed. On the system that fails, I believe that MSYS is installed in a different environment. I checked and MSYS is not visible from the virtual environment I created with conda following your instructions.

Actually I just went to the system that I have access to and on which the notebooks run. (The one that gives the warning message above.) I have confirmed that that system does NOT have any build toolchains on it. No Visual Studio, no m2w64-toolchain from conda, and no MSYS. For some reason that system falls back on the python implementations in that version of theano.

As an alternative, I followed the instructions for installing (via conda) the m2w64-toolchain, as suggested in the theano docs, on the system that is failing. So I modified the virtual environment set up following your instructions by running `conda install m2w64-toolchain` and restarting things. But I still got the error. I can confirm that in the modified environment I have access to a full gcc build environment, but theano doesn't find it.

I agree that theano wants a build toolchain, but I am stumped as to how to point it at the correct tools. But more importantly, why does theano fall back to python implementations on 2 systems without build tools, but then fail to do this on the 3rd system? That is the weird part!

theano says on two systems, "you should have a compiler but I will run without one anyway" but then on the third system it says "no compiler, no work". Despite all three being set up following the exact same steps...
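One possible reading of the traceback above would explain the difference. In `theano\gof\vm.py` (lines 672 to 674), Theano raises `MissingGXX` when `theano.config.cxx` is empty and only otherwise imports `lazylinker_c`, which then tries to compile. The toy model below sketches that branch. The names `select_vm`, `working_toolchain`, and `broken_toolchain` are hypothetical. The assumption that `MissingGXX` is caught and leads to the Python fallback is not visible in the traceback.

```python
def select_vm(cxx: str, compile_lazylinker) -> str:
    """Toy model of the branch shown in the traceback; not Theano's actual code."""
    if not cxx:
        # No compiler configured: the C extension is never built
        # (assumed to lead to the Python implementations).
        return "python"
    # A compiler was detected, so compilation is attempted.
    # Any failure here propagates and aborts the import.
    compile_lazylinker(cxx)
    return "c"


def working_toolchain(cxx: str) -> None:
    """Hypothetical stand-in for a compile step that succeeds."""


def broken_toolchain(cxx: str) -> None:
    """Hypothetical stand-in for a compile step whose link stage fails."""
    raise Exception("Compilation failed (return status=1)")


print(select_vm("", broken_toolchain))      # python
print(select_vm("g++", working_toolchain))  # c
try:
    select_vm("g++", broken_toolchain)
except Exception as err:
    print(f"import aborted: {err}")
```

Under this reading, the two working machines have no `g++` at all, while the failing machine has one that Theano detects but that cannot link against the Python library. That would be consistent with the `undefined reference` and `ld returned 1 exit status` fragments in the error, but it remains a hypothesis.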

ericmjl commented 4 years ago

Wow. That's a ton of work, and I want to affirm that you've done the right steps for debugging.

Seems like the tutorial is in dire need of an upgrade to PyMC4, which eliminates the Theano dependency. However, I'm hesitant to do so until PyMC4 has a progress bar, and that depends on TFP implementing it (see this PyMC4 issue and this TFP issue). Posterior sampling via MCMC is where the progress bar comes in, and in pm4 this is delegated to TFP's library of MCMC samplers.

For usage and learning purposes, using a *nix system is the easiest way out, so macOS or one of the Linux flavours. And for learning, Binder or colab will be a better option, though with colab you will have to install all of the dependencies at the top of the notebook manually.

Stay tuned for updates to the repo! I'm working on a rewrite of the site so that the notebooks all take on the mkdocs-material feel (it should feel like Network Analysis Made Simple). Also, there'll be a focus on writing the data-generating model explicitly using scipy distributions, which will translate very closely to PyMC syntax (whether pm3 or pm4). That should help make learning how to use PyMC3/4 very easy!

I will close the issue for now, as I am afraid I don't have enough knowledge to go any deeper, and those issues sound like a problem with Theano on Windows. Theano has been handed over to the PyMC3 team (which I'm part of) to maintain, but without the level of familiarity with the codebase that its creators at MILA have, we are barely able to keep it bug-free as OSes and toolchains evolve, and are simply trying to keep it afloat until pm4 has finished development. For now, a *nix OS is probably the right way to go, or leveraging a Binder/Docker container to get things done. Having just gotten familiar enough with Docker to use it, I might add that as an option for running the notebooks.

Stay tuned for the updates!