Closed cathiest closed 5 years ago
Looks like the packages from the default conda package list aren't installed by default, so we need to specify the full list of required packages more carefully. @martintoreilly do we have a list somewhere?
Current python 2.7 packages
absl-py 0.6.1
astor 0.7.1
backports-abc 0.5
backports.functools-lru-cache 1.5
backports.shutil-get-terminal-size 1.0.0
backports.weakref 1.0.post1
bleach 3.0.2
certifi 2018.10.15
cffi 1.11.5
configparser 3.5.0
cycler 0.10.0
Cython 0.29
decorator 4.3.0
entrypoints 0.2.3
enum34 1.1.6
funcsigs 1.0.2
functools32 3.2.3.post2
futures 3.2.0
gast 0.2.0
grpcio 1.16.1
h5py 2.8.0
ipaddress 1.0.22
ipykernel 4.10.0
ipython 5.8.0
ipython-genutils 0.2.0
Jinja2 2.10
jsonschema 2.6.0
jupyter-client 5.2.3
jupyter-core 4.4.0
jupyterlab 0.33.11
jupyterlab-launcher 0.11.2
Keras 2.2.4
Keras-Applications 1.0.6
Keras-Preprocessing 1.0.5
kiwisolver 1.0.1
linecache2 1.0.0
Markdown 3.0.1
MarkupSafe 1.1.0
matplotlib 2.2.2
mistune 0.8.4
mkl-fft 1.0.6
mkl-random 1.0.1
mock 2.0.0
nbconvert 5.3.1
nbformat 4.4.0
notebook 5.7.2
numpy 1.15.4
olefile 0.46
pandocfilters 1.4.2
pathlib2 2.3.2
pbr 5.1.1
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.3.0
pip 18.1
prometheus-client 0.4.2
prompt-toolkit 1.0.15
protobuf 3.6.1
ptyprocess 0.6.0
pycparser 2.19
Pygments 2.2.0
pyparsing 2.3.0
pystan 2.18.0.0
python-dateutil 2.7.5
pytz 2018.7
PyYAML 3.13
pyzmq 17.1.2
scandir 1.9.0
scipy 1.1.0
Send2Trash 1.5.0
setuptools 40.6.2
simplegeneric 0.8.1
singledispatch 3.4.0.3
sip 4.19.13
six 1.11.0
subprocess32 3.5.3
tensorboard 1.12.0
tensorflow 1.12.0
termcolor 1.1.0
terminado 0.8.1
testpath 0.4.2
torch 0.4.1.post2
torchvision 0.2.1
tornado 5.1.1
traceback2 1.4.0
traitlets 4.3.2
unittest2 1.1.0
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.14.1
wheel 0.32.3
Current python 3.5 packages
absl-py 0.4.1
alembic 1.0.0
appdirs 1.4.3
asn1crypto 0.24.0
astor 0.7.1
async-generator 1.10
attrs 18.2.0
Automat 0.7.0
backcall 0.1.0
bleach 2.1.4
certifi 2018.8.24
cffi 1.11.5
chardet 3.0.4
constantly 15.1.0
cryptography 2.3.1
cycler 0.10.0
Cython 0.28.5
decorator 4.3.0
entrypoints 0.2.3
gast 0.2.0
grpcio 1.12.1
h5py 2.8.0
html5lib 1.0.1
hyperlink 18.0.0
idna 2.7
incremental 17.5.0
ipykernel 4.10.0
ipython 6.5.0
ipython-genutils 0.2.0
jedi 0.12.1
Jinja2 2.10
jsonschema 2.6.0
jupyter-client 5.2.3
jupyter-core 4.4.0
jupyterhub 0.9.4
jupyterhub-ldapauthenticator 1.2.2
jupyterlab 0.34.9
jupyterlab-launcher 0.13.1
Keras 2.2.2
Keras-Applications 1.0.4
Keras-Preprocessing 1.0.2
kiwisolver 1.0.1
ldap3 2.5.1
Mako 1.0.7
Markdown 2.6.11
MarkupSafe 1.0
matplotlib 3.0.0
mistune 0.8.3
mkl-fft 1.0.6
mkl-random 1.0.1
nbconvert 5.3.1
nbformat 4.4.0
notebook 5.6.0
numpy 1.15.2
olefile 0.46
pamela 0.3.0
pandocfilters 1.4.2
parso 0.3.1
pexpect 4.6.0
pickleshare 0.7.4
Pillow 5.2.0
pip 10.0.1
prometheus-client 0.3.1
prompt-toolkit 1.0.15
protobuf 3.6.0
ptyprocess 0.6.0
pyasn1 0.4.4
pyasn1-modules 0.2.2
pycparser 2.19
pycurl 7.43.0.2
Pygments 2.2.0
pyOpenSSL 18.0.0
pyparsing 2.2.1
PySocks 1.6.8
pystan 2.18.0.0
python-dateutil 2.7.3
python-editor 1.0.3
python-oauth2 1.0.1
pytz 2018.5
PyYAML 3.13
pyzmq 17.1.2
requests 2.19.1
scipy 1.1.0
Send2Trash 1.5.0
service-identity 17.0.0
setuptools 40.2.0
simplegeneric 0.8.1
sip 4.19.12
six 1.11.0
SQLAlchemy 1.2.11
tensorboard 1.10.0
tensorflow 1.10.0
termcolor 1.1.0
terminado 0.8.1
testpath 0.3.1
torch 0.4.1.post2
torchvision 0.2.1
tornado 5.1.1
traitlets 4.3.2
Twisted 18.7.0
urllib3 1.23
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.14.1
wheel 0.31.1
zope.interface 4.5.0
Cython bisect linecache secrets
IPython bleach locale select
PyQt5 builtins logging selectors
future bz2 lzma send2trash
_ast cProfile macpath setuptools
_asyncio calendar macurl2path shelve
_bisect certifi mailbox shlex
_blake2 cgi mailcap shutil
_bootlocale cgitb markupsafe signal
_bz2 chardet marshal simplegeneric
_codecs chunk math sip
_codecs_cn cmath matplotlib sipconfig
_codecs_hk cmd mimetypes sipdistutils
_codecs_iso2022 code mistune site
_codecs_jp codecs mmap sitecustomize
_codecs_kr codeop modulefinder six
_codecs_tw collections multiprocessing sklearn
_collections colorsys nbconvert smtpd
_collections_abc compileall nbformat smtpd2
_compat_pickle concurrent netrc smtplib
_compression configparser networkx sndhdr
_crypt contextlib nis snowballstemmer
_csv copy nntplib socket
_ctypes copyreg nose socketserver
_ctypes_test crypt notebook sphinx
_curses csv ntpath sqlite3
_curses_panel ctypes nturl2path sre_compile
_datetime curses numbers sre_constants
_dbm cycler numpy sre_parse
_decimal cython numpydoc ssl
_dummy_thread cythonmagic opcode stat
_elementtree datetime operator statistics
_functools dateutil optparse storemagic
_gdbm dbm os string
_hashlib decimal packaging stringprep
_heapq decorator pandas struct
_imp difflib pandocfilters subprocess
_io dis parser sunau
_json distutils parso symbol
_locale doctest past sympyprinting
_lsprof docutils pathlib symtable
_lzma dummy_threading patsy sys
_markupbase easy_install pdb sysconfig
_md5 email pexpect syslog
_multibytecodec encodings pickle tabnanny
_multiprocessing ensurepip pickleshare tarfile
_opcode entrypoints pickletools telnetlib
_operator enum pip tempfile
_osx_support errno pipes terminado
_pickle event_rpcgen pkg_resources termios
_posixsubprocess faulthandler pkgutil test
_pydecimal fcntl platform testpath
_pyio filecmp plistlib tests
_random fileinput poplib textwrap
_scproxy fnmatch posix theano
_sha1 formatter posixpath this
_sha256 fractions pprint threading
_sha3 ftplib profile tick
_sha512 functools prompt_toolkit time
_signal future pstats timeit
_sitebuiltins gc pty tkinter
_socket genericpath ptyprocess token
_sqlite3 getopt pwd tokenize
_sre getpass py_compile tornado
_ssl gettext pybasicbayes tqdm
_stat glob pyclbr trace
_string grp pydoc traceback
_strptime gzip pydoc_data tracemalloc
_struct h5py pyexpat traitlets
_symtable hashlib pygments tty
_sysconfigdata_m_darwin_darwin heapq pyhawkes turtle
_testbuffer hmac pylab turtledemo
_testcapi html pymc types
_testimportmultiple html5lib pymc3 typing
_testmultiphase http pyparsing unicodedata
_thread idlelib pytz unittest
_threading_local idna pyximport urllib
_tkinter imagesize qtconsole urllib3
_tracemalloc imaplib queue uu
_warnings imghdr quopri uuid
_weakref imp random venv
_weakrefset importlib re warnings
abc inspect readline wave
aifc io reprlib wcwidth
alabaster ipaddress requests weakref
antigravity ipykernel resource webbrowser
appnope ipykernel_launcher rlcompleter webencodings
argparse ipyparallel rmagic wheel
array ipython_genutils rst2html widgetsnbextension
ast ipywidgets rst2latex wsgiref
asynchat itertools rst2man xdrlib
asyncio jedi rst2odt xml
asyncore jinja2 rst2odt_prepstyles xmlrpc
atexit joblib rst2pseudoxml xxlimited
audioop json rst2s5 xxsubtype
autograd jsonschema rst2xetex zipapp
autoreload jupyter rst2xml zipfile
babel jupyter_client rstpep2html zipimport
base64 jupyter_core runant zlib
bdb keyword runpy zmq
bin lib2to3 sched
binascii libfuturize scipy
binhex libpasteurize seaborn
Current python 3.6 packages
absl-py 0.6.1
alembic 1.0.5
asn1crypto 0.24.0
astor 0.7.1
async-generator 1.10
backcall 0.1.0
bleach 3.0.2
certifi 2018.10.15
cffi 1.11.5
chardet 3.0.4
cryptography 2.3.1
cycler 0.10.0
Cython 0.29
decorator 4.3.0
entrypoints 0.2.3
gast 0.2.0
grpcio 1.14.1
h5py 2.8.0
idna 2.7
ipykernel 5.1.0
ipython 7.2.0
ipython-genutils 0.2.0
jedi 0.13.1
Jinja2 2.10
jsonschema 2.6.0
jupyter-client 5.2.3
jupyter-core 4.4.0
jupyterhub 0.9.4
jupyterhub-ldapauthenticator 1.2.2
jupyterlab 0.35.3
jupyterlab-server 0.2.0
Keras 2.2.4
Keras-Applications 1.0.6
Keras-Preprocessing 1.0.5
kiwisolver 1.0.1
ldap3 2.5.1
Mako 1.0.7
Markdown 3.0.1
MarkupSafe 1.1.0
matplotlib 3.0.1
mistune 0.8.4
mkl-fft 1.0.6
mkl-random 1.0.1
nbconvert 5.3.1
nbformat 4.4.0
notebook 5.7.2
numpy 1.15.4
olefile 0.46
pamela 0.3.0
pandocfilters 1.4.2
parso 0.3.1
pexpect 4.6.0
pickleshare 0.7.5
Pillow 5.3.0
pip 18.1
prometheus-client 0.4.2
prompt-toolkit 2.0.7
protobuf 3.6.1
ptyprocess 0.6.0
pyasn1 0.4.4
pycparser 2.19
pycurl 7.43.0.2
Pygments 2.2.0
pyOpenSSL 18.0.0
pyparsing 2.3.0
PySocks 1.6.8
pystan 2.18.0.0
python-dateutil 2.7.5
python-editor 1.0.3
python-oauth2 1.1.0
pytz 2018.7
PyYAML 3.13
pyzmq 17.1.2
requests 2.20.1
scipy 1.1.0
Send2Trash 1.5.0
setuptools 40.6.2
sip 4.19.13
six 1.11.0
SQLAlchemy 1.2.14
tensorboard 1.12.0
tensorflow 1.12.0
termcolor 1.1.0
terminado 0.8.1
testpath 0.4.2
torch 0.4.1.post2
torchvision 0.2.1
tornado 5.1.1
traitlets 4.3.2
urllib3 1.23
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.14.1
wheel 0.32.3
Other specific requests: monocle (python) and seurat (R)
@cathiest - can you check this one? It seems more likely that we want this R package (https://bioconductor.org/packages/release/bioc/html/monocle.html) than this python package (https://pypi.org/project/monocle/)
(python) The current packages don't look too bad. Following the Environment Design document, a very substantial omission (imo 'IMPORTANT') appears to be pandas, less important ('DESIRED') would be seaborn and statsmodels, and there may be others -- I haven't done a full cross-check with the default conda package list, but I know many who use these two (incl. myself). I wonder whether pytorch works without the pytorch module? torch/torchvision are there, which I think are the important ones, but may be worth testing this.
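A cross-check like the one described above could be scripted rather than done by eye. The following is a minimal sketch, assuming the installed packages are available as "name version" lines like the listings in this issue; the function names are made up for illustration and are not part of any build script here.

```python
import re


def normalize(name):
    """Normalize a distribution name the way PEP 503 does
    (lowercase, runs of '-', '_' and '.' collapsed to '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_package_list(text):
    """Parse 'name version' lines into a {normalized_name: version} dict.
    Lines that do not have exactly two fields (e.g. headings) are skipped."""
    installed = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            installed[normalize(name)] = version
    return installed


def missing_packages(required, installed):
    """Return the required names that are absent from `installed`, in order."""
    return [name for name in required if normalize(name) not in installed]


# Short excerpt of the Python 3.6 listing above
listing = """\
numpy 1.15.4
scipy 1.1.0
torch 0.4.1.post2
"""
installed = parse_package_list(listing)
print(missing_packages(["numpy", "pandas", "seaborn", "statsmodels"], installed))
# prints ['pandas', 'seaborn', 'statsmodels']
```

Normalizing names first avoids false alarms from differences such as `Keras_Preprocessing` vs. `keras-preprocessing`.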
pip freeze)
We will include all sources above and give priority to sources further up the list (i.e. lower numbered).
Must have packages from my various environments are:
beautifulsoup4
h5py
ipython
nbconvert
nltk
numpy
pandas
pip
pystan
{pytorch, torchvision, torch}
scikit-image
scikit-learn
scipy
seaborn
spacy
statsmodels
tensorflow
and desirable packages:
curl
cython
dask
dask-core
intelpython
ipywidgets
mkl
numba
plotly
pyyaml
sqlalchemy
sqlite
Hmm, that's weird @ornithos - pytorch is in the list of stuff we're explicitly asking conda to install. The lists above came from pip, so maybe this is just a conda vs. pip discrepancy? Anyway, if you look at https://github.com/alan-turing-institute/data-safe-haven/pull/147, you can see the current status of the (much larger) list of packages we're installing for future builds. Let us know if you see anything missing there.
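If the discrepancy is only one of naming, a small alias table would be enough to reconcile the two views. The sketch below assumes that the conda package `pytorch` shows up in pip's listing as `torch` (which matches the `torch 0.4.1.post2` entries above); the table and function names are hypothetical, and other aliases would need to be added by hand.

```python
# Hypothetical alias table: conda package name -> name reported by pip.
# Only the pytorch/torch pair comes from this thread; extend as needed.
CONDA_TO_PIP = {"pytorch": "torch"}


def pip_name(conda_name):
    """Translate a conda package name to the name pip is assumed to report."""
    lowered = conda_name.lower()
    return CONDA_TO_PIP.get(lowered, lowered)


def missing_from_pip(conda_requested, pip_installed):
    """Return conda-requested packages with no matching entry in pip's list."""
    installed = {name.lower() for name in pip_installed}
    return [name for name in conda_requested if pip_name(name) not in installed]


pip_installed = ["torch", "torchvision", "numpy"]
print(missing_from_pip(["pytorch", "torchvision", "pandas"], pip_installed))
# prints ['pandas']
```

With the alias in place, `pytorch` is no longer reported as missing, while a genuinely absent package such as `pandas` still is.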
FYI @martintoreilly, all of the packages listed by @ornithos are already on the lists except intelpython, plotly and spacy.
For the NATS challenge the facebook prophet package fbprophet would be a nice one to have.
pip install fbprophet
plotly and spacy are in conda for all of our pythons (2.7, 3.5 and 3.6) and have been added to the build.
intelpython and fbprophet are not available in conda for any of our supported python versions. For now we are not supporting packages that are not in conda as part of the standard build, but we may be able to support them as part of a custom deploy for a particular challenge. I will add fbprophet to the NATS deploy.
I think that the base environment (which uses Python 3.7) may have the packages ticked as "In installer" on the Anaconda website. I have checked in the test environment and pandas appears to be installed at least.
conda activate base
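Once an environment is activated, a quick way to confirm that specific modules are importable (rather than just listed) is to ask the import system directly. This is a minimal sketch using only the standard library; it needs Python 3.4+ (so it would not run in the 2.7 environment), and the module names in the example call are just the ones mentioned in this thread.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


def report(modules):
    """Map each module name to whether it can be found."""
    return {name: is_importable(name) for name in modules}


if __name__ == "__main__":
    for name, ok in report(["pandas", "seaborn", "statsmodels"]).items():
        print("{}: {}".format(name, "installed" if ok else "MISSING"))
```

Note that this checks import names, which can differ from package names (e.g. scikit-learn is imported as `sklearn`).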
Hi, report from Team NATS that they need the following packages for the VM. Not sure if they are in your build already. Post a thumbs up if they are?
Following Zoom discussion just now, can we clarify the minimal list of python packages that we absolutely have to install on top of what we already have? @fkiraly @cathiest @jamespjh @martintoreilly ? The smaller the better from the perspective of testing that everything works as expected.
I see some packages are also being logged in issue alan-turing-institute/DSG-Dec18-issuelog/issues/6
EDIT: Realised #147 is the pull request and consolidation should happen in this issue.
I think we should try to include the Anaconda "In installer" packages, but could easily be persuaded we should start with a minimal list first to at least get something deployed. The risk of this approach is we will then be deploying a third environment for users to migrate to.
I'd suggest a minimal explicit list from all DSG groups first, then try adding the Anaconda "in installer" packages while @fkiraly etc are testing the minimal set? This should give us the option to re-test the explicit list in the larger list VM and deploy that if there are no regressions.
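The re-test step could start with a mechanical comparison of the two builds before any manual testing. The sketch below assumes each build's packages are available as a `{name: version}` dict, and treats any explicitly requested package that is missing or has a different version in the larger build as something to re-test; that definition, the function name, and the example dicts (using numpy versions from the 3.5 and 3.6 listings above) are illustrative only.

```python
def find_regressions(explicit, minimal_env, larger_env):
    """Return {name: (minimal_version, larger_version)} for every explicitly
    requested package whose version differs between the two builds.
    A version of None means the package is absent from that build."""
    problems = {}
    for name in explicit:
        before = minimal_env.get(name)
        after = larger_env.get(name)
        if before != after:
            problems[name] = (before, after)
    return problems


# Illustrative inputs only, not the contents of any real build
minimal_env = {"numpy": "1.15.4", "scipy": "1.1.0"}
larger_env = {"numpy": "1.15.2", "scipy": "1.1.0"}
print(find_regressions(["numpy", "scipy"], minimal_env, larger_env))
# prints {'numpy': ('1.15.4', '1.15.2')}
```

An empty result would not prove the larger build works, only that the explicit list was installed at the same versions.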
Updated the title of this issue to reflect wider scope
I vote for starting with a minimal list and then expanding the scope afterwards. From https://github.com/alan-turing-institute/DSG-Dec18-issuelog/issues/6 I have come up with: basemap, bokeh, fbprophet, geopandas, gpflow, keras, matplotlib, numpy, pandas, pandas_profiling, scikit-learn, seaborn, tensorflow, tsfresh, some of which are already available in the currently deployed VMs.
From Alex Bird in https://github.com/alan-turing-institute/DSG-Dec18-issuelog/issues/6
I've collated the above lists and given my best assessment of categories:
CRITICAL
numpy pandas sklearn matplotlib
IMPORTANT & Easy
geopandas pandas_profiling tsfresh basemap seaborn
IMPORTANT & can have compatibility issues / sometimes challenging to set up
gpflow fbprophet https://github.com/facebook/prophet pystan #(fbprophet depends anyway) tensorflow keras
NICE TO HAVE
bokeh pytorch #(surprised nobody's asked for this, but I guess tf is there)
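To make the "start minimal, then expand" approach concrete, the categories above could be recorded as ordered tiers and the install list cut at whichever tier is agreed. This is only a sketch: the assignment of packages to tiers is one reading of the list above, and the shortened tier label and function name are made up for illustration.

```python
# Tiers in priority order. Package placement is one reading of the
# categorisation above and should be checked before use.
TIERS = [
    ("CRITICAL", ["numpy", "pandas", "sklearn", "matplotlib"]),
    ("IMPORTANT & easy",
     ["geopandas", "pandas_profiling", "tsfresh", "basemap", "seaborn"]),
    ("IMPORTANT & harder",
     ["gpflow", "fbprophet", "pystan", "tensorflow", "keras"]),
    ("NICE TO HAVE", ["bokeh", "pytorch"]),
]


def packages_up_to(tier_name):
    """Return all packages from the first tier through `tier_name`,
    in priority order and without duplicates."""
    selected = []
    for name, packages in TIERS:
        for package in packages:
            if package not in selected:
                selected.append(package)
        if name == tier_name:
            return selected
    raise ValueError("unknown tier: {}".format(tier_name))


print(packages_up_to("CRITICAL"))
# prints ['numpy', 'pandas', 'sklearn', 'matplotlib']
```

Cutting at "CRITICAL" gives the smallest set to test first; moving the cut-off down one tier at a time keeps each round of testing small.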
Let's follow the process in #101 / #87 as to whether this should be installed after everyone leaves at 6pm for the social!