gallantlab / pycortex

Pycortex is a python-based toolkit for surface visualization of fMRI data
https://gallantlab.github.io/pycortex
BSD 2-Clause "Simplified" License
585 stars · 137 forks

Not importable from within the source directory #42

Closed by yarikoptic 10 years ago

yarikoptic commented 10 years ago
```
> python setup.py build_ext --inplace
...
> PYTHONPATH=$PWD python -c 'import cortex'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cortex/__init__.py", line 1, in <module>
    from .dataset import Dataset, VolumeData, VertexData, DataView, View
  File "cortex/dataset/__init__.py", line 22, in <module>
    from ..db import surfs
  File "cortex/db.py", line 20, in <module>
    filestore = options.config.get('basic', 'filestore')
  File "/usr/lib/python2.7/ConfigParser.py", line 618, in get
    raise NoOptionError(option, section)
ConfigParser.NoOptionError: No option 'filestore' in section: 'basic'
```

I guess it should be more resilient to some configuration options being missing? ;)
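A minimal sketch of what such resilience could look like in `cortex/db.py`: read the option with a fallback instead of letting `NoOptionError` escape at import time. This is illustrative only, not pycortex's actual code; it uses Python 3's `configparser` (the traceback above is Python 2's `ConfigParser`), and the `defaults.cfg` filename is an assumption.

```python
# Hypothetical sketch: tolerate a missing 'filestore' option instead of
# raising at import time. Not the actual pycortex implementation.
import configparser

config = configparser.ConfigParser()
config.read("defaults.cfg")  # may not exist at all in an in-place checkout

# fallback= absorbs both NoSectionError and NoOptionError
filestore = config.get("basic", "filestore", fallback=None)

if filestore is None:
    # defer the hard failure until the filestore is actually needed
    print("warning: no filestore configured; database features disabled")
```

With this, `import cortex` would succeed, and only code paths that really touch the filestore would need to raise.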

jamesgao commented 10 years ago

Hmm, I'm not sure what the best way around this would be. A filestore definition along with the example subject is necessary, otherwise the tests would fail. Right now, the location of the default filestore is written into the defaults.cfg file when the package is installed. How would I have it detect that it's running inplace?
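One common heuristic for the "am I running in-place?" question is to check whether a `setup.py` sits one level above the package directory, which is usually only true in a source checkout. A sketch under that assumption (the function name is hypothetical, not part of pycortex):

```python
# Hypothetical sketch: detect an in-place (source-tree) run by looking
# for setup.py next to the package. Heuristic only.
import os

def running_inplace(package_file):
    """Return True if the package appears to run from its source tree."""
    pkg_dir = os.path.dirname(os.path.abspath(package_file))
    source_root = os.path.dirname(pkg_dir)
    return os.path.exists(os.path.join(source_root, "setup.py"))
```

It could be called as `running_inplace(__file__)` from within the package, and the config loader could then skip or relax the `defaults.cfg` requirement.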

yarikoptic commented 10 years ago

On Mon, 18 Nov 2013, James Gao wrote:

> Hmm, I'm not sure what the best way around this would be. A filestore definition along with the example subject is necessary, otherwise the tests would fail. Right now, the location of the default filestore is written into the defaults.cfg file when the package is installed.

The current solution wouldn't quite scale for any custom install location, e.g. after doing

```
python setup.py install --prefix=INSTALL
```

the generated path would not be absolute; and even if it were resolved, it would break for any installation which installs just before packaging and then places things under a different location.

> How would I have it detect that it's running inplace?

yeah... data shipped along with Python packages always becomes a sore point. If it were smaller, I would have just recommended placing it under cortex/ (as we do with testing data for pymvpa2 -- shipped under mvpa2/data); then it just becomes available right at the place where the module is installed.

Are you sure the filestore, as used for testing or minimal default operation, could not be trimmed to some smaller size?

I expect that the module itself should be usable with custom subject/study data without requiring any data of yours... am I right? Then trimming the data shipped along would help not only with deployment but also minimize the now slowly but steadily growing size of .git/. The example data in its entirety could well live e.g. in another repository, or just remain where it is if you think it will stay like that for quite a while ;)

Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik

jamesgao commented 10 years ago

Sadly, that is the base size of the surfaces themselves. Each hemisphere is about 8 MB, and there are 4 surfaces per hemisphere. Gzip helps the size a bit, but there really aren't very many files in there. That's the minimal set of surfaces required to test all the mapping functionality in pycortex (admittedly, full testing for that functionality doesn't exist yet!)

One option is to use the sklearn model -- have it dynamically download the data from a server if that subject is requested. However, that requires special-casing subject "S1", which doesn't sound like a very good idea to me. You are right -- subject S1 is only for demo purposes and testing. All subject data can be your own. However, having an absolute filestore directory is a must, or else the database won't work. In fact, since our lab reuses subjects from experiment to experiment, we have a shared filestore on a network drive so everyone can access the same surfaces.

As you may have noticed, I actually had to hack setup.py to have it install the filestore in a reasonable location. What's the best way to deal with a configuration file like that? How is this handled by other database packages?

yarikoptic commented 10 years ago

On Mon, 18 Nov 2013, James Gao wrote:

> Sadly, that is base size of the surface itself. Each hemisphere is about 8 mb, and there are 4 surfaces per hemisphere. Gzip helps the size a bit, but there really aren't very many files in there. That's the minimal set of surfaces required to test all the mapping functionality in pycortex (admittedly, full testing for that functionality doesn't exist yet!)

Well -- but then still -- if it is only for demo and testing, why does the import of the module itself (without asking for demo or testing) fail? ;)

> One option is to use the sklearn model -- have it dynamically download the data from a server if that subject is requested. However, that requires special-casing subject "S1", which doesn't sound like a very good idea to me. You are right -- subject S1 is only for demo purposes and testing. All subject data can be your own. However, having an absolute filestore directory is a must, or else the database won't work. In fact, since our lab reuses subjects from experiment to experiment, we have a shared filestore on a network drive so everyone can access the same surfaces.

Not really a solution, but rather a convenience: you also might like to add the ability to augment any non-specified or already-specified config item via the environment. E.g., for PyMVPA see https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/base/config.py#L56, which allows us to temporarily, or for some runs, point to another dataset, etc.
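A sketch of that environment-override idea, modeled loosely on PyMVPA's approach rather than on any actual pycortex API; the `PYCORTEX_<SECTION>_<OPTION>` naming scheme and the `get_option` helper are both assumptions for illustration:

```python
# Hypothetical sketch: let environment variables override config-file
# values, e.g. PYCORTEX_BASIC_FILESTORE overrides [basic] filestore.
import os
import configparser

def get_option(config, section, option, env_prefix="PYCORTEX"):
    """Look up an option, letting the environment win over the config file."""
    env_var = "%s_%s_%s" % (env_prefix, section.upper(), option.upper())
    if env_var in os.environ:
        return os.environ[env_var]
    return config.get(section, option, fallback=None)
```

This makes it trivial to point a single run at a different filestore without editing any file: `PYCORTEX_BASIC_FILESTORE=/tmp/store python myscript.py`.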

> If you notice, I actually had to hack setup.py to have it install the filestore in a reasonable location. What's the best way to deal with a configuration file like that? How is that handled by other database packages?

Config: it's better (from the user's perspective) to stick to reasonable defaults and require a config file only for needed customizations. To tell the truth, I believe I haven't yet seen (not that it doesn't exist, or is in general an infeasible idea) a project which generates those config files on the fly, or augments them upon "installation" to customize for the provided paths.

As for data -- once again, I haven't seen an ultimate agreement on how to deal with vast data pieces alongside Python modules. The nipy folks were brewing some data-access abstraction -- it seems to have boiled down to http://dataprotocols.org/data-packages/ https://github.com/tryggvib/datapackage but oh well -- it is GPL, so even if it fit you well -- you COULD NOT USE IT! ;)

We have our own "ways" in PyMVPA (boiling down also to just a config file + env variable specification), leaving actual distribution to outside means (packages/tarballs/etc); sklearn downloads from the web somewhere; and for neurosynth we added data/ as a submodule, also assuming its separate distribution, IIRC.

But I believe none of those above require big arrays for tests, only for demos. For testing of surface-based procedures in PyMVPA, we (Nick) even generate them on the fly to perform unit tests, so no vast data needs to be provided for unit testing.
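To make the "generate test surfaces on the fly" idea concrete, here is a sketch in the spirit of that approach (not taken from PyMVPA or pycortex): a tiny closed mesh like a unit octahedron gives vertices and triangles for exercising mapping code with no data files at all.

```python
# Hypothetical sketch: a synthetic surface for unit tests. A unit
# octahedron has 6 vertices and 8 triangular faces -- small but a
# genuine closed 2-manifold mesh.
def make_octahedron():
    """Return (vertices, faces) for a unit octahedron mesh."""
    vertices = [
        (1, 0, 0), (-1, 0, 0),   # +x, -x
        (0, 1, 0), (0, -1, 0),   # +y, -y
        (0, 0, 1), (0, 0, -1),   # top apex, bottom apex
    ]
    faces = [
        (0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),  # upper half
        (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5),  # lower half
    ]
    return vertices, faces
```

A quick sanity check on such a mesh is the Euler characteristic: V - E + F = 6 - 12 + 8 = 2, as required for a sphere-topology surface.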

If you find a better approach than those (or than just shipping the data within the package) -- please share.

btw -- now I remember that I posed a similar conundrum to the pysurfer guys a while back: https://github.com/nipy/PySurfer/issues/14 ... as far as I can see upon a quick look, they are now just skipping all the tests which would depend on fsaverage.

But if you think about it -- you now have ~60 MB of data which you are shipping within the repo/distribution, and which would probably change much less frequently than the pycortex code itself... so maybe it still makes sense to modularize the data away from the project (into a separate project) one way or another, and then pretty much assume some default location to look for it, while providing means to specify paths to the data. Especially if it could be useful for other projects as well, it should get a life of its own.


jamesgao commented 10 years ago

Ok, I have a plan to address this issue. First, I will remove the example subject from the install package. On installation, no defaults are written. However, on first import, pycortex will prompt you to set your filestore location, and it will optionally allow you to download the example subject into that location.
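A sketch of what that first-import flow could look like; the `ensure_filestore` helper, the config path, and the prompt wording are all illustrative assumptions, not pycortex's actual implementation:

```python
# Hypothetical sketch: on first import, ask for a filestore location and
# persist it; on later imports, just read the saved value.
import os
import configparser

def ensure_filestore(config_path, prompt=input):
    config = configparser.ConfigParser()
    config.read(config_path)
    if config.has_option("basic", "filestore"):
        return config.get("basic", "filestore")
    # first import: ask once, then write the answer to the config file
    location = prompt("Where should pycortex store subject data? ")
    if not config.has_section("basic"):
        config.add_section("basic")
    config.set("basic", "filestore", location)
    with open(config_path, "w") as fh:
        config.write(fh)
    return location
```

Taking the prompt function as a parameter also answers the automated-testing concern below: tests can pass a stub instead of blocking on stdin.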

I don't think the example subject is very valuable, since just about any set of surfaces will work for testing. I would personally prefer to include something like fsaverage as the example subject, but I was unsure of whether it would be kosher to include it in the distribution. In fact, I was planning to recommend people import fsaverage on their own in the documentation.

I'm not sure how this will interface with automated testing. Will this work well for packaging?

yarikoptic commented 10 years ago

Anything would work for some packaging ;) but my comment was primarily about usability... e.g., IMHO it is nice when a Python module is importable without requiring (at import time) that data and complete config files be available. Maybe someone would like to use some submodule's function in some other context, without a 'filestore' per se?

What about providing a simple function to specify a common filestore location at run time, and extending openFile with a filestore option, so that I could do the following snippets:

```python
import cortex
ds = cortex.openFile("S1_retinotopy.hdf", filestore="filestore")
```

or

```python
import cortex
cortex.setDefaultFilestore("filestore")
ds = cortex.openFile("S1_retinotopy.hdf")
```

and then internally a getFilestore function could look into the config file to see if a location was specified there, in case it was not given at run time?
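The resolution order being proposed could be sketched like this; none of these names exist in pycortex, and the module-level default is just one way to hold the run-time setting:

```python
# Hypothetical sketch of the proposed lookup: explicit argument wins,
# then the run-time default, then the config-file value.
_default_filestore = None

def set_default_filestore(path):
    global _default_filestore
    _default_filestore = path

def get_filestore(explicit=None, config_value=None):
    """Resolve the filestore from the first available source."""
    for candidate in (explicit, _default_filestore, config_value):
        if candidate is not None:
            return candidate
    raise RuntimeError("no filestore configured")
```

With this layering, importing the module never needs a filestore; the error surfaces only when an operation genuinely requires one.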

This should (theoretically, since I am not sure whether there are more hidden dependencies on the filestore inside) also allow me to not bother about a 'filestore' at all, but rather save the dataset (pickle, or into HDF5 via PyMVPA's h5save) to be reloaded later on and provided to cortex.webshow without having any local filestore available.

NB btw -- generally the Python community has moved away from camelCase notation for methods/functions to underscore_separated names... for PyMVPA we had to go through a big refactoring after the 0.4.x series to do that.