Closed iugrina closed 8 years ago
The auxiliary files are primarily intended for use in the notebooks. At this point, analysis is wrapped into the notebooks. Over the course of the project, the best way to call these functions within a notebook has evolved (command-line utilities vs. imported functions), and so has the best approach to environment and package management.
The installation described in #199 reflects the current conda install. As far as I can tell from repeated research, conda doesn't easily support PYTHONPATH modifications. The best suggestion I've seen is modifying a .pth file, which has its own set of challenges. Therefore, it's necessary to include a setup.py and install the repository using pip if you want the auxiliary code to work in the environment.
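For reference, the .pth approach mentioned above could look roughly like the sketch below. Python's site module reads every *.pth file in a site-packages directory at startup and appends each listed directory that exists to sys.path. The helper name write_pth_file and the file name american_gut_repo.pth are made up for illustration; nothing like this exists in the repo.

```python
import os


def write_pth_file(site_packages_dir, repo_dir, name="american_gut_repo.pth"):
    """Write a .pth file listing repo_dir (hypothetical helper, not repo code).

    At interpreter startup the site module reads each *.pth file found in a
    site-packages directory and appends every listed, existing directory
    to sys.path.
    """
    pth_path = os.path.join(site_packages_dir, name)
    with open(pth_path, "w") as handle:
        # One directory per line; absolute paths are used as-is.
        handle.write(os.path.abspath(repo_dir) + "\n")
    return pth_path
```

The site-packages directory of the active environment can usually be found with sysconfig.get_paths()["purelib"]. One of the challenges is that the file lives outside the repo, so it is easy to forget about when the clone moves.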
If you're using another environment manager (virtualenv, for instance) that lets you modify the PYTHONPATH, it's preferable to modify the PATH and PYTHONPATH.
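If changing the environment itself is awkward, roughly the same effect can be had from inside a notebook session. This is only a sketch: the function name use_repo_checkout is hypothetical, and the assumption that the command-line utilities live in a scripts subdirectory of the clone should be checked against the actual repo layout.

```python
import os
import sys


def use_repo_checkout(repo_dir, scripts_subdir="scripts"):
    """Make a repo clone importable and its scripts findable (hypothetical helper).

    scripts_subdir is an assumption about the repo layout, not a confirmed path.
    """
    repo_dir = os.path.abspath(repo_dir)
    # Equivalent of prepending the clone to PYTHONPATH for this session.
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)

    # Equivalent of prepending the scripts directory to PATH; child
    # processes started from this session inherit the change.
    scripts_dir = os.path.join(repo_dir, scripts_subdir)
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if scripts_dir not in entries:
        os.environ["PATH"] = os.pathsep.join([scripts_dir] + entries)
    return repo_dir, scripts_dir
```

This has to run before the first import of americangut in the session, since a module that has already been imported is not re-resolved against the new sys.path.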
Thank you for the reply.
I've tried it now with conda (instructions from #199) and it still doesn't work. The folders data, latex and tests are not installed as part of the package (if that was the intention), and running 01-get_sequences_and_metadata.md as an IPython notebook with AG_TESTING=True gives:
study_accessions = agenv.get_study_accessions()
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-3-4ce98b7f14da> in <module>()
----> 1 study_accessions = agenv.get_study_accessions()
/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/notebook_environment.pyc in get_study_accessions()
2256 """
2257 if ag.is_test_env():
-> 2258 _stage_test_accessions()
2259 return _TEST_ACCESSIONS[:]
2260 else:
/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/notebook_environment.pyc in _stage_test_accessions()
2318 sourced from EBI.
2319 """
-> 2320 repo = get_repository_dir()
2321 for acc in _TEST_ACCESSIONS:
2322 src = os.path.join(repo, 'tests/data/%s' % acc)
/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/results_utils.pyc in get_repository_dir()
55
56 # get_path verifies the existance of these directories
---> 57 get_path(expected, 'data')
58 get_path(expected, 'latex')
59
/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/results_utils.pyc in get_path(d, f)
46 """Check and get a path, or throw IOError"""
47 path = os.path.join(d, f)
---> 48 check_file(path)
49 return path
50
/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/util.pyc in check_file(f, e)
146 """Verify a file (or directory) exists"""
147 if not os.path.exists(f):
--> 148 raise e("Cannot continue! The file %s does not exist!" % f)
149
150
IOError: Cannot continue! The file /home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/data does not exist!
Therefore, IMHO the problem isn't conda vs. pip. Since americangut is installed as a package, get_repository_dir will obviously miss the correct repo dir with the data/tests/latex folders. The only way I see get_repository_dir finding the correct repo dir is if it is sourced from American-Gut/americangut/results_utils.py (not from the package). However, in that case 01-get_sequences_and_metadata.md won't know about it, since the American-Gut repo isn't in the PYTHONPATH, and it will therefore import the package version.
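To make the failure concrete, here is a small sketch of the lookup as I understand it from the traceback. It assumes get_repository_dir walks two directory levels up from the module file, which is what the path in the IOError suggests. guess_repository_dir is a hypothetical stand-in, not the real function, and the clone location is made up.

```python
import os


def guess_repository_dir(module_file):
    """Hypothetical stand-in: go two directory levels up from a module file."""
    return os.path.dirname(os.path.dirname(module_file))


# Path of the installed module, taken from the traceback above.
installed = ("/home/iugrina/miniconda2/envs/americangut/lib/python2.7/"
             "site-packages/americangut/results_utils.pyc")
# Made-up location of a repo clone, for comparison only.
cloned = "/home/iugrina/American-Gut/americangut/results_utils.py"

# Installed copy: resolves to site-packages, which has no data folder.
print(os.path.join(guess_repository_dir(installed), "data"))
# Cloned copy: resolves to the repo root, where data and latex live.
print(os.path.join(guess_repository_dir(cloned), "data"))
```

With the installed copy, the lookup lands on the site-packages/data path reported in the IOError.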
I would like to help improve this (making it more reproducible, working on different platforms, ...), but I need to know the intended way to run it. A from-scratch example would help a lot, along with comments on the following question:
- Are data, latex and tests folders intended to be a part of the package or just a part of the repo?
Thanks, Ivo. Data and latex are intended to be part of the repo. I recommend looking at what is done via travis.yml. I admit, our internal uses just clone the repo, so having setup.py is a bit confusing. However, we'd be excited to see install/deploy improve.
Thanks. If it is intended to be used only as a repo then adjusting PYTHONPATH and PATH should be enough.
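A quick way to confirm that the PYTHONPATH adjustment took effect is to check which copy of the package actually got imported. A minimal sketch, with imported_from as a hypothetical helper name:

```python
import os


def imported_from(module, directory):
    """Return True if module was loaded from somewhere under directory."""
    module_path = os.path.realpath(module.__file__)
    directory = os.path.realpath(directory)
    return module_path.startswith(directory + os.sep)
```

After import americangut, calling imported_from(americangut, "/path/to/American-Gut") should return True when the clone is being used and False when the copy in site-packages shadows it. The path here is a placeholder for wherever the repo was cloned.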
Resolved with #211
Hi,
I've been struggling with the American-Gut repo and the way I should use it for the past few days. If I understood correctly, the repo is broken into a package (the 'americangut' dir) and auxiliary files. Some of these files are intended to be used by the package itself, while others are for interactive sessions, with IPython notebooks for example.
In #199 (https://github.com/biocore/American-Gut/issues/199) @JWDebelius recommends installing the package with pip install . -e --no-deps. Therefore, the americangut dir was indeed intended to be used as a package. Still, this will not install the folders latex and tests from package_data, since setup.py seems to be a bit misconfigured (package_data should be part of the package's source dir).
Also, running (e.g.) 01-get_sequences_and_metadata.md will fail on study_accessions = agenv.get_study_accessions(), since it calls get_repository_dir (from results_utils.py), which will strangely take a part of the full path (outside of the package dir) and try to find 'data' and 'latex' there. Moreover, 'data' isn't even specified in setup.py.
Therefore, I'm not quite sure how I should use the repo. Should I define PYTHONPATH to include the repo and PATH to include the scripts without installing the package, or should I install the package (as recommended by @JWDebelius)? If I need to install it, what else do I need to adjust to make it work (PATHs, PYTHONPATHs, ...)?