astropy / halotools

Python package for studying large scale structure, cosmology, and galaxy evolution using N-body simulations and halo models
http://halotools.rtfd.org

download_processed_halo_table fails to find all catalogs #365

Status: Closed (cwhite1026 closed this issue 8 years ago)

cwhite1026 commented 8 years ago

I'm trying to follow the instructions on the Downloading and caching Halotools-provided catalogs page on readthedocs and can't seem to download any of the catalogs at all.

The code:

# Set things up to download catalogs.
from halotools.sim_manager import DownloadManager
dman = DownloadManager()
halo_direc = "/user/caviglia/research/correlations/halotools_catalogs/"

# Download a catalog
dman.download_processed_halo_table('bolshoi', 'rockstar', 0.0, download_dirname = halo_direc)

This spits out the following (in iPython):

/Users/caviglia/ssbvirt/ssbx-osx/lib/python2.7/site-packages/beautifulsoup4-4.4.1-py2.7.egg/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

---------------------------------------------------------------------------
HalotoolsError                            Traceback (most recent call last)
<ipython-input-8-d6fc2c670477> in <module>()
----> 1 dman.download_processed_halo_table('bolshoi', 'rockstar', 0.0, download_dirname = halo_direc)

/Users/caviglia/ssbvirt/ssbx-osx/lib/python2.7/site-packages/halotools-0.0a0-py2.7-macosx-10.6-x86_64.egg/halotools/sim_manager/download_manager.pyc in download_processed_halo_table(self, simname, halo_finder, redshift, dz_tol, overwrite, version_name, download_dirname, ignore_nearby_redshifts, **kwargs)
    166             msg += "version_name = " + version_name + "\n"
    167             msg = msg + "There are no halo catalogs meeting your specifications"
--> 168             raise HalotoolsError(msg)
    169 
    170         url, closest_redshift = (

HalotoolsError: You made the following request for a pre-processed halo catalog:
simname = bolshoi
halo_finder = rockstar
version_name = halotools_alpha_version1
There are no halo catalogs meeting your specifications

This happens no matter what I ask for. I've tried "bolshoi", "rockstar" with redshifts 0, 0.5, 1.0, and 1.5; "bolplanck", "rockstar", and redshifts 1 and 1.5; "multidark", "rockstar" and redshift 0.5.

I also tried running the download_initial_halocat.py script, but it fails thus:

ptcl_version_name = sim_defaults.default_ptcl_version_name
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-33-3ec7e7b43cee> in <module>()
----> 1 ptcl_version_name = sim_defaults.default_ptcl_version_name

AttributeError: 'module' object has no attribute 'default_ptcl_version_name'

If I manually run everything other than that line (since I don't want the particle catalog anyway), I get the same HalotoolsError:

HalotoolsError: You made the following request for a pre-processed halo catalog:
simname = bolshoi
halo_finder = rockstar
version_name = halotools_alpha_version1
There are no halo catalogs meeting your specifications

Any help would be greatly appreciated!

System info: Mac OS X 10.8.5, Python 2.7.5, IPython 2.0.0, Halotools 0.0a0, BeautifulSoup 3.2.1, Numpy 1.9.1, Scipy 0.15.1

aphearin commented 8 years ago

Not sure about the first set of errors, but I'll look into it. As for the download script, it looks like you need to install the latest version. I think this because older versions did not have 'default_ptcl_version_name'. Can you try updating to the latest master and reinstalling with 'python setup.py install'?

cwhite1026 commented 8 years ago

Unfortunately, I don’t have root access on the computer that generated the original error so I can't do python setup.py install there.

However, I built from source on my laptop with the current GitHub repo. Running download_initial_halocat.py and the code snippet both give:

/Users/cathyc/anaconda/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))

... Downloading data from the following location: 
http://www.astro.yale.edu/aphearin/Data_files/halo_catalogs/bolshoi/rockstar/hlist_1.00035.list.halotools_alpha_version2.hdf5

 ... Saving the data with the following filename: 
/Users/cathyc/.astropy/cache/halotools/halo_catalogs/bolshoi/rockstar/hlist_1.00035.list.halotools_alpha_version2.hdf5

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-5-05ac53ec9751> in <module>()
----> 1 dman.download_processed_halo_table('bolshoi', 'rockstar', 0.0)

/Users/cathyc/anaconda/lib/python2.7/site-packages/halotools-0.1.dev3482-py2.7-macosx-10.5-x86_64.egg/halotools/sim_manager/download_manager.pyc in download_processed_halo_table(self, simname, halo_finder, redshift, dz_tol, overwrite, version_name, download_dirname, ignore_nearby_redshifts, **kwargs)
    269 
    270         start = time()
--> 271         download_file_from_url(url, output_fname)
    272         end = time()
    273         runtime = (end - start)

/Users/cathyc/anaconda/lib/python2.7/site-packages/halotools-0.1.dev3482-py2.7-macosx-10.5-x86_64.egg/halotools/utils/io_utils.pyc in download_file_from_url(url, fname)
     47         sys.stdout.flush()
     48 
---> 49     urllib.urlretrieve(url, fname, reporthook)
     50 
     51 

/Users/cathyc/anaconda/lib/python2.7/urllib.pyc in urlretrieve(url, filename, reporthook, data, context)
     96     else:
     97         opener = _urlopener
---> 98     return opener.retrieve(url, filename, reporthook, data)
     99 def urlcleanup():
    100     if _urlopener:

/Users/cathyc/anaconda/lib/python2.7/urllib.pyc in retrieve(self, url, filename, reporthook, data)
    247             headers = fp.info()
    248             if filename:
--> 249                 tfp = open(filename, 'wb')
    250             else:
    251                 import tempfile

IOError: [Errno 2] No such file or directory: u'/Users/cathyc/.astropy/cache/halotools/halo_catalogs/bolshoi/rockstar/hlist_1.00035.list.halotools_alpha_version2.hdf5'

The halo_catalogs directory is in fact nonexistent: halotools/ contains only halo_table_cache_log.txt and ptcl_table_cache_log.txt. I tried a mkdir, but it says I don't have permission. I'd be happy to strong-arm it if you like, but I'd rather not muck anything up accidentally.

My laptop setup: Mac OS X 10.11.3, Python 2.7.11, Anaconda 2.4.1, Halotools 0.1.dev3482, Numpy 1.10.1, Scipy 0.16.1. I can't seem to import BeautifulSoup, despite it working well enough to raise a warning.

(Incidentally, importing halotools in this build gives a warning)

/Users/cathyc/anaconda/lib/python2.7/site-packages/IPython/kernel/__init__.py:13: ShimWarning: The `IPython.kernel` package has been deprecated. You should import from ipykernel or jupyter_client instead.
  "You should import from ipykernel or jupyter_client instead.", ShimWarning)

aphearin commented 8 years ago

The traceback you first posted is different from the one you posted second, and by following the second traceback it looks like you might not have specified the download directory the second time around, but I'm not sure. Just to be sure, if you could reproduce the exact code snippet you used to produce the second traceback, that would help me get to the bottom of the problem.
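A minimal sketch of the download-directory point, assuming you want the catalogs in a directory you own rather than in the default cache: create the directory first, confirm it is writable, and then pass it as download_dirname. The helper name ensure_writable_dir and the throwaway path are hypothetical and not part of Halotools; the commented call at the end only reuses the argument names shown in the tracebacks above. It assumes Python 3.

```python
import os
import tempfile


def ensure_writable_dir(dirname):
    """Create dirname if needed and confirm the current user can write to it.

    Hypothetical helper for illustration; it is not part of Halotools.
    """
    dirname = os.path.abspath(os.path.expanduser(dirname))
    # exist_ok avoids an error when the directory is already there
    os.makedirs(dirname, exist_ok=True)
    if not os.access(dirname, os.W_OK):
        raise PermissionError("Cannot write to " + dirname)
    return dirname


# Demonstrated on a throwaway location; substitute your own catalog directory.
halo_direc = ensure_writable_dir(
    os.path.join(tempfile.mkdtemp(), "halotools_catalogs"))
print(halo_direc)

# With Halotools installed, the directory could then be passed along:
# dman.download_processed_halo_table('bolshoi', 'rockstar', 0.0,
#                                    download_dirname=halo_direc)
```

If the directory cannot be created or written to, this fails before any download starts, which is easier to read than an IOError raised from inside urllib.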

Also, can you clarify what you mean when you say you built from source on your laptop? Did you run python setup.py install, or did you just run python setup.py build?

Ok, I'll take a look at this in the morning. Thanks for your patience; problems like this are why the current code still has the "alpha" stamp on it.

In the meantime, BeautifulSoup is a core dependency for this feature, so if that's not installed then this could be the problem. You'll need to either pip install beautifulsoup4 or conda install beautifulsoup4 in order for the DownloadManager class to work properly.
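A quick way to check the dependency before using DownloadManager, sketched under the assumption of Python 3. Note that the beautifulsoup4 package is imported under the name bs4 (as the bs4/__init__.py path in the warning above shows), whereas the older 3.x series used the module name BeautifulSoup. The helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if module_available("bs4"):
    print("beautifulsoup4 is importable as bs4")
else:
    print("bs4 not found; try: pip install beautifulsoup4")
```

This only locates the module; it does not import it, so it is safe to run in any environment.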

cwhite1026 commented 8 years ago

You are correct on multiple counts: I didn't specify the download directory and I forgot to run python setup.py build. I tried again after installing beautifulsoup and building, and it now works on my laptop. I'll see if I can get IT to put the current version on my desktop at work.

Thank you for all your help. I really appreciate it and I raised the issue partly because I thought that you would want to know what was going on. Worst case scenario, I can just ftp the catalogs from my laptop to the desktop.

aphearin commented 8 years ago

Sorry for the delay, @cwhite1026. I'm glad you got to the bottom of the problem.

This is exactly the sort of thing that GitHub Issues are for, so I'm glad you brought this to my attention. Even though there didn't turn out to be a bug in the normal sense of the word, if you found something in my documentation to be the origin of the confusion, I'd call that a "bug" and would like to fix it. Could you let me know one way or the other if there is confusing language (or just missing language) in the docs that helped trigger this issue? If not, that's ok, just feel free to close the issue in that event. But I'll leave this open for the time being to give you a chance to comment.

In other news, since you work on galaxy clustering, I thought I'd point you to a new section in the documentation that you may find useful.

http://halotools.readthedocs.org/en/latest/quickstart_and_tutorials/tutorials/catalog_analysis/galcat_analysis/index.html

I'm actively adding to this, so there is more to come in short order, but if you want a painless way to learn how to use Halotools to calculate galaxy clustering, what's already there is probably sufficient.

cwhite1026 commented 8 years ago

I think the main thing that went wrong was that I was trying to use the version of halotools that's on pip. The IT guys changed halotools on my desktop to the latest version of halotools and the downloads are working just fine. Maybe put a note at the top of the "Downloading Catalogs" page saying that the version on pip doesn't work? The version of the code that the documentation corresponds to is in the top left corner, but I didn't think to check it.

The secondary problem was that I didn't build before installing on my laptop. I checked the "Package Installation" page and I think python setup.py build is missing there. Not having BeautifulSoup was on me: you have it clearly listed as a dependency and I didn't have it installed properly.

The last thing I can think of is a little table of all the options for simname and what options are available for the halo_finder and redshifts. I found the list of simulations, but if there's a similar list of halo finders, I missed it.

Thank you again for all of your help and for making this code available. I was really dreading implementing an HOD model and halotools makes it much less intimidating. Thanks for the clustering tutorial link as well. That's my project for today: I have angular correlation functions on the CANDELS catalogs and I want to fit them with an HOD. (If you have any suggestions for this more sophisticated than "get the CFs from a bunch of HODs with different parameters and see which one fits best," I would love to hear them. :-P)

aphearin commented 8 years ago

@cwhite1026 - the version of Halotools that's up on pip is entirely empty of code. All I did was reserve the name on pip in anticipation of the official code release. My logic behind doing this was so that people who want to use the code before it's released will have to make some concrete step to remind them that the code is not polished and ready for out-of-the-box science. So by requiring a manual install, I hoped to reinforce that perspective. On the Getting Started page, I have a note saying that pip install halotools is Coming Soon, which I was hoping would cover all the bases, but maybe I should also put a note in the README?
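One way to see which version is actually installed, so that it can be compared with the version shown on the documentation page, is sketched below. It assumes Python 3.8 or later, where importlib.metadata is in the standard library; that is newer than the Python 2.7 setups described in this thread. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("halotools")
if version is None:
    print("halotools is not installed in this environment")
else:
    print("halotools version:", version)
```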

As for python setup.py build, I think we had a miscommunication here. When I asked whether you had done this, I was just guessing at how you might have interpreted the instructions in the documentation; I was not suggesting how to proceed. In fact, all that python setup.py build does is compile the code into a build subdirectory of the halotools working directory. So this was a red herring: running python setup.py build was not the solution to the problem.

It's super useful to hear about your experience of what aspects of the code documentation you did not find "discoverable", e.g., the simname, halo_finder, etc. I'll be sure to resolve #390 before the beta-release.

I have been in contact with @duncandc about the angular correlation function calculation. The code is ready for use, it's just not well-documented yet. When we address this in #376, we'll tag you in that Issue so that you'll be aware of the update to the instructions as soon as they are available. Thanks for being patient.

As for your HOD parameter scanning question, I would recommend using one of the functions in scipy.optimize in order to find a best-fit parameter combination. That will be both simpler and more effective than a trial-and-error approach.
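To make the scipy.optimize suggestion concrete, here is a rough sketch that minimizes a chi-squared with scipy.optimize.minimize. Everything about the model is an assumption for illustration: a two-parameter power law stands in for the HOD prediction, and the "measurement" is synthetic and noise-free. In a real analysis, model_w would be replaced by a function that populates a mock and measures its clustering; no Halotools calls are shown here.

```python
import numpy as np
from scipy.optimize import minimize


# Stand-in model: a power law w(theta) = amp * theta**(-slope).
# A real analysis would compute the clustering of an HOD mock here instead.
def model_w(theta, amp, slope):
    return amp * theta ** (-slope)


def chi2(params, theta, w_obs, w_err):
    """Sum of squared, error-weighted residuals between data and model."""
    amp, slope = params
    resid = (w_obs - model_w(theta, amp, slope)) / w_err
    return np.sum(resid ** 2)


# Synthetic, noise-free "measurement" made up for this illustration
theta = np.logspace(-2, 0, 15)
w_obs = model_w(theta, 0.05, 0.8)
w_err = 0.1 * w_obs

# Nelder-Mead is derivative-free, so it needs only chi2 evaluations
result = minimize(chi2, x0=[0.1, 1.0], args=(theta, w_obs, w_err),
                  method="Nelder-Mead", options={"maxiter": 2000})
best_amp, best_slope = result.x
print(result.success, best_amp, best_slope)
```

A derivative-free method is used here because a mock-based model has no analytic gradient; whether it copes well with the stochastic noise of a real mock population is something to test on your own problem.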

If you have any more comments about this issue, please let me know. I'll leave this issue open in case you do. If you don't, would you mind closing the issue when you get the chance?

cwhite1026 commented 8 years ago

Sounds good!

As for the pip installation, I think it might be helpful to have a note in the description that pip displays or have it print a little message on import if that's possible. My default mode of package installation is through pip since it's the only way I can install things on my work computer. So, when I was told about halotools, the first thing I did was search pip for it and there it was.

Thank you again for all of your help!