dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/
Other
37 stars 38 forks source link

Make datasets more easily loadable #61

Open ChrisBeaumont opened 11 years ago

ChrisBeaumont commented 11 years ago

This is what I had in mind for #60 -- instead of asking users to download files like the Perseus extinction map, why not bundle it with the code with an easy loader function?

astrofrog commented 11 years ago

One reason why I like being explicit is because we want users to be able to take the examples in the docs and apply it to their cases, which they can't really do if we are hard-coding specific examples with special methods. So I would personally prefer not to have this kind of convenience provided (to me, the data shouldn't be part of the package itself, it's an example file for the docs).

On the other hand, I think we could consider allowing a filename to be passed to compute, which would already simplify a number of examples.

ChrisBeaumont commented 11 years ago

Ok, you can make the final call here. I like being able to %paste code snippets in IPython, to quickly run examples -- as stupid as it is, an extra step like a data download has prevented me from trying examples before. I like how scikit-learn handles this stuff: http://scikit-learn.org/stable/auto_examples/applications/plot_outlier_detection_housing.html and https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets/data

But it's just a preference, and I defer to you. Either way, I agree compute should do something sensible if given a fits path (i.e. load and use the first HDU)

astrofrog commented 11 years ago

@ChrisBeaumont - I'll look into it a bit more. In this case, the reason why I think copying and pasting verbatim isn't particularly useful is because the user won't learn much more than what's already in the docs. IMHO they are most likely to copy and paste and modify, which means they want to use it on their own files. But I want to take some time to think about it and have a look over the examples you show.

ChrisBeaumont commented 11 years ago

One likely use case where they won't care about using their own data is to get a handle on the data structure, the API, the impact of pruning parameters, etc. People may want to do that before they dive in with their own data. Just something to think about.

astrofrog commented 11 years ago

After thinking about this a bit more, I agree that something like this would be handy, though I think we need to ensure we keep some of the initial examples with the full loading, and that we also make sure we provide links from time to time to say 'if you want to load your own dataset, see X'. I can try and work on this this afternoon based on what you have here.

astrofrog commented 11 years ago

@ChrisBeaumont - could you rebase this, then I can open a pull request to your branch with suggestions.

ChrisBeaumont commented 11 years ago

ok, done

astrofrog commented 11 years ago

Oops, it looks like this will need rebasing again. Once you've done that, I'll open a pull request to your branch with suggestions - but in principle I now like this idea. I just think that the first time we show the example we should show the full thing and add a little paragraph explaining where the load functions are coming from.

astrofrog commented 11 years ago

@ChrisBeaumont - here are a couple of ideas:

from astrodendro import Dendrogram, examples
Dendrogram(examples + 'PerA_Extn2MASS_F_Gal.fits')

Then it shows more explicitly that the argument is a path. What do you think?

ChrisBeaumont commented 11 years ago

I don't like examples as a path string -- that isn't obvious from the import statement, and usually you import code, not data. Plus, the PerA_Extn... name is cumbersome.

What do you think about a function example_file('perseus') -> path_to_perseus_file? This, combined with the ability for compute to auto-load a fits file from a path, addresses the original motivation for this issue (make using example data easier and more readable)

astrofrog commented 11 years ago

Ok, that sounds like a reasonable compromise - do you want to update the examples to show this?

astrofrog commented 10 years ago

Summary of a chat we had at CfA with @ChrisBeaumont and @keflavich:

One option - the first time, show the full thing, then explain that in the rest of the docs, we use a convenience to replace the from astropy.io import fits and fits.getdata lines, then use the convenience in the other examples to avoid always repeating the I/O code.

We decided to delay this until after 0.1 release.