astroconda / astroconda-contrib

Community Submitted Conda Recipes

How do packages make it into the stsci metapackage? #397

Open pllim opened 6 years ago

pllim commented 6 years ago

@tddesjardins asked why some packages like synphot, stsynphot, and webbpsf don't get installed by default using conda create -n astroconda stsci.

jhunkeler commented 6 years ago

Should they be? Is there a functional difference between synphot, stsynphot and pysynphot? Would adding the other packages to stsci cause confusion among end-users (i.e., generate tickets asking why there are three synphot packages)?

If I remember correctly, webbpsf was removed from stsci because it was huge and not everyone was going to use it. We opted not to bloat the environment size by several hundred megabytes and just let users install webbpsf whenever they needed it. (cc @mperrin)

pllim commented 6 years ago

> Should they be?

Maybe @tddesjardins can comment on this.

tddesjardins commented 6 years ago

I'm fine with leaving off webbpsf. My question was more to do with the synphot stuff as it seems like we're moving towards using synphot and stsynphot over pysynphot. At least, that was the direction I got from @pllim and Harry (sorry, don't know his username!).

mperrin commented 6 years ago

You remember correctly! webbpsf-data is about 350 MB (and used to be even larger in some earlier versions) so we decided not to make that part of the default. Some people thought we were taking up disk space unnecessarily for something they wouldn't use. It's easy enough to conda install it individually if you do want it, so there did not seem to be a substantial downside to making it an optional install.

mperrin commented 6 years ago

synphot would seem to be a similar case, since it relies on various potentially large data files (libraries of stellar atmospheres, etc) which I believe are also many hundreds of MB.

PS: Incidentally, I too find the multiple versions of *synphot to be confusing and arguably user-hostile. Yes, I understand there are historical reasons, but it's not a great situation in the long run...

pllim commented 6 years ago

> synphot would seem to be a similar case

Not really. Data files are managed separately by RedCat and not distributed with the package.

cc @hcferguson for other discussions.

tddesjardins commented 6 years ago

Correct me if I'm wrong, but the file dependencies for *synphot are not downloaded through conda, correct? You have to go to the CRDS pages and download the reference file data for those.

stscicrawford commented 6 years ago

I'm also helping RedCat to take a look at how to host those files -- it might be something to think about for WebbPSF as well. @mperrin -- should I open a separate issue in WebbPSF? While it is easy to install, it might be useful to have it be part of the jwst pipeline with the option of grabbing the files if needed.

pllim commented 6 years ago

> are not downloaded through conda, correct?

Correct! And in an ideal world, you only download what you need.

mperrin commented 6 years ago

@stscicrawford Thanks, but actually for WebbPSF we have an effective solution already. The webbpsf-data conda package is a lightweight wrapper for retrieving the .tar.gz file with the data and storing it as part of someone's conda environment. In this case we don't need finer granularity than that, and doing it this way also allows us to manage the versioning consistently for the code and data files.
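For readers unfamiliar with the "lightweight wrapper" pattern described above, here is a minimal sketch of the general idea: download a .tar.gz and unpack it into a directory inside an environment. This is not the actual webbpsf-data recipe; the function name `fetch_data_archive`, the temporary file name `data.tar.gz`, and the destination layout are all hypothetical, chosen only for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_data_archive(url: str, dest_dir: Path) -> Path:
    """Download a .tar.gz archive from `url` and unpack it under `dest_dir`.

    Hypothetical helper: it illustrates the pattern only and is not taken
    from any real conda recipe.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Temporary location for the downloaded archive (name is an assumption).
    archive_path = dest_dir / "data.tar.gz"
    urllib.request.urlretrieve(url, str(archive_path))

    # Unpack next to the archive, then drop the archive to save disk space.
    # Only do this with archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
    archive_path.unlink()

    return dest_dir
```

Because the download URL would be pinned inside a versioned package, each package version maps to exactly one data release, which is the "versioning consistently for the code and data files" point made above.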

mperrin commented 6 years ago

Which is to say, I'm not opposed to some alternative way of providing or hosting the data files for webbpsf, if it's useful for some other reason. But right now I don't see any clear need that would drive that as a priority.

tddesjardins commented 6 years ago

I guess let's reverse the question and ask if the *synphot files should be managed via conda similar to the webbpsf-data package? We have been having this issue of how best to obtain the files from CALSPEC etc.

pllim commented 6 years ago

:scream: (backs away)

stscicrawford commented 6 years ago

@tddesjardins That is currently what I am looking at: investigating different options for hosting the files and making it easier for them to be downloaded. We are still in the scoping stage, and I've mostly been looking at how the data is stored and versioned. Please feel free to send me your thoughts on how you'd like to access these data sets.

jhunkeler commented 6 years ago

Three main reasons why *synphot data was never turned into Conda package(s):

1) The data is not versioned. The tarballs are replaced on the server whenever new data becomes available. That's not something we can work with.
2) A single change requires a total repack of the data set.
3) Eventually our channel would contain a lot of very large dead packages no one will ever touch.

This has been discussed on numerous occasions with different people since 2015. webbpsf's data releases are infrequent and relatively small.
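To illustrate why an unversioned tarball that is replaced in place is hard to package: the file name stays the same, so the only way to notice a change is to compare content fingerprints. The sketch below does that with SHA-256 from the Python standard library. The helper names `sha256_of` and `has_changed` are hypothetical, and this is not a description of how RedCat or AstroConda actually track data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: Path, recorded_digest: str) -> bool:
    """True if the file no longer matches the digest recorded earlier."""
    return sha256_of(path) != recorded_digest
```

A packaged data release would normally record such a digest next to a version number; with tarballs swapped on the server under the same name, there is no version to attach the digest to.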