cosmodesi / pypower

power spectrum and window function estimation
BSD 3-Clause "New" or "Revised" License
17 stars 6 forks source link

write higher level wrapping code #3

Closed pmcdonal closed 2 years ago

pmcdonal commented 2 years ago

I thought about trying to use this and looked at the example notebooks, but what I see I'd call ... "significantly incomplete." I.e., it describes how one could use what I'd call an interior kernel of a power spectrum measurement, but it looks like there remains a lot of work to do to assemble a "usable work flow" bridging catalogs to fitting. Just to try to explain what I mean: in https://github.com/cosmodesi/DESIclustools/blob/main/analysis/Python/PowerMeasurement.py I define a class that you call for when you want a particular measurement. It represents the results, including power estimate, window, and covariance, but if it can't read them pre-cached from disk it knows how to make calls to produce them (at least that is the idea... some of the code, especially covariance, is not great).

class PowerMeasurement(freezable.freezable):
    def __init__(self,survey,outdirec=filemanager.default_results_dir(),
                 tracer='LRG',captag=None,ztag=None,
                 compressdk=None,nowindow=False,poweronly=False,Nran=18,
                 inparams={},mocktype=None):

So you only need to give it a modest number of inputs and it knows how to do the rest. E.g., "survey" might be "DESISV3" (yes, probably this should be split to "survey" and "DA" (data assembly), but I'm trying not to fix things that don't really need to be fixed). mocktype=None is real data, but "fgmocks" would give FirstGen. You get the point. There are a lot of things in here. I'm writing this here to "be provocative". Maybe you'd say this doesn't belong in "pypower," but it belongs somewhere. I'm "complaining" instead of just doing it myself because... nobody ever likes my code - if I do it myself I have learned to expect it to be ignored and re-written independently, so it seems smarter for me to wait... although this is frustrating (i.e., my doing it myself would be to implement it as a "back end" for my PowerMeasurement here, but if I will need to throw this out in a month to use some official version... that won't be very efficient). (As I'm thinking about hitting submit I'm asking myself "am I being a baby here, maybe I should just do it" e.g., I can re-use my file-locating-type infrastructure, so it is really "just" putting together the pieces to go from catalogs into pypower and then pypower outputs into the format I use downstream... but when you get into it there are a lot of details - always more than quick "this will be easy" thinking accounts for... so I guess I will hit submit)

adematti commented 2 years ago

Hi, I'd indeed say this doesn't belong in "pypower", but somewhere. I like to keep things clearly separate - base code without any dependence in the DESI workflow, to be called in some DESI pipeline. Personally, this really helps me to keep separate i) issues in the code / ii) issues in the DESI workflow. Having people starting with base codes (which are already fairly high level) is good to have more eyes on these codes. Finally, the base codes can be reused for something else than standard analyses, which is also good to get more feedback. I'll be trying my best to keep the interface stable. Still, I totally hear what you say about putting together a "usable workflow" (to be used by anyone), which I haven't managed to do myself, so I leave that to those who are better than me to produce consensus codes ;) - and rather try to give hints about "how to integrate this code into your own workflow", with notebooks or small scripts.

adematti commented 2 years ago

(Of course please do not change any of your codes if your current setup helps you make rapid progress about how the analysis should be run!)

pmcdonal commented 2 years ago

There is a level where I would be happy with this position (that the DESI-specific code should be elsewhere), but I don't really think it is quite where you are at. Every power spectrum measurement (we would be thinking about) will take a catalog and randoms in RA,DEC,z and produce not just summary statistics but also window, and I would say some baseline analytic error estimate, so I think having pypower go between those points would be appropriate. Again, I would be pretty happy if I could write the definitive "pycompletepowermeasurement", wrapping pypower to do what I say here (then that could be wrapped in DESI-specific stuff), but as a practical matter no one is going to like how I do it, so... I will muddle forward, but just wanted to write this as food for thought.

adematti commented 2 years ago

Hum, but the goal (and there actually is) already something to compute the window function, feeding in the power spectrum estimation. Though I agree with the spirit that (estimator, window, covariance) all go together, the covariance estimate will probably go into another code, as some people are working on it --- though we should make sure it does the right thing given our estimator. Still, I agree having a baseline error estimate, Gaussian + window effects would be nice; do you have the formulas written somewhere for the FFT-based estimator? (I know there are some generic papers, but I'd like to read/write formulas for our very estimator). Would you like some function that computes all these at one, in one call? Given the provided classes (CatalogFFTPower and CatalogFFTWindow), that would straightforward to write for estimator + window. But one would not want to run window or covariance estimation systematically?

pmcdonal commented 2 years ago

(Of course please do not change any of your codes if your current setup helps you make rapid progress about how the analysis should be run!)

Well, I was looking because it seems like the work you are doing to test and improve the details, along with the fact that other people will only want to do work based on your version, are inevitably going to make me want to switch to yours. As I find myself wanting to do all the same stuff independently... it doesn't make much sense. This would fit fine as a backend in the structure I have... I just ran into this feeling that there is this intermediate layer that it would help to more formally standardize.

pmcdonal commented 2 years ago

I wrote this below to try to clarify (as usual thinking "what I'd really do is follow my nose as things develop"), but... probably I will just try to plug it into what I have at some point (it just seems like there are a bunch of pieces of this that every user will repeat).

I want an object that you construct by feeding it catalog and random file names, and maybe necessary fiducial cosmological model stuff, i.e., standard inputs, and then you can call for estimated multipoles and window... and covariance... What it returns for covariance is a detail (it would be great if someone else's analytic covariance could be computed here eventually, and one obtained somehow from many mocks, if only, e.g., read from a supplied file name, would eventually fit here too... I have some 0-order implementation which I could work to put in if this is the only missing element). I say file names instead of already loaded catalogs because I inevitably find it useful to be able to defer as much calculation as possible until it is really needed. And for workflow it should be possible to have this run either by actually computing these things or reading from disk (where inevitably I find reasons to want to partially read/partially compute).

pmcdonal commented 2 years ago

Ok, never mind. The code really seems quite spectacular, once I got over a tiny potential barrier. (not that I don't still think what I was saying would make it a little better, but I'm so happy to be connected to what is here that I don't really care anymore)