HajimeKawahara / exojax

🐈 Automatic differentiable spectrum modeling of exoplanets/brown dwarfs using JAX, compatible with NumPyro and JAXopt
http://secondearths.sakura.ne.jp/exojax/
MIT License
45 stars 14 forks source link

Some docs questions #84

Closed bmorris3 closed 2 years ago

bmorris3 commented 3 years ago

Hi @HajimeKawahara,

Thanks for your excellent work on this toolkit! I'm beginning to experiment with it, and I have a couple of questions that I couldn't answer with the docs or the paper in my first reads:

  1. Units: when running AutoRT.spectrum, what are the units of the spectrum?
  2. Exoplanets: it's clear that this tool is brilliant for e.g. high resolution brown dwarf spectroscopy in the NIR, but I'm wondering if it's currently applicable to low resolution exoplanet spectroscopy in the optical, where there are many many more spectral lines e.g. from H2O to compute, and most of the resolution information is lost due to low spectral resolution. Are there any shortcuts that we can take to apply this toolkit to retrievals of exoplanet emission spectra, for example?
  3. RAM optimization: One of the bottlenecks in (2) is loading the line list into memory, where there are >10k lines in a narrow bandpass. Apart from setting crit or limiting the range of wavelengths in the calculation, are there any steps we can take to minimize the memory pressure?
  4. Species for Hot Jupiters: thinking out loud here, could we implement H- opacity as well? I'd be happy to help contribute to this, and would be grateful if you could give me some pointers on how to start.

Thanks again!

HajimeKawahara commented 3 years ago

Hello @bmorris3,

Thank you for your interest and relevant questions for exojax!

  1. AutoRT.spectrum uses piBarr in it. So the unit should be (erg/s/cm2/cm-1). Note that AutoRT and AutoXS are for a quick use. The autodiff function is not currently unavailable for them.

  2. Good point. I think you cannot use the current version of exojax for low-res spectrum retrieval due to computation cost. To overcome this , I have started to implement a much faster method for huge number of lines (N>10000) using jax, which was originally proposed by @dcmvdbekerom and @erwanp in radis. This is not for low-res but for a wide range of high-res, but potentially applicable to low res too hopefully. Another direction is to implement auto differentiable correlated k. But, I have no idea for this.

  3. very good point. Actually, we have started to apply exojax to high-res spectra w/ ExoMol methane lines (so many lines). Currently @ykawashima (Yui Kawashima) is developing a branch with smaller RAM. Actually, there are some overlap use of memory for the trans info. The current version uses pandas to unzip bz2 and convert them to numpy. If you have any good ideas, it would be great to share them with us! updated: #89

  4. Wow, great! Currently, the classes for the continuum opacity is put in contdb. Yui suggested me some papers for H- John 1988, Bell and Berrington 1987, and this paper although I haven't read them yet. I think jax.numpy.interp can be used for auto-differentiable version of H- as similar to spec.hitrancia.logacia.

Thanks!

bmorris3 commented 3 years ago

Thanks so much for this quick answer!

For reference in case anyone else is looking for units, I found this line in the comparison notebooks which clarified (1) for me a bit:

# after computing spectrum Fx0 = rtrun(...) 
ccgs=29979245800.0 
Fx0=Fx0/ccgs #convert to the unit of erg/cm2/s/Hz

I've done a silly, over-complicated confirmation that this unit makes sense by supplying a T-P profile, using H2O opacity in a narrow bandpass, computing the emission spectrum with rtrun, plotting the contribution function with plotcf, and compared the emission spectrum from a simple Planck emission spectrum (implemented via astropy, times pi) with a blackbody temperature corresponding to the temperature where the contribution function is maximized in this narrow bandpass. The emission from exojax agrees with the pi * Planck(T).

erwanp commented 3 years ago

Hello ! Regarding 3., this is also a major problem encountered in RADIS, and we currently have an on-going experimental project lead by @gagan-aryan to release RAM pressure. It mostly involve lazy-loading and lazy-computation of the database using librairies like Vaex.

I don't know yet how it will transfer to Exojax and reducing GPU Memory pressure. You probably have thought of similar strategies for Helios-K @exoclime .

bmorris3 commented 3 years ago

Another thing (I'm about to check this on my own, but I wanted to check with you for details): if you supply a nu grid that is discontiguous, i.e. for a spectrum spanning two different wavelength segments with a gap in between, would that pose any problems in your implementation of the emission spectrum model? I'm wondering if I should create two separate spectra with contiguous nu grids instead, but it would be cleaner to do, for example:

wav = np.concatenate([np.linspace(6000, 7000, 1000), np.linspace(7500, 8000, 1000)])
HajimeKawahara commented 3 years ago

@bmorris3 Hi. I believe that the separate spectra with contiguous grids works same as the usual grid. Note that one needs to specify nu range in MdbExomol as the form of [nu_min, nu_max].

MdbExomol(path, nurange=[-inf, inf])

There is a bit inconsistency for this input. I sometimes use nu grids as input directly, but it does not work for two or more files. I need to fix it. updated: #85

bmorris3 commented 3 years ago

In this example, you call the following line:

kernel = NUTS(model_c,forward_mode_differentiation=True)

It looks like switching to reverse mode differentiation (setting to forward_mode_differentiation=False) causes a shape mismatch. Is this because, as you say in a footnote in the paper:

Currently, exojax does not support VJP

or do shapes need to be broadcasted differently when forward_mode_differentiation=False?

bmorris3 commented 3 years ago

Another hint seems to be that when I remove sources of line opacity and only compute CIA, the shape issues go away.

HajimeKawahara commented 3 years ago

Yes, exactly. Exojax uses the custom-defined JVP in hjert in lpf module to speed up the computation. But, it breaks the shape match for the reverse mode. I haven't figured the reason.

For this reason, we cannot use hjert in lpf for the optimization. Instead, tentatively, I defined a VJP version of lpf, rlpf for this purpose. See also this, though its ugly.

related issues: #67

bmorris3 commented 2 years ago

Hi @HajimeKawahara, sorry for the long stretch of silence. I'm returning to this project now!

I'm getting the following error when I use the dev version of exojax which relates to the vaex updates:

----> 1 mdbTiO=moldb.MdbExomol('.database/TiO/48Ti-16O/Toto/', nus_kepler, crit=1e-16)

~/git/exojax/src/exojax/spec/moldb.py in __init__(self, path, nurange, margin, crit, bkgdatm, broadf)
     93         #load states
     94         if self.states_file.with_suffix(".bz2.hdf5").exists():
---> 95             states=vaex.open(self.states_file.with_suffix(".bz2.hdf5"))
     96             ndstates=vaex.array_types.to_numpy(states)
     97         else:

//anaconda3/envs/pymc/lib/python3.8/site-packages/vaex/__init__.py in open(path, convert, shuffle, fs_options, fs, *args, **kwargs)
    223                 ds = vaex.dataset.open(path_output, fs_options=fs_options, fs=fs, **kwargs)
    224             else:
--> 225                 ds = vaex.dataset.open(path, fs_options=fs_options, fs=fs, **kwargs)
    226             df = vaex.from_dataset(ds)
    227             if df is None:

//anaconda3/envs/pymc/lib/python3.8/site-packages/vaex/dataset.py in open(path, fs_options, fs, *args, **kwargs)
     85         raise IOError(f'Cannot open {path}, failures: {failures}.')
     86     else:
---> 87         raise IOError(f'Cannot open {path} nobody knows how to read it.')
     88 
     89 

OSError: Cannot open .database/TiO/48Ti-16O/Toto/48Ti-16O__Toto.states.bz2.hdf5 nobody knows how to read it.

Any tips on how to resolve this? Thanks!

bmorris3 commented 2 years ago

Nevermind! I fixed this by restarting my interactive session and running the same code again, then it worked fine.