astropy / astroquery

Functions and classes to access online data resources. Maintainers: @keflavich and @bsipocz and @ceb8
http://astroquery.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Add VAMDC interface #618

Closed keflavich closed 7 years ago

keflavich commented 8 years ago

Probably only a limited variant to match the splatalogue query tool: http://portal.vamdc.eu/

keflavich commented 8 years ago

As a step along the way, and possibly the only one I'm interested in implementing, I'd like to be able to parse CDMS results into astropy tables. Here is an example query:

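    import requests
    import bs4

    # CDMS VAMDC-TAP endpoint (no trailing slash, so url + "/sync" has a single slash)
    url = 'http://cdms.ph1.uni-koeln.de/cdms/tap'
    rslt = requests.post(url + "/sync",
                         data={'REQUEST': "doQuery", 'LANG': 'VSS2', 'FORMAT': 'XSAMS',
                               'QUERY': "SELECT SPECIES WHERE MoleculeStoichiometricFormula='CH2O'"})
    # html5lib lowercases tag names, hence the lowercase lookups below
    bb = bs4.BeautifulSoup(rslt.content, 'html5lib')
    # pick the molecule entry whose structural formula is H2CO
    h = [x for x in bb.find_all('molecule') if x.ordinarystructuralformula.value.text == 'H2CO'][0]
    # the partition function holds two data lists: temperatures and Q(T)
    tem_, Q_ = h.partitionfunction.find_all('datalist')
    tem = [float(x) for x in tem_.text.split()]
    Q = [float(x) for x in Q_.text.split()]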

So the first priority is implementing a CDMS table parser. @vilhelmp, I think you might also be interested in this?

vilhelmp commented 8 years ago

Yes, this would be nice indeed.

vilhelmp commented 8 years ago

After working a bit with Holger Muller (the person behind http://www.astro.uni-koeln.de/cdms), I realized that it might also be good to have an interface to the "normal" cgi-bin POST interface. When they update files in the database, the updates become accessible first through the web interface (i.e. http://www.astro.uni-koeln.de/cgi-bin/cdmssearch). The VAMDC version comes later, because they have to do some manual updating for that to happen.

Search: I've been trying to figure out the relevant POST request (using the Live HTTP Headers Chrome plugin).

Result tables: The results are in fixed-width tables with the same format as the JPL molecular line catalog (http://spec.jpl.nasa.gov/ftp/pub/catalog/README), where the format is given as a Fortran (fixed-width) format specifier. For reading the tables, the obvious go-to would be Astropy tables with format='ascii.fixed_width_no_header' (see http://stackoverflow.com/questions/35018200/reading-table-data-card-images-with-format-specifier-given-into-python?noredirect=1#comment57809951_35018200). An alternative is the old package FortranFormat, but that means adding another required package. It could be a good idea to write a short translation tool that takes a Fortran format specifier, e.g. "(F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)", and translates it into the "col_starts" and "dtype" inputs of the Astropy Table fixed-width reader.
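A minimal sketch of such a translation tool, for illustration only. The function name `fortran_to_fixed_width` and the descriptor-to-type mapping are assumptions, not part of astroquery or Astropy, and only flat format strings with F, E, D, I, A and X descriptors (plus repeat counts) are handled:

    import re

    # Assumed mapping from Fortran edit descriptors to Python types.
    _DTYPES = {"F": float, "E": float, "D": float, "I": int, "A": str}

    def fortran_to_fixed_width(fmt):
        """Translate a flat Fortran format string into column positions.

        Returns (col_starts, col_ends, dtypes); positions are 0-based and
        col_ends is inclusive. Nested groups are not supported.
        """
        starts, ends, dtypes = [], [], []
        pos = 0
        for field in fmt.strip().strip("()").split(","):
            field = field.strip().upper()
            m = re.fullmatch(r"(\d*)([FEDIAX])(\d*)(?:\.\d+)?", field)
            if m is None:
                raise ValueError(f"unsupported descriptor: {field!r}")
            repeat = int(m.group(1) or 1)
            kind = m.group(2)
            if kind == "X":
                # nX skips n characters and produces no column
                pos += repeat
                continue
            if not m.group(3):
                raise ValueError(f"missing field width in: {field!r}")
            width = int(m.group(3))
            for _ in range(repeat):
                starts.append(pos)
                ends.append(pos + width - 1)
                dtypes.append(_DTYPES[kind])
                pos += width
        return starts, ends, dtypes

    starts, ends, dtypes = fortran_to_fixed_width(
        "(F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)")

The `starts` and `ends` lists could then be passed as `col_starts` and `col_ends` to the fixed-width reader. Note that each `6I2` expands into six two-character columns here, so a reader that wants the quantum numbers as a single block would need to merge those columns afterwards.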

Anyway, just some thoughts on this.

keflavich commented 8 years ago

:+1: @vilhelmp, this is the best approach for astroquery, at least until we integrate vamdclib into astroquery (which I hope we can eventually do).

vilhelmp commented 8 years ago

The shortest way I figured out to get the CDMS text results into Astropy Table format is the following:

    from astropy.table import Table
    import astropy.constants as c
    import astropy.units as u

    cdms_colnames = ('FREQ', 'ERR', 'LGINT', 'DR', 'ELO', 'GUP', 'TAG', 'QNFMT', 'QN1', 'QN2', 'SPECIES')
    cdms_colstarts = (0, 13, 24, 35, 37, 47, 50, 57, 61, 72, 89)
    lines = Table.read('cdms_table_file.tab',
                       format='ascii.fixed_width_no_header',
                       names=cdms_colnames,
                       col_starts=cdms_colstarts)

and then proceed to set the units:

    # the * 1e-3 scaling treats the catalog frequencies and uncertainties as MHz; convert to GHz
    lines['FREQ'] = lines['FREQ'] * 1e-3
    lines['FREQ'].unit = u.GHz
    lines['ERR'] = lines['ERR'] * 1e-3
    lines['ERR'].unit = u.GHz
    # the lower-state energy is given in cm^-1; convert to Kelvin via E * h * c / k_B
    lines['ELO'].unit = u.cm**-1
    lines['ELO'] = (lines['ELO'].quantity*c.c*c.h/c.k_B).decompose()
    lines['ELO'].unit = u.K

    # calculate the E_up (in Kelvin) from the E_low (in Kelvin)
    lines['EUP'] = lines['ELO'].quantity + ((c.h * lines['FREQ'].quantity)/c.k_B).decompose()

Here I have to keep track of the units a bit more than usual. Is there a way to get it to just use the unit that is calculated, i.e. so that lines['ELO'] = (lines['ELO'].quantity*c.c*c.h/c.k_B).decompose() simply gives the column units of Kelvin?
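As a cross-check on the unit bookkeeping above, the same two conversions can be written in plain Python with explicit constants. This is only a sketch: the function name `energies_kelvin` is hypothetical, the constants are the exact SI (2019) values, and the inputs are assumed to be a frequency in MHz and a lower-state energy in cm^-1, as in the catalog columns:

    # Exact SI values of the constants
    H = 6.62607015e-34     # Planck constant, J s
    C = 299792458.0        # speed of light, m / s
    K_B = 1.380649e-23     # Boltzmann constant, J / K

    K_PER_INV_CM = H * C * 100.0 / K_B   # Kelvin per cm^-1 (about 1.4388)
    K_PER_GHZ = H * 1e9 / K_B            # Kelvin per GHz (about 0.04799)

    def energies_kelvin(freq_mhz, elo_inv_cm):
        """Return (E_low, E_up) in Kelvin.

        freq_mhz is the line frequency in MHz and elo_inv_cm is the
        lower-state energy in cm^-1; E_up = E_low + h * nu / k_B.
        """
        elo_k = elo_inv_cm * K_PER_INV_CM
        eup_k = elo_k + (freq_mhz * 1e-3) * K_PER_GHZ
        return elo_k, eup_k

Comparing a few rows of the table against these numbers is a quick way to confirm that the column units ended up as intended.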

bsipocz commented 7 years ago

Closing this one as an experimental VAMDC module has been added in #658. Please feel free to reopen if you think otherwise.