MickaelRigault / ztfquery

Access ZTF data from Python
Apache License 2.0

NightSummary returns list index out of range #12

Closed MichaelMedford closed 5 years ago

MichaelMedford commented 5 years ago

The night summary example from the README still works. However, requesting the night summary for later dates raises an IndexError (list index out of range).

from ztfquery import query
data = query.NightSummary('20190101')

results in

IndexError                                Traceback (most recent call last)
<ipython-input-1-1f6d9dfa80b3> in <module>()
      1 from ztfquery import query
----> 2 data = query.NightSummary('20190101')

~/miniconda3/lib/python3.6/site-packages/ztfquery/query.py in __init__(self, night, ztfops_auth)
    638         self.night = night
    639 
--> 640         self.data_all  = download_night_summary(night, ztfops_auth=ztfops_auth)
    641 
    642         self.data  = self.data_all[self.data_all["type"]=="targ"]

~/miniconda3/lib/python3.6/site-packages/ztfquery/query.py in download_night_summary(night, ztfops_auth)
    591     summary = requests.get(_NIGHT_SUMMARY_URL+"%s/exp.%s.tbl"%(night,night),
    592                                auth=ztfops_auth).content.decode('utf-8').splitlines()
--> 593     columns = [l.replace(" ","") for l in summary[0].split('|') if len(l.replace(" ",""))>0]
    594     data    = [l.split() for l in summary[1:] if not l.startswith('|') and len(l)>1]
    595     dataf   = DataFrame(data=data, columns=[l if l!= "fil" else "fid" for l in columns])

IndexError: list index out of range

I also know that this is not an authentication issue because I have repeated the attempt using

from ztfquery.query import download_night_summary
data = download_night_summary('20190101', ztfops_auth=(username, password))

and ended up with the same results.
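
For reference, the traceback points at summary[0], which can only raise IndexError when summary is an empty list, i.e. when the decoded response body contains no lines. Below is a minimal sketch of that failure mode. The empty `content` is a hypothetical stand-in for the server response (no request is made), and it does not establish why the real response would be empty.

```python
# Hypothetical stand-in for requests.get(...).content when the body is empty.
content = b""

# Same decoding steps as in download_night_summary.
summary = content.decode("utf-8").splitlines()
print(summary)  # an empty body yields an empty list of lines

try:
    header = summary[0]  # the expression flagged in the traceback
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)
```

Checking that summary is non-empty before indexing it would turn this into a clearer error message.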

MickaelRigault commented 5 years ago

Hi Michael,

It's been a long time since I last used NightSummary. Something may have changed. I'll look at it today.

MichaelMedford commented 5 years ago

There are two issues arising.

The first occurs when the author name has a space in it, such as "Richard Deka". split() then produces an extra element for each such row, so that row has one more field than the columns used to initialize the dataframe.

The second occurs when a fileroot is missing from the summary. split() finds nothing for the fileroot column in that row, so that row has one fewer field than the columns used to initialize the dataframe.
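
Both effects can be reproduced on a small made-up table. The sketch below is only an illustration: the column names, widths and values are invented, and it assumes the real summary is a fixed-width table whose header '|' characters line up with the column boundaries of the data rows. It compares whitespace splitting with slicing each row at the header's '|' offsets, which is one possible way to make the parsing independent of cell contents.

```python
# Made-up fixed-width table in the spirit of the night summary: a header whose
# '|' characters mark the column boundaries, followed by space-padded rows.
names = ["expid", "pi", "fileroot", "fil"]
widths = [6, 14, 12, 4]

def pad(values, sep):
    """Lay out one line with each value left-justified in its column."""
    return sep + sep.join(v.ljust(w) for v, w in zip(values, widths)) + sep

header = pad(names, "|")
rows = [
    pad(["1001", "JaneDoe", "ztf_0001", "1"], " "),   # well-behaved row
    pad(["1002", "Jane Doe", "ztf_0002", "2"], " "),  # space inside the name
    pad(["1003", "JaneDoe", "", "2"], " "),           # missing fileroot
]

# Whitespace splitting: the token count depends on the cell contents.
split_counts = [len(r.split()) for r in rows]
print(split_counts)  # [4, 5, 3]

# Position-based slicing: cut every row at the header's '|' offsets instead.
bounds = [i for i, c in enumerate(header) if c == "|"]

def slice_row(line):
    """Return the stripped cell text between consecutive column boundaries."""
    return [line[a + 1:b].strip() for a, b in zip(bounds[:-1], bounds[1:])]

sliced = [slice_row(r) for r in rows]
for cells in sliced:
    print(cells)
```

With slicing, every row yields exactly four cells: the name keeps its space and the missing fileroot comes back as an empty string.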

I came up with the following hack that works, although it is not at all general:

def download_night_summary(night, ztfops_auth = None):
    """ 
    Parameters
    ----------
    night: [string]
        Format: YYYYMMDD like for instance 20180429

    ztfops_auth: [string, string] -optional-
        Provide directly the [username, password] of the ztfops page.
    """
    import requests
    from pandas import DataFrame
    # = Password and username
    if ztfops_auth is None:
        from .io import _load_id_
        ztfops_auth = _load_id_("ztfops", askit=True)

    summary = requests.get(_NIGHT_SUMMARY_URL+"%s/exp.%s.tbl"%(night,night),
                               auth=ztfops_auth).content.decode('utf-8').splitlines()
    columns = [l.replace(" ","") for l in summary[0].split('|') if len(l.replace(" ",""))>0]
    # data    = [l.split() for l in summary[1:] if not l.startswith('|') and len(l)>1]

    # Hacky Loop
    data = []
    for line in summary[1:]:
        # Negation of the filter in the list comprehension: skip header-style
        # lines (starting with '|') and empty lines
        if line.startswith('|') or len(line)<=1: continue
        # PI cannot have a space in the name, this is the one I keep seeing
        line = line.replace('Richard Deka', 'RichardDeka')
        d = line.split()
        # Fileroot cannot be dropped from file. This assumes that a missing column
        # is always due to a missing fileroot.
        if len(d) < len(columns):
            d = d[:-2] + ['None'] + d[-2:]
        data.append(d)
    dataf   = DataFrame(data=data, columns=[l if l!= "fil" else "fid" for l in columns])
    # Use .loc to avoid chained assignment, which may not modify dataf
    dataf.loc[dataf["fid"]=="4", "fid"] = "3"
    return dataf

MickaelRigault commented 5 years ago

Thanks for looking closely into it. I'm at a conference right now so I haven't had the time to find a solution, but what you made looks great. I'll get back to it asap.

MichaelMedford commented 5 years ago

I have come up with a solution and opened a pull request titled "Night summaries to dataframes through indexer". Please check it out and let me know what you think.

MickaelRigault commented 5 years ago

Thanks for doing this, Michael.