LSST-nonproject / sims_maf_contrib

Contributed code for MAF (sims_maf)

TransientAsciiMetric data problem #50

Closed: MeganNewsome closed this issue 2 years ago

MeganNewsome commented 7 years ago

The TransientAsciiMetric is intended to make it straightforward to run lightcurves through the simulations database and MAF by allowing the use of ASCII files, mirroring how the original TransientMetric works. Here are a couple of problems I have run into while trying to run it:

  1. In mafContrib/testTransientAsciiMetric.py, switching the ASCII file to another user's data proves difficult. I initially ran it with this file as the asciifile input, and immediately there was a problem with the column numbers. Trying to force np.genfromtxt to skip the first few lines of comments did no good (a sketch of what I was attempting is at the end of this comment). My quick fix was to copy the data into my own .dat file without any of the comments, which seemed to get around the column-number issue just fine, but produced this error:

    [mnewsome@ryloth.lco.gtn mafContrib]$ python testTransientAsciiMetric.py
    1.0 1.0 0.666666666667 0.5 0.0 0.0 0.5 1.0 0.0 0.0 0.5 1.0 0.0 0.0 0.5 1.0 0.0 0.0
    Traceback (most recent call last):
      File "testTransientAsciiMetric.py", line 102, in <module>
        result = transMetric.run(dataSlice)
      File "/home/mnewsome/envs/lsst_maf/lsst4/maf_tests/sims_maf_contrib/mafContrib/transientAsciiMetric.py", line 227, in run
        lcDetect[lcN] = False
    IndexError: index 2 is out of bounds for axis 0 with size 2

Line 102 of testTransientAsciiMetric.py is just the last "version" of running the lightcurve in the test with certain parameters. Line 227 of transientAsciiMetric.py is the lcDetect[lcN] = False assignment in the following block:

    lcEpochAboveThresh = lcEpoch[le:ri][np.where(lcAboveThresh[le:ri])]
    # If we did not get enough detections before preT, set lcDetect to False.
    timesPreT = np.where(lcEpochAboveThresh < self.preT)[0]
    if len(timesPreT) < self.nPreT:
        lcDetect[lcN] = False
        continue

I would like to note that it still gets through the first 9 iterations (I believe it is fine that the numbers do not match up, as the second column was intended for another lightcurve), and again that this error only happened for a lightcurve other than the one the tutorial came with.

  2. In science/Transients, running TransientAsciiMetric.ipynb with only the data file it calls for (2013_ab1.dat) works all the way through if you leave the year equal to 9. However, changing that year causes problems when we also change the metric parameters from their defaults to the call below (my reading of these settings is sketched a little further down):

        metric = TransientAsciiMetric(asciiLC, surveyDuration=1,
                                      detectSNR={'u': 5, 'g': 5, 'r': 5, 'i': 5, 'z': 5, 'y': 5},
                                      nPreT=3, preT=5, nFilters=3, filterT=30,
                                      nPerLC=2, peakOffset=0, dataout=False)

The following error occurs:

    /home/mnewsome/envs/lsst_maf/lsst4/maf_tests/sims_maf_contrib/science/Transients/transientAsciiMetric.pyc in run(self, dataSlice, slicePoint)
        225             timesPreT = np.where(lcEpochAboveThresh < self.preT)[0]
        226             if len(timesPreT) < self.nPreT:
    --> 227                 lcDetect[lcN] = False
        228                 continue
        229             # If we did not get detections over enough sections of the lightcurve, set lcDtect to False.

    IndexError: index 2 is out of bounds for axis 0 with size 2

We found that this happens for input years 0, 1, 2, 3, 5, and 7. Years 4, 6, 8, and 9 work fine, with no index errors, for some reason.
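For context, here is the same call as above, annotated with my reading of the non-default settings. The interpretations are inferred from the checks in transientAsciiMetric.run() (quoted later in this thread), not copied from the docstring, so treat them as my understanding:

    # Same call as above, with my interpretation of the stricter parameters as comments.
    metric = TransientAsciiMetric(
        asciiLC,
        surveyDuration=1,                    # length of the simulated survey, in years
        detectSNR={'u': 5, 'g': 5, 'r': 5,   # per-filter SNR an observation must exceed
                   'i': 5, 'z': 5, 'y': 5},  #   to count as a detection
        nPreT=3, preT=5,                     # require >= 3 detections within the first 5 days of the lightcurve
        nFilters=3,                          # require detections in >= 3 distinct filters
        filterT=30,                          # ... and those filters must be hit within 30 days of each other
        nPerLC=2,                            # require detections in both halves (2 sections) of the lightcurve
        peakOffset=0, dataout=False)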

I find it very interesting that in either case, and even when using the user-suggested lightcurve file, this index error keeps occurring. I am still investigating what might be causing it. I thought perhaps it was pickiness about the data file, but now I am not sure why it would break just from inputting different years.
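For reference, a minimal sketch of the sort of np.genfromtxt call I was attempting. The column names and dtypes here are my guess at a (time, mag, filter) layout, not the file's documented format:

    import numpy as np

    # Skip the leading '#' comment lines instead of deleting them by hand;
    # the field names and dtypes are assumptions about the layout, not the actual spec.
    lcv = np.genfromtxt('SNIa_MS2Msunprogenitor_maxinteraction_g.dat',
                        comments='#',
                        dtype=[('ph', float), ('mag', float), ('flt', 'U1')])
    print(lcv['ph'][:5], lcv['mag'][:5], lcv['flt'][:5])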

MeganNewsome commented 7 years ago

I played around with it this morning. Since the issue is with indices and array sizes, the problem is that at certain points the input data does not behave the way the code expects. I printed out a few parameters, and sure enough, the arrays at the point of failure have only 2 elements. I think the issue is in ulcNumber:

('ulcNumber=', array([0, 2]))
('lcLeft:', array([ 0, 169]))
('lcRight:', array([169, 235]))

For comparison, here are some of the other "ulcNumber" outputs leading up to the failing one, which is repeated at the end:

('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2, 3]))
('ulcNumber=', array([0, 1, 2, 3]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 1, 2, 3]))
('ulcNumber=', array([0, 1, 2]))
('ulcNumber=', array([0, 2]))

None of the ulcNumber arrays up to that point have only 2 elements: some are just a single "0", and many have 3 or 4 elements running from 0 up to 3, like above. So this failing case is the only one with just 2 elements, and it is also the only one in which a value is skipped: there is no "1" between 0 and 2; instead, the value 2 sits at index 1. That, ultimately, is the problem. The line lcDetect[lcN] = False cannot run because lcN (here 2) is used directly as an index into lcDetect, which has only 2 entries.
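A toy sketch of the symptom (variable names borrowed from the metric, but the setup is mine; however lcDetect is actually built inside the metric, the effect is the same):

    import numpy as np

    # lcDetect ends up with one entry per *unique* lightcurve number in this slice,
    # but the loop indexes it with the raw lightcurve number.
    ulcNumber = np.array([0, 2])                    # lightcurve 1 has no observations here
    lcDetect = np.ones(len(ulcNumber), dtype=bool)  # size 2

    for lcN in ulcNumber:
        lcDetect[lcN] = False   # fine for lcN = 0; IndexError for lcN = 2 (index 2, size 2)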

I have put in a fix with the following:

                if len(timesPreT) < self.nPreT:
                    try:
                        lcDetect[lcN] = False
                    except IndexError:
                        print("lcNumber was not continuous for tshift = %f" % (tshift))
                        lcDetect[np.where(lcDetect == lcN)] = False
                    continue

This circumvented the problem by essentially brute-forcing it: assign only where the indices match and ignore the mismatch otherwise, just so the loop can continue. It worked perfectly on the dataset that "came with" the program, so to speak. So we put it to the test on our own data file. In this case we used SNIa_MS2Msunprogenitor_maxinteraction_g.dat, although we copied the data from it into our own file, data.dat, so that it would skip the comments at the top and run fine.

The fix still worked - but this data caused a problem later in the code, this time due to "index 8 is out of bounds for axis 0 of size 2" - overall, another index error. So I took the above "fix" and put it at every lcDetect[lcN] assignment in the for loop, just to cover all bases for index errors:

            for lcN, le, ri in zip(ulcNumber, lcLeft, lcRight):
                # If there were no observations at all for this lightcurve:
                if le == ri:
                    lcDetect[lcN] = False
                    # Skip the rest of this loop, go on to the next lightcurve.
                    continue
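                # Epochs of this lightcurve's observations that were detected above the SNR threshold.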
                lcEpochAboveThresh = lcEpoch[le:ri][np.where(lcAboveThresh[le:ri])]
                # If we did not get enough detections before preT, set lcDetect to False.
                timesPreT = np.where(lcEpochAboveThresh < self.preT)[0]
                if len(timesPreT) < self.nPreT:
                    try:
                        lcDetect[lcN] = False
                    except IndexError:
                        print("lcNumber was not continuous for tshift = %f" % (tshift))
                        lcDetect[np.where(lcDetect == lcN)] = False
                    continue
                # If we did not get detections over enough sections of the lightcurve, set lcDetect to False.
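                # phaseSections: the distinct sections (out of nPerLC equal time-slices of the lightcurve) that contain a detection.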
                phaseSections = np.unique(np.floor(lcEpochAboveThresh / self.transDuration * self.nPerLC))
                if len(phaseSections) < self.nPerLC:
                    try:
                        lcDetect[lcN] = False
                    except IndexError:
                        print("lcNumber was not continuous for tshift = %f" % (tshift))
                        lcDetect[np.where(lcDetect == lcN)] = False
                    continue
                # If we did not get detections in enough filters, set lcDetect to False.
                lcFilters = dataSlice[le:ri][np.where(lcAboveThresh[le:ri])][self.filterCol]
                if len(np.unique(lcFilters)) < self.nFilters:
                    try:
                        lcDetect[lcN] = False
                    except IndexError:
                        print("lcNumber was not continuous for tshift = %f" % (tshift))
                        lcDetect[np.where(lcDetect == lcN)] = False
                    continue
                # If we did not get detections in enough filters within required time, set lcDetect to False.
                if (self.filterT is not None) and (self.nFilters > 1):
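                    # For each detection, find the index just past the last detection that falls within filterT days after it (a sliding window).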
                    xr = np.searchsorted(lcEpochAboveThresh, lcEpochAboveThresh + self.filterT, 'right')
                    xr = np.where(xr < len(lcEpochAboveThresh) - 1, xr, len(lcEpochAboveThresh) - 1)
                    foundGood = False
                    for i, xri in enumerate(xr):
                        if len(np.unique(lcFilters[i:xri])) >= self.nFilters:
                            foundGood = True
                            break
                    if not foundGood:
                        try:
                            lcDetect[lcN] = False
                        except IndexError:
                            print("lcNumber was not continuous for tshift = %f" % (tshift))
                            lcDetect[np.where(lcDetect == lcN)] = False
                        continue

This completely works. I will note, though, that it produces OVER 1,000 instances of "lcNumber was not continuous for tshift = 0.000000", which seems really excessive. If this problem is THAT pervasive, how much data is effectively being deemed useless by this program?
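For what it's worth, a less repetitive variant of the same workaround might be to translate the lightcurve number into its position in lcDetect once, instead of wrapping every assignment in try/except (a sketch, assuming ulcNumber is sorted as np.unique returns it):

    import numpy as np

    # Same toy setup as the sketch above, but mapping the lightcurve *number* to its
    # *position* among the lightcurves present in this slice before assigning.
    ulcNumber = np.array([0, 2])
    lcDetect = np.ones(len(ulcNumber), dtype=bool)

    for lcN in ulcNumber:
        idx = np.searchsorted(ulcNumber, lcN)  # 0 -> 0, 2 -> 1
        lcDetect[idx] = False                  # no IndexError, no per-assignment try/except needed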

rhiannonlynne commented 2 years ago

Due to various issues around the TransientAsciiMetric and unexpected problems with users' light curves, we eventually decided not to continue using the TransientAsciiMetric, preferring instead to add the light curves at distinct points in the slicer.