dfm / tess-atlas

MIT License

2min cadence data TOI list is incorrect #236

Closed avivajpeyi closed 1 year ago

avivajpeyi commented 2 years ago

We are caching a list of TOIs with 2min cadence data. However:

The updated TOI cache does not correctly account for whether a lightcurve (LK) is present

We don't seem to be getting any TOIs with lightcurves above TOI 3000 in the updated TOI list. This seems wrong -- previously we had some TOIs > TOI 3000 with 2-min cadence data.

[Image: screenshot taken 2022-07-28, 12:17:34 pm]

Why do the ~700 TOIs above TOI 3000 that were previously marked as having data now appear to have none?

import matplotlib.pyplot as plt
import pandas as pd

# Current cache (main branch) vs. the cache at an earlier commit
current_db = pd.read_csv("https://raw.githubusercontent.com/dfm/tess-atlas/main/src/tess_atlas/data/cached_tic_database.csv")
old_db = pd.read_csv("https://raw.githubusercontent.com/dfm/tess-atlas/42de954ab7973b548ebb21ebacabe0afbe08c495/src/tess_atlas/data/cached_tic_database.csv")

# TOIs flagged as having a lightcurve (the column name is misspelled in the CSV itself)
cur = current_db[current_db["Lightcurve Availible"]==True]["TOI int"].tolist()
old = old_db[old_db["Lightcurve Availible"]==True]["TOI int"].tolist()

# TOIs flagged as not having a lightcurve
cur_mis = current_db[current_db["Lightcurve Availible"]==False]["TOI int"].tolist()
old_mis = old_db[old_db["Lightcurve Availible"]==False]["TOI int"].tolist()

# plot comparison
fig, ax = plt.subplots(2,1, sharex=True, sharey=True, figsize=(20, 2))
r = dict(ymin=0,ymax=2, lw=0.1)
ax[0].vlines(cur, **r, label=f"Current ({len(cur)}/{len(current_db)} TOIs)", color="tab:green")
ax[1].vlines(old, **r, label=f"Old ({len(old)}/{len(old_db)} TOIs)", color="tab:green")
ax[0].vlines(cur_mis, **r, color="tab:red")
ax[1].vlines(old_mis, **r, color="tab:red")
ax[0].set_ylim(0.99,1.01)
ax[0].set_yticks([])
ax[0].legend(loc="upper left",)
ax[1].legend(loc="upper left",)
ax[1].set_xlabel("TOIs with 2-min Lightcurve data", fontsize='xx-large')
ax[1].set_xlim(100, 5794)

dfm commented 2 years ago

Are you maybe getting your IP address blocked for making too many requests?

avivajpeyi commented 2 years ago

Yeah -- that may be the issue. Strange that no error was thrown.

I'm re-checking, on a different cluster, the ~700 TOIs that we previously noted as having 2-min cadence data.

[Image: screenshot taken 2022-07-28, 1:51:56 pm]

Hmm, I think I'll just revert the CSV to the old version.

from tqdm.auto import tqdm
import lightkurve as lk

def toi_has_2min_cadence_lk(toi):
    """Return True if a SPOC lightcurve search for this TOI finds any results."""
    search = lk.search_lightcurve(
        target=f"TOI {toi}",
        mission="TESS",
        author="SPOC",
    )
    # An empty search result is treated as "no 2-min cadence data"
    return len(search) > 0

has_lk = []
missing_tois = list(set(old)-set(cur))
for missing_toi in tqdm(missing_tois):
    has_lk.append(toi_has_2min_cadence_lk(missing_toi))

recheck = pd.DataFrame({"TOI int":missing_tois, "Lightcurve Availible":has_lk})

Could not resolve TOI 4472 to a sky position.
Could not resolve TOI 4476 to a sky position.
Could not resolve TOI 5542 to a sky position.
Could not resolve TOI 5480 to a sky position.
Could not resolve TOI 5489 to a sky position.
Could not resolve TOI 5493 to a sky position.
Could not resolve TOI 5507 to a sky position.
Could not resolve TOI 5515 to a sky position.
Could not resolve TOI 5519 to a sky position.
Could not resolve TOI 5520 to a sky position.
Could not resolve TOI 5521 to a sky position.
Could not resolve TOI 5522 to a sky position.
Could not resolve TOI 5523 to a sky position.
Could not resolve TOI 5524 to a sky position.
Could not resolve TOI 5525 to a sky position.
Could not resolve TOI 5526 to a sky position.
Could not resolve TOI 5527 to a sky position.
Could not resolve TOI 5528 to a sky position.
Could not resolve TOI 5529 to a sky position.
Could not resolve TOI 5530 to a sky position.
Could not resolve TOI 5531 to a sky position.
Could not resolve TOI 5532 to a sky position.
Could not resolve TOI 5533 to a sky position.
Could not resolve TOI 5534 to a sky position.
Could not resolve TOI 5535 to a sky position.
Could not resolve TOI 5537 to a sky position.
Could not resolve TOI 5538 to a sky position.
Could not resolve TOI 5539 to a sky position.
Could not resolve TOI 5543 to a sky position.
Could not resolve TOI 5544 to a sky position.
Could not resolve TOI 5545 to a sky position.
Could not resolve TOI 5546 to a sky position.
Could not resolve TOI 5548 to a sky position.
Could not resolve TOI 5549 to a sky position.
Could not resolve TOI 5550 to a sky position.
Could not resolve TOI 5551 to a sky position.
Could not resolve TOI 5552 to a sky position.
Could not resolve TOI 5553 to a sky position.
Could not resolve TOI 5554 to a sky position.
Could not resolve TOI 5555 to a sky position.
Could not resolve TOI 5556 to a sky position.
Could not resolve TOI 5557 to a sky position.
Could not resolve TOI 5558 to a sky position.
Could not resolve TOI 5559 to a sky position.
Could not resolve TOI 5560 to a sky position.
Could not resolve TOI 5561 to a sky position.
Could not resolve TOI 5562 to a sky position.
Could not resolve TOI 5563 to a sky position.
Could not resolve TOI 5564 to a sky position.
Could not resolve TOI 5572 to a sky position.
Could not resolve TOI 5574 to a sky position.
Could not resolve TOI 5581 to a sky position.
Could not resolve TOI 5584 to a sky position.
Could not resolve TOI 5605 to a sky position.
Could not resolve TOI 5611 to a sky position.
Could not resolve TOI 5619 to a sky position.
Could not resolve TOI 5624 to a sky position.

I guess my concern is that we're mistakenly excluding more TOIs that should be included.
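
One way to guard against this is to treat a search that produced warnings differently from a search that cleanly returned nothing, so an unresolved name is never recorded as "no data". A minimal sketch, assuming the "Could not resolve ..." lines above are emitted through Python's `logging` under a logger name starting with `lightkurve` (not verified here). `captured_messages` and `check_target` are hypothetical helpers, not part of tess-atlas or lightkurve:

```python
import logging
from contextlib import contextmanager


class _ListHandler(logging.Handler):
    """Logging handler that stores WARNING-and-above messages in a list."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


@contextmanager
def captured_messages(logger_name):
    """Collect messages logged under `logger_name` (and its child loggers)."""
    logger = logging.getLogger(logger_name)
    handler = _ListHandler()
    logger.addHandler(handler)
    try:
        yield handler.messages
    finally:
        logger.removeHandler(handler)


def check_target(search_fn, target, logger_name="lightkurve"):
    """Return True/False for a clean search, or None if the search complained.

    `search_fn` is any callable taking a target string and returning a sized
    result, e.g. a thin wrapper around `lk.search_lightcurve`.
    ASSUMPTION: failures show up as log messages under `logger_name`.
    """
    with captured_messages(logger_name) as messages:
        result = search_fn(target)
    if messages:
        return None  # unresolved or failed: do not record as "no data"
    return len(result) > 0
```

Entries that come back as `None` could then be retried later instead of being written to the cache as `False`.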

avivajpeyi commented 1 year ago

@dfm -- do you know someone we can ask about how to query the list of TOIs (with 2-min cadence data) from ExoFOP?

dfm commented 1 year ago

This wouldn't explain the changing results, but I expect it would be better to search by TIC ID rather than TOI number, since TIC IDs are fixed, whereas TOI numbers take some time to resolve.

avivajpeyi commented 1 year ago

Ok! I can set up the search to look for TIC IDs with 2-min cadence data -- but is there a better way to query this than what we've been doing (i.e., manually checking each candidate with a lk download call)?

dfm commented 1 year ago

I think your approach is sensible! These errors don't seem to be related to timeouts or rate limiting. They seem to be caused by name resolution, which isn't super surprising if you're querying on TOI!
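
For illustration, the same check keyed on TIC ID might look like the sketch below. It reuses the `lk.search_lightcurve` arguments from the snippet earlier in the thread and only changes the target string. The function name `tic_has_2min_cadence` and the injectable `search` argument are assumptions made here so the helper can be exercised without network access:

```python
def tic_has_2min_cadence(tic_id, search=None):
    """Return True if a SPOC lightcurve search for this TIC ID is non-empty.

    `search` defaults to `lightkurve.search_lightcurve`; passing a stand-in
    callable (hypothetical, for offline testing) avoids hitting the archive.
    """
    if search is None:
        import lightkurve as lk  # imported lazily so the helper loads without it

        search = lk.search_lightcurve
    result = search(target=f"TIC {tic_id}", mission="TESS", author="SPOC")
    return len(result) > 0


# Example call (needs network access; the TIC ID is a placeholder):
# tic_has_2min_cadence(123456789)
```

Because the query no longer depends on a TOI name being resolvable, newly announced TOIs should not fail at the name-resolution step, provided their TIC IDs are known from the TOI table.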

avivajpeyi commented 1 year ago

I submitted a Slurm array job to check each TIC in parallel (#250, #251). I think this gave us the correct list of TICs with 2-min cadence data (see #254).
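
As a rough illustration of how a list of TICs can be split across array tasks (not necessarily how #250 and #251 do it), each task can pick its own slice using the environment variables Slurm sets for array jobs. The helper names are hypothetical, and the sketch assumes the array indices are contiguous (step of 1):

```python
import os


def shard_for_task(items, task_index, n_tasks):
    """Return every n_tasks-th item starting at task_index (round-robin split)."""
    return items[task_index::n_tasks]


def current_shard(items, env=os.environ):
    """Pick this task's shard from Slurm's array-job environment variables.

    Outside of Slurm (variables unset) the full list is returned.
    ASSUMPTION: array indices are contiguous, e.g. `--array=0-99`.
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", 0))
    task_min = int(env.get("SLURM_ARRAY_TASK_MIN", 0))
    n_tasks = int(env.get("SLURM_ARRAY_TASK_COUNT", 1))
    return shard_for_task(items, task_id - task_min, n_tasks)
```

Each task would then run the per-TIC check only on `current_shard(all_tics)` and write its results to a separate file, to be concatenated once the array finishes.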

Now we just need to update the TIC list with new TICs (this can be done with the CLI tool update_tic_cache).