genghisken / ps1

Pan-STARRS and ATLAS code

Fix duplicate name bug in creation of ATLAS names #148

Open genghisken opened 4 years ago

genghisken commented 4 years ago

While Pan-STARRS has a completely separate nameserver, the ATLAS database creates names internally. I recently discovered that this process is not "thread safe". In other words, if two people promote objects simultaneously, they can get exactly the same ATLAS name. This has happened 123 times so far - mostly affecting objects in the attic, but there are also 8 affected objects with only 3 unique names in the good list. A list of the RAs and Decs indicates that these are 8 completely different objects, not 3 sets of duplicates. In the list below, the ATLAS19qgu objects all got the same TNS designation (tns_name) - but only one of these is correct.

The problem is almost certainly caused by the bulk list update code, and the objects that share a name are always flagged on the same day. It doesn't happen for the good list anymore because I stopped allowing bulk promotion to the good list in October 2019. The majority of the affected objects are AGNs, which are often bulk promoted to the attic.

+-------------------+----------+
| atlas_designation | count(*) |
+-------------------+----------+
| ATLAS19kvi        |        2 |
| ATLAS19qgu        |        4 |
| ATLAS19xyi        |        2 |
+-------------------+----------+

+-------------------+-----------+-----------+----------+
| atlas_designation | ra        | dec       | tns_name |
+-------------------+-----------+-----------+----------+
| ATLAS19kvi        | 325.72429 |  -0.29613 | NULL     |
| ATLAS19kvi        | 287.93135 |  51.29939 | NULL     |
| ATLAS19qgu        | 239.05897 |  36.49603 | 2019lwm  |
| ATLAS19qgu        | 271.72816 | -12.76895 | 2019lwm  |
| ATLAS19qgu        | 347.00564 |  -1.12503 | 2019lwm  |
| ATLAS19qgu        | 357.53405 |  36.82511 | 2019lwm  |
| ATLAS19qgu        |   2.25169 |  35.73886 | 2019lwm  |
| ATLAS19xyi        | 127.96812 |   9.22339 | NULL     |
| ATLAS19xyi        |  73.78066 |   4.30401 | NULL     |
+-------------------+-----------+-----------+----------+
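
For reference, a count like the first table above can be produced with a GROUP BY / HAVING query. This is only a minimal sketch using an in-memory SQLite database: the table name `objects`, its columns, and the `ATLAS19aaa` row are assumptions made up for illustration, not the real ATLAS schema or data.

```python
import sqlite3

# Hypothetical stand-in for the ATLAS object table (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, atlas_designation TEXT, ra REAL, dec REAL)"
)
conn.executemany(
    "INSERT INTO objects (atlas_designation, ra, dec) VALUES (?, ?, ?)",
    [
        ("ATLAS19kvi", 325.72429, -0.29613),
        ("ATLAS19kvi", 287.93135, 51.29939),
        ("ATLAS19xyi", 127.96812, 9.22339),
        ("ATLAS19xyi", 73.78066, 4.30401),
        ("ATLAS19aaa", 10.0, 20.0),  # made-up object with a unique name
    ],
)


def find_duplicate_names(conn):
    """Return (name, count) for every name attached to more than one row."""
    return conn.execute(
        "SELECT atlas_designation, COUNT(*) FROM objects "
        "GROUP BY atlas_designation HAVING COUNT(*) > 1 "
        "ORDER BY atlas_designation"
    ).fetchall()


duplicates = find_duplicate_names(conn)
for name, count in duplicates:
    print(name, count)
```

Names that appear only once are filtered out by the HAVING clause, so only the problem cases are listed.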

genghisken commented 4 years ago

NOTE that creation of a unique index on that column is no longer an option - since we also reference old objects in the ATLAS3 database, some of which are indeed in the same position as several existing ATLAS4 objects (hence are assigned the same ATLAS3 name). The fix is to abstract out the naming process to a separate database (like with Pan-STARRS) - which will itself have a unique key.

genghisken commented 4 years ago

Intermediate fix:

  1. Do an internal cone search of all objects within (e.g.) a 2 arcsec radius. Is there one that already has an ATLAS name? If so, use it. So at least internal duplicates will get the same name! (There's another ongoing issue for merging objects.) Note that we already do dynamic cone searches to stop duplicate RA/Decs going into the good list.
  2. If the cone search finds nothing, generate a new name. Immediately BEFORE EACH INSERT into the database, CHECK that the name does not already exist. If it does, generate another new name. Keep going until a loop counter of n (= e.g. 10) retries is exhausted, in which case abort the object update.
  3. One-off clean-up. Go through and collect the existing object names that have completely different coordinates. Generate new names for these. Feed a report to TNS so they can do a one-off fix.
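
Steps 1 and 2 could be sketched roughly as follows. This is an illustration under assumptions, not the actual ATLAS code: the `objects` table, the `assign_name` and `angular_separation_arcsec` helpers, the `generate_name` callable, and the name `ATLAS19zzz` are all hypothetical, and the cone search here is a brute-force scan rather than an indexed spatial query.

```python
import math
import sqlite3


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def assign_name(conn, obj_id, generate_name, radius_arcsec=2.0, max_retries=10):
    """Reuse a nearby object's name if there is one, else generate a fresh one."""
    ra, dec = conn.execute(
        "SELECT ra, dec FROM objects WHERE id = ?", (obj_id,)
    ).fetchone()

    # Step 1: cone search over objects that already have a name.
    named = conn.execute(
        "SELECT atlas_designation, ra, dec FROM objects "
        "WHERE atlas_designation IS NOT NULL AND id != ?", (obj_id,)
    ).fetchall()
    name = None
    for other_name, other_ra, other_dec in named:
        if angular_separation_arcsec(ra, dec, other_ra, other_dec) <= radius_arcsec:
            name = other_name
            break

    # Step 2: no match, so generate names until one is unused or retries run out.
    if name is None:
        for _ in range(max_retries):
            candidate = generate_name()
            taken = conn.execute(
                "SELECT 1 FROM objects WHERE atlas_designation = ? LIMIT 1",
                (candidate,),
            ).fetchone()
            if taken is None:
                name = candidate
                break
        else:
            # Loop finished without a free name: abort the object update.
            raise RuntimeError("no unused name after %d retries" % max_retries)

    conn.execute(
        "UPDATE objects SET atlas_designation = ? WHERE id = ?", (name, obj_id)
    )
    conn.commit()
    return name


# Small demonstration with an in-memory database (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, atlas_designation TEXT, ra REAL, dec REAL)"
)
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    (1, "ATLAS19kvi", 325.72429, -0.29613),
    (2, None, 325.72430, -0.29613),   # well inside 2 arcsec of object 1
    (3, None, 287.93135, 51.29939),   # far from any named object
])
candidates = iter(["ATLAS19kvi", "ATLAS19zzz"])  # first one collides on purpose
reused = assign_name(conn, 2, lambda: next(candidates))
fresh = assign_name(conn, 3, lambda: next(candidates))
```

Note that the check in step 2 and the subsequent update are still two separate statements, so two simultaneous promotions could in principle both pass the check; this narrows the window rather than closing it.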

Longer term fix: Generate new, separate table with the names, internal object IDs and mean coordinates. ATLAS names in THIS table ARE unique and enforced by a unique or primary key, which is much more efficient (and thread safe) than just doing the above read-before-insert check.
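
A rough sketch of the separate names table, again using SQLite purely for illustration. The table name `atlas_names`, its columns, the `reserve_name` helper, and the name `ATLAS19qgv` are assumptions. The point is that the primary key makes the database itself reject a duplicate, so the insert either succeeds or raises, with no gap between check and write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE atlas_names (
        atlas_designation TEXT PRIMARY KEY,  -- uniqueness enforced here
        object_id         INTEGER NOT NULL,
        mean_ra           REAL NOT NULL,
        mean_dec          REAL NOT NULL
    )
""")


def reserve_name(conn, generate_name, object_id, ra, dec, max_retries=10):
    """Insert a new name; on a key collision, try the next candidate."""
    for _ in range(max_retries):
        candidate = generate_name()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO atlas_names VALUES (?, ?, ?, ?)",
                    (candidate, object_id, ra, dec),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # name already taken, generate another
    raise RuntimeError("no unused name after %d retries" % max_retries)


# The generator deliberately hands out the same name twice.
candidates = iter(["ATLAS19qgu", "ATLAS19qgu", "ATLAS19qgv"])
first = reserve_name(conn, lambda: next(candidates), 1, 239.05897, 36.49603)
second = reserve_name(conn, lambda: next(candidates), 2, 271.72816, -12.76895)
```

Here the second object cannot end up as ATLAS19qgu: the duplicate insert fails on the key and the loop moves on to the next candidate.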