LCOGT / mop

Microlensing Observation Portal
GNU General Public License v3.0

Associate duplicated events in MOP's DB and implement custom Target model #162

Closed rachel3834 closed 3 months ago

rachel3834 commented 4 months ago

Surveys with overlapping regions, such as OGLE and MOA, can independently detect the same event.
Currently MOP handles these as distinct events and makes no attempt to cross-match them.
This is going to prevent us upgrading to the current and future versions of the TOM Toolkit (v2.18.4 upwards) which include more sophisticated target-matching tools.
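As a rough illustration of the kind of cross-match that is currently missing, the sketch below flags pairs of events whose sky positions lie within a small angular radius of each other. The function names, the 2 arcsec default radius and the example events are assumptions for illustration only; this is not MOP's code or the TOM Toolkit's matching implementation.

```python
import math
from itertools import combinations


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def find_duplicate_pairs(events, radius_arcsec=2.0):
    """Return (name, name) pairs of events closer together than radius_arcsec.

    events: list of (name, ra_deg, dec_deg) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    pairs = []
    for (n1, ra1, dec1), (n2, ra2, dec2) in combinations(events, 2):
        if angular_separation_deg(ra1, dec1, ra2, dec2) <= radius_deg:
            pairs.append((n1, n2))
    return pairs


# Hypothetical event names and coordinates, not real survey detections
events = [
    ("OGLE-EXAMPLE-0001", 268.00000, -30.00000),
    ("MOA-EXAMPLE-001", 268.00010, -30.00010),   # roughly 0.5 arcsec from the first
    ("OGLE-EXAMPLE-0002", 270.50000, -28.00000),
]
duplicate_pairs = find_duplicate_pairs(events)
```

The pairwise loop scales as the square of the number of events, so it is only meant to show the idea; a production cross-match would normally use a spatial index or a database-side cone search.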

This includes a number of tasks:

rachel3834 commented 3 months ago

Since the target matching features of the TOM Toolkit have been released in the same version as custom Target models, it makes sense to make these changes during one migration.

I've implemented a custom MicrolensingTarget model, together with management commands customized to migrate all MOP's extra parameters to model fields.

I've also updated MOP's UI to use the new custom Target model, together with all of MOP's cronjob tasks, views and associated test code.

rachel3834 commented 3 months ago

In the process of migrating the production database, it became clear that the 20-character limit on the observing_mode field is too short; it should be changed to max_length=30.

rachel3834 commented 3 months ago

Experience showed that it was easiest to convert to the new Target model and then do the merging of duplicate events.
For the dev DB, the duplicates have been removed along with their extra params, but the conversion of the extra_params should be re-run.

rachel3834 commented 3 months ago

Some targets had lightcurve ReducedDatums with an unknown source specified; these need to be re-uploaded.

Malformed coordinates:

rachel3834 commented 3 months ago

This seems to be a widespread problem, mostly affecting data uploaded by the pipeline, which at some early stage presumably didn't specify the data source. I wrote a separate command tool to review and update the affected data products before continuing with the duplicate event identification.

rachel3834 commented 3 months ago

When I attempt to run the management command to resolve the duplicates, the process (which was previously running fine) is now killed immediately on startup. The log reports:

mop-6568bf569-cv7cf mop PRIORITYTARGETS started context get 2024-06-23 00:47:47.339771
mop-6568bf569-cv7cf mop CHECKPOINT: N DB connections: 0, memory: 669.29MiB
mop-6568bf569-cv7cf mop /usr/local/lib/python3.10/site-packages/django/views/generic/list.py:91: UnorderedObjectListWarning:
mop-6568bf569-cv7cf mop
mop-6568bf569-cv7cf mop Pagination may yield inconsistent results with an unordered object_list: <class 'tom_targets.models.TargetExtra'> QuerySet.
mop-6568bf569-cv7cf mop
mop-6568bf569-cv7cf mop PRIORITYTARGETS finished context get 2024-06-23 00:47:47.523179 took 0:00:00.183408
mop-6568bf569-cv7cf mop CHECKPOINT: N DB connections: 0, memory: 669.29MiB
mop-6568bf569-cv7cf mop FINISHED GET_CONTEXT 2024-06-23 00:47:47.529885
mop-6568bf569-cv7cf mop CHECKPOINT: N DB connections: 0, memory: 669.29MiB
mop-6568bf569-cv7cf nginx 10.100.2.229 - - [23/Jun/2024:00:47:47 +0000] "GET /prioritytargets/ HTTP/1.1" 200 5823 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "216.177.189.132"
mop-6568bf569-cv7cf mop 127.0.0.1 - - [23/Jun/2024:00:47:47 +0000] "GET /prioritytargets/ HTTP/1.1" 200 5823 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
mop-6568bf569-cv7cf mop 127.0.0.1 - - [23/Jun/2024:00:47:50 +0000] "GET /targets/ HTTP/1.1" 200 3697546 "https://mop.lco.global/prioritytargets/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
mop-6568bf569-cv7cf nginx 2024/06/23 00:47:50 [warn] 7#7: 29 an upstream response is buffered to a temporary file /var/cache/nginx/proxytemp/5/00/0000000005 while reading upstream, client: 10.100.24.119, server: , request: "GET /targets/ HTTP/1.1", upstream: "http://127.0.0.1:8080/targets/", host: "mop.lco.global", referrer: "https://mop.lco.global/prioritytargets/"
mop-6568bf569-cv7cf nginx 10.100.24.119 - - [23/Jun/2024:00:47:50 +0000] "GET /targets/ HTTP/1.1" 200 3697546 "https://mop.lco.global/prioritytargets/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "216.177.189.132"
mop-6568bf569-tq6w7 nginx 2024/06/23 00:48:12 [warn] 7#7: 31 an upstream response is buffered to a temporary file /var/cache/nginx/proxytemp/3/00/0000000003 while reading upstream, client: 10.100.2.229, server: , request: "GET /targets/ HTTP/1.1", upstream: "http://127.0.0.1:8080/targets/", host: "mop.lco.global", referrer: "https://mop.lco.global/prioritytargets/"
mop-6568bf569-tq6w7 mop 127.0.0.1 - - [23/Jun/2024:00:48:12 +0000] "GET /targets/ HTTP/1.1" 200 3697544 "https://mop.lco.global/prioritytargets/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
mop-6568bf569-tq6w7 nginx 10.100.2.229 - - [23/Jun/2024:00:48:12 +0000] "GET /targets/ HTTP/1.1" 200 3697544 "https://mop.lco.global/prioritytargets/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36" "65.79.192.4"

For context, the previous run of the command crashed when it reached a target with malformed coordinates.

rachel3834 commented 3 months ago

TNS class and TNS name are set by a query at the time of target ingest, but they are set unreliably because TNS has a low maximum query limit. These parameters are not merged during the deduplication process since they should be the same for all matching targets; however, they may or may not have been set in the first place.

Similarly, the Gaia catalog query is assumed to be the same for merged targets, but since this code was implemented fairly recently, not all targets have this information.

For consistency, it would be best to have some way to re-run these functions systematically for all targets.
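One possible shape for such a re-run is a loop that spaces out the external queries so that a service limit is not exceeded, and that carries on when a single target fails. This is only a sketch: `query_fn`, the interval and the target names are placeholders, not MOP functions or real TNS limits.

```python
import time


def rerun_for_all_targets(target_names, query_fn, min_interval_s=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Call query_fn(name) for every target, starting calls at least min_interval_s apart.

    Returns a dict of name -> result. Failures are recorded as None so that
    one bad target does not stop the whole run.
    """
    results = {}
    last_call = None
    for name in target_names:
        if last_call is not None:
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        try:
            results[name] = query_fn(name)
        except Exception:
            results[name] = None
    return results


# Usage with a stand-in query function (no real TNS or Gaia request is made)
def fake_query(name):
    if name == "bad-target":
        raise ValueError("simulated query failure")
    return {"target": name, "tns_name": "unknown"}


results = rerun_for_all_targets(["target-1", "bad-target", "target-2"],
                                fake_query, min_interval_s=0.01)
```

Passing `sleep` and `clock` as arguments keeps the throttle easy to test without waiting in real time.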

rachel3834 commented 3 months ago

Gaia20fsn/ZTF20aculhzd - a possible example of a ZTF target whose dataproducts have no source name and are therefore not merged properly. The code has been adjusted to recognize a wider range of source_names, such as ZTFDR3 as well as ZTF.
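A minimal sketch of this kind of source-name matching, assuming the survey can be recognised from a prefix of the source_name string. The prefix table and the function name are illustrative; this thread only confirms that ZTFDR3 is now recognised alongside ZTF.

```python
# Illustrative prefix table; only "ZTF" (which also covers "ZTFDR3") comes from this thread
KNOWN_SOURCE_PREFIXES = ("ZTF", "OGLE", "MOA", "GAIA")


def normalize_source_name(source_name):
    """Map a data product's source_name onto a survey label, or None if unknown."""
    if not source_name:
        return None  # missing or empty source name: needs manual review
    cleaned = source_name.strip().upper()
    for prefix in KNOWN_SOURCE_PREFIXES:
        if cleaned.startswith(prefix):
            return prefix
    return None


labels = [normalize_source_name(s) for s in ("ZTF", "ZTFDR3", "", None, "mystery")]
```

Returning None for unrecognised names, rather than guessing, leaves those data products visible for the separate review step.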

rachel3834 commented 3 months ago

Target post save hook: MOA-2023-BLG-352 created: False
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.InternalError: invalid memory alloc request size 1073741824

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mop/./manage.py", line 21, in <module>
    main()
  File "/mop/./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/mop/mop/management/commands/associate_duplicate_events.py", line 95, in handle
    self.merge_extra_params(primary_target, matching_targets)
  File "/mop/mop/management/commands/associate_duplicate_events.py", line 166, in merge_extra_params
    primary_target.store_parameter_set(update_params)
  File "/mop/microlensing_targets/models.py", line 248, in store_parameter_set
    self.save()
  File "/usr/local/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/site-packages/tom_targets/base_models.py", line 422, in save
    super().save(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 990, in _save_table
    updated = self._do_update(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 1054, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1231, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.InternalError: invalid memory alloc request size 1073741824

rachel3834 commented 3 months ago

Further investigation into the out-of-memory errors revealed that in a small number of cases (3 in >11,000) the fit_covariance parameter contained malformed data. Specifically, these were extremely long strings (over 200 million characters!) consisting almost entirely of backslashes (the escape character); these parameters alone require > 100MB, so naturally the process ran out of resources.

This parameter is populated by the model fitting process, but such output has never been seen in the normal operation of pyLIMA, so it is hard to understand where these excessive strings came from. Alternatively, something may have gone wrong during the conversion of this parameter, which was previously stored as a variable-length string extra_param, to the new JSONField model attribute.

To resolve this issue, I wrote a management command, with better exception handling, that reviews all of the entries for this parameter for all targets, and ensured that all entries were properly converted to a JSON dictionary.
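The checks involved can be sketched roughly as follows. This is not the management command itself: the size limit, the function name and the choice to return None for rejected values are assumptions, and the real command works on the target model rather than bare strings. The unwrapping loop covers one known way that long runs of backslashes can arise, namely a string that is JSON-encoded repeatedly, although nothing here establishes that this is what happened in MOP's DB.

```python
import json

MAX_PARAM_CHARS = 100_000  # assumed sanity limit, far below the ~200 million characters seen


def parse_covariance(raw):
    """Convert a stored fit_covariance value to a JSON-compatible object.

    Returns None when the value is empty, implausibly long, or not valid JSON,
    so that the caller can reset or re-fit that target instead of crashing.
    """
    if raw is None:
        return None
    if not isinstance(raw, str):
        return raw  # already a decoded object
    if len(raw) == 0 or len(raw) > MAX_PARAM_CHARS:
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # A value that was encoded more than once decodes to another string;
    # unwrap a few levels and give up if it never becomes a real object.
    for _ in range(5):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return None
    return None if isinstance(value, str) else value


good = parse_covariance("[[1.0, 0.1], [0.1, 2.0]]")
double_encoded = parse_covariance(json.dumps(json.dumps({"a": 1})))
too_long = parse_covariance("\\" * (MAX_PARAM_CHARS + 1))
not_json = parse_covariance("not json")
```

Checking the length before parsing matters here, since it avoids handing a multi-hundred-megabyte string to the JSON decoder at all.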

rachel3834 commented 3 months ago

Excluding NaN values from Django queries turns out to be non-trivial. These values are stored in FloatFields and there doesn't seem to be any built-in support for excluding them in queries. The nearest reference I can find in the Django forums describes a previous issue about preventing NaNs from being stored in DecimalFields, but not FloatFields.

The best approach seems to be to intercept NaNs in the code and reset these values to the default of zero.
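In plain Python the interception could look like the sketch below. The helper name is hypothetical; the default of zero is the one described above. Note that NaN never compares equal to itself, so an explicit `math.isnan` check is needed rather than an equality test.

```python
import math


def sanitize_priority(value, default=0.0):
    """Return value as a float, replacing None, NaN or non-numeric input with default."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    return default if math.isnan(number) else number


cleaned = [sanitize_priority(v) for v in (1.5, float("nan"), None, "nan", "abc", 7)]
```

Applying a helper like this just before a value is saved would keep new NaNs out of the FloatFields, while existing rows still need the one-off re-calculation.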

This requires that the priorities be re-calculated for all targets in the DB. This doesn't normally happen, since run_TAP only operates on alive events outside the HCZ, so there are many events outside of this range that essentially have old values left over from the merge and migration. So I've implemented a command to enable me to run this calculation for arbitrary selections of events.

rachel3834 commented 3 months ago

The migration of the DB and the merging of duplicate targets is now complete; all code in MOP has been updated to use the new custom Targets.