Keck-DataReductionPipelines / KPF-Pipeline
https://kpf-pipeline.readthedocs.io/en/latest/

AnalyzeTimeSeries #777

Closed awhoward closed 8 months ago

awhoward commented 8 months ago

This PR is for an analysis class used to ingest data from headers and telemetry extensions and make time series plots. RVs are not currently ingested, but they will be in a future version.

I posted a tutorial on Readthedocs to show how it works. See that page for examples.

This code is functional, but I plan to continue to develop it. I'm submitting a PR now for a few reasons. First, I'd like to get feedback generally and on a few specific points. I'm hoping that some combination of @howardisaacson, @bjfultn, @shalverson, and others can weigh in. Some of this feedback may impact the development path. Second, I'd like to deploy this to make Jump plots in the next week or two so that it's ready for the service mission that starts on Feb. 1. Third, others may want to contribute, and merging with develop opens this up.

I've also started to put standard time series plots on Jump, though this isn't automated yet. As conceived right now, there are time series plots for every day, month, year, and decade. An example of the monthly plots is here. I expect that we'll develop many more plots for Jump, which will be easy with this framework. Some work on Jump itself is also needed; currently I put the plots in the "Masters" subdirectories in the QLP, which makes them appear in the Masters tabs on Jump. This isn't quite right, and I'd like to develop similar tabs in that Jump view for "Telemetry" and perhaps "Observing". Also, note that some of the plots currently on Jump don't have data in them because the relevant header keywords aren't being routinely written to the headers. This should be fixed soon with an update to the main recipe.
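
For reference, the daily/monthly/yearly/decade cadence described above amounts to computing a (start, end) date range per plot scale. Here is a minimal sketch of that logic; the function name and return format are hypothetical, not the actual AnalyzeTimeSeries API:

```python
from datetime import date, timedelta

def plot_ranges(day: date):
    """Return hypothetical (start, end) date ranges for the daily,
    monthly, yearly, and decade time series plots covering `day`.
    End dates are exclusive."""
    daily = (day, day + timedelta(days=1))
    # Jump to day 28, add 4 days to land in the next month, then snap to day 1
    monthly = (day.replace(day=1),
               (day.replace(day=28) + timedelta(days=4)).replace(day=1))
    yearly = (date(day.year, 1, 1), date(day.year + 1, 1, 1))
    decade_start = day.year - day.year % 10
    decade = (date(decade_start, 1, 1), date(decade_start + 10, 1, 1))
    return {"day": daily, "month": monthly, "year": yearly, "decade": decade}
```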

Another question is how to generate plots for Jump in production. I propose that a variation on scripts/ingest_kpf_tsdb.py (new with this PR) be written to trigger ingestion on file modification events (as Jump-rake does). A separate process would make the daily/monthly/yearly/decadal plots. The latter could be triggered by a cron job or using a timer. I favor the latter approach, but don't have strong feelings.
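
The production version would react to file-modification events as described above; as a stand-in, here is a minimal stdlib polling sketch of the same idea (function and variable names are hypothetical):

```python
import os

def scan_for_modified(data_dir, seen_mtimes):
    """Return files under data_dir that are new or modified since the
    previous scan, updating seen_mtimes in place. This is a polling
    stand-in for the inotify-style triggering described above; each
    returned path would be handed to the ingestion routine."""
    changed = []
    for root, _, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if seen_mtimes.get(path) != mtime:
                seen_mtimes[path] = mtime
                changed.append(path)
    return changed
```

The plot-making process would run separately on a cron job or timer, reading only from the database, so the two never contend over file scanning.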

After we get experience with this, we will likely want to ingest additional keywords into Jump so that we can make interactive plots there. Thus, the code in this PR is supposed to be for development and for construction of static plots (some of which are displayed on Jump) and is only partially overlapping in capabilities with Jump.

One other design question worth discussing is which data are ingested and how the database column names are stored in the code. So far, I've just hand-selected the L0, 2D, and L1 header keywords that seem useful. I couldn't find any L2 keywords worth ingesting, so there's just a placeholder keyword in that part of the code. All of the telemetry keywords (~100 of them) are ingested; specifically, the code ingests the "average" value of those keywords and not the standard deviations. I'm sure that we'll want to add more. As BJ pointed out yesterday, we have header definition files for the L0/L1/L2 files in kpfpipe/models/metadata/KPF_headers_L?.csv. Let's discuss whether AnalyzeTimeSeries should use those CSV files. The CSV files have the advantage of being a central repository, but I think they're old (all of the recent keywords related to QC and Diagnostics aren't in them), and KPF_headers_L0.csv isn't used for making the L0 files anyway (I believe) because the L0 files are made at WMKO by the L0 Assembler, not this pipeline. Thoughts on this?
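
If AnalyzeTimeSeries were to read keyword lists from those CSV files, the mechanics would be simple. A sketch, assuming a "Keyword" column (the actual column name in KPF_headers_L?.csv should be checked):

```python
import csv

def keywords_from_header_csv(path, column="Keyword"):
    """Read keyword names from a header-definition CSV such as
    kpfpipe/models/metadata/KPF_headers_L0.csv. The column name
    'Keyword' is an assumption about the CSV layout."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f) if row.get(column)]
```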

Finally, we should consider what database software to use. I adopted SQLite3 because it's an easy Python import. This means that KPF users can make a small DB on their laptop to understand some dataset, without having to use a central installation like Jump. I also like SQLite3 because it doesn't require root access or a special installation (like PostgreSQL). But it's less efficient and more sensitive to corruption. In practice, I haven't seen any DB corruption, and the performance is fast enough (I spent a while optimizing ingestion). But I welcome thoughts on this topic, and we could reconsider SQLite3.
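
For context on the SQLite ingestion optimization mentioned above: the main lever is batching inserts inside a single transaction rather than committing row by row. A minimal sketch (table and column names here are hypothetical, not the actual kpf_ts.db schema):

```python
import sqlite3

def ingest_rows(db_path, rows):
    """Insert (obs_id, keyword, value) tuples in one transaction.
    A single batched executemany is typically orders of magnitude
    faster than per-row commits in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    con.execute("""CREATE TABLE IF NOT EXISTS telemetry
                   (obs_id TEXT, keyword TEXT, value REAL)""")
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO telemetry VALUES (?, ?, ?)", rows)
    con.close()
```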

awhoward commented 8 months ago

@bjfultn -- I may be reading this incorrectly, but it looks like the CI error (pasted below) is unrelated to the code that I'm adding. The error below is related to the Masters processing. There are also some CI errors related to pg_config, which again seems unrelated to my commits. Am I misreading this?

[MasterArclampFramework][INFO]:FitsHeaders constructor: n_input_fits_files = 23
[MasterArclampFramework][DEBUG]:FitsHeaders.match_headers_string_lower(): matched_fits_files = ['/data/2D/20230730/KP.20230730.05917.61_2D.fits', '/data/2D/20230730/KP.20230730.05976.68_2D.fits', '/data/2D/20230730/KP.20230730.06035.50_2D.fits']
[MasterArclampFramework][INFO]:obsdate = 20230730
[MasterArclampFramework][INFO]:masterbias_path_exists = False
[MasterArclampFramework][ERROR]:Failed executing primitive MasterArclampFramework: 'KPFDB' object has no attribute 'cur'
Traceback (most recent call last):
  File "/code/KPF-Pipeline/kpfpipe/primitives/core.py", line 28, in apply
    self.output = self._perform()
  File "/code/KPF-Pipeline/modules/master_arclamp/src/master_arclamp_framework.py", line 184, in _perform
    dbh.get_nearest_master_file(obsdate,cal_file_level,contentbitmask,cal_type_pair)
  File "/code/KPF-Pipeline/database/modules/utils/kpf_db.py", line 130, in get_nearest_master_file
    self.cur.execute(query)
AttributeError: 'KPFDB' object has no attribute 'cur'

bjfultn commented 8 months ago

This looks like an issue that has already been fixed. I just updated this branch from develop so hopefully that fixes it.

awhoward commented 8 months ago

I added an ingestion script (scripts/ingest_watch_kpf_tsdb.py) that watches the data directories and ingests data as files are written and modified. At present, it writes to a DB file at /data/kpf/time_series/kpf_ts.db. I'm working on integrating this into Jump.

howardisaacson commented 8 months ago

This is a great utility! Having these time series plots auto-generated on a monthly timescale is, I think, currently the most valuable plot. Ingesting RVs will be a great addition. I will work on the notebook describing how to extract RVs.


awhoward commented 8 months ago

@bjfultn -- I think something is wrong with the CI. It has been run four times on slightly different versions of this PR, and the history is fail, pass, pass, fail. Last night I pushed one last commit with only trivial changes, and it caused the CI to fail. The error message (excerpt below) is related to Postgres. It looks like it's not being installed correctly within Docker. I can't figure out why this works sometimes and fails other times, though.

At any rate, I think this particular PR is ready.


Collecting psycopg2-binary
  Downloading psycopg2-binary-2.9.8.tar.gz (383 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ajlqv14g/psycopg2-binary_7bccc984e5dd45f58469f6f88efbcee0/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ajlqv14g/psycopg2-binary_7bccc984e5dd45f58469f6f88efbcee0/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-oga6954h
       cwd: /tmp/pip-install-ajlqv14g/psycopg2-binary_7bccc984e5dd45f58469f6f88efbcee0/
  Complete output (23 lines):
  running egg_info
  creating /tmp/pip-pip-egg-info-oga6954h/psycopg2_binary.egg-info
  writing /tmp/pip-pip-egg-info-oga6954h/psycopg2_binary.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-oga6954h/psycopg2_binary.egg-info/dependency_links.txt
  writing top-level names to /tmp/pip-pip-egg-info-oga6954h/psycopg2_binary.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-oga6954h/psycopg2_binary.egg-info/SOURCES.txt'

  Error: pg_config executable not found.

  pg_config is required to build psycopg2 from source.  Please add the directory
  containing pg_config to the $PATH or specify the full executable path with the
  option:

      python setup.py build_ext --pg-config /path/to/pg_config build ...

  or with the pg_config option in 'setup.cfg'.

  If you prefer to avoid building psycopg2 from source, please install the PyPI
  'psycopg2-binary' package instead.

  For further information please check the 'doc/src/install.rst' file (also at
  <https://www.psycopg.org/docs/install.html>).

  ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/9d/3d/5ddb908d2e5fdeb8678470d3f654e987356c9f981867313489b063fbe814/psycopg2-binary-2.9.8.tar.gz#sha256=80451e6b6b7c486828d5c7ed50769532bbb04ec3a411f1e833539d5c10eb691c (from https://pypi.org/simple/psycopg2-binary/) (requires-python:>=3.6). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
  [the same pg_config error repeats for psycopg2-binary 2.9.7 and 2.9.6]
  Downloading psycopg2_binary-2.9.5-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting tqdm
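
For reference, the pg_config failure above means pip is falling back to building psycopg2-binary from source because no prebuilt wheel matched the image's Python version; the log shows it eventually found a wheel at 2.9.5. A hedged sketch of the usual fixes, untested against this repo's Dockerfile (Debian-based base image assumed):

```dockerfile
# Option 1: provide pg_config and a compiler so source builds succeed
RUN apt-get update && apt-get install -y libpq-dev gcc

# Option 2: pin a psycopg2-binary release that ships a prebuilt wheel
# for the image's Python version, so no compilation is attempted
RUN pip install "psycopg2-binary==2.9.5"
```

Either would make the CI deterministic instead of depending on which versions pip tries first.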