CHIMEFRB / datatrail-cli

CHIME/FRB Data Management CLI
https://chimefrb.github.io/datatrail-cli/
MIT License

[BUG] datatrail clear command failing to clear staged baseband.raw data on arc #107

Closed rmck1 closed 1 month ago

rmck1 commented 1 month ago

Describe the bug: The datatrail clear command regularly fails to clear staged raw baseband data on /arc. The issue was noticed when an inordinate number of baseband pipeline runs reported failures at the final unstage_data step of the pipeline (see e.g. https://frb.chimenet.ca/workflow/web/CHIMEFRB/pipelines/mckinven-baseband-processing/66c78aabc05879f572d2ee8d).

To Reproduce: The issue can be reproduced by running the following command in a Jupyter notebook session on CANFAR (using image: baseband-analysis:latest)

os.system("datatrail clear chime.event.baseband.raw 214414053 -vvv -f")

The traceback for the above command is the following:

[20:48:43] DEBUG    `clear` called with:                             clear.py:69
           DEBUG    scope: chime.event.baseband.raw [<class 'str'>]  clear.py:70
           DEBUG    dataset: 214414053 [<class 'str'>]               clear.py:71
           DEBUG    directory: None [<class 'NoneType'>]             clear.py:72
           DEBUG    clear_parents: False [<class 'bool'>]            clear.py:73
           DEBUG    verbose: 3 [<class 'int'>]                       clear.py:74
           DEBUG    quiet: False [<class 'bool'>]                    clear.py:75
           DEBUG    Loading configuration.                           clear.py:79
           DEBUG    Site set to: canfar                              clear.py:82
           INFO     No directory, setting to:                        clear.py:85
                    /arc/projects/chime_frb/

Searching for files for 214414053 chime.event.baseband.raw...

           DEBUG    Loading configuration.                      functions.py:388
           DEBUG    Server: https://frb.chimenet.ca/datatrail   functions.py:392
           DEBUG    Configuration loaded successfully.          functions.py:393
           INFO     Querying Datatrail for 214414053            functions.py:398
                    chime.event.baseband.raw.
           DEBUG    URL:                                        functions.py:401
                    https://frb.chimenet.ca/datatrail/query/dataset/find
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /opt/pysetup/.venv/bin/datatrail:8 in <module>                               │
│                                                                              │
│   5 from dtcli.cli import cli                                                │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(cli())                                                      │
│   9                                                                          │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/core.py:1130 in         │
│ __call__                                                                     │
│                                                                              │
│   1127 │                                                                     │
│   1128 │   def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:       │
│   1129 │   │   """Alias for :meth:`main`."""                                 │
│ ❱ 1130 │   │   return self.main(*args, **kwargs)                             │
│   1131                                                                       │
│   1132                                                                       │
│   1133 class Command(BaseCommand):                                           │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/core.py:1055 in main    │
│                                                                              │
│   1052 │   │   try:                                                          │
│   1053 │   │   │   try:                                                      │
│   1054 │   │   │   │   with self.make_context(prog_name, args, **extra) as c │
│ ❱ 1055 │   │   │   │   │   rv = self.invoke(ctx)                             │
│   1056 │   │   │   │   │   if not standalone_mode:                           │
│   1057 │   │   │   │   │   │   return rv                                     │
│   1058 │   │   │   │   │   # it's not safe to `ctx.exit(rv)` here!           │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/core.py:1657 in invoke  │
│                                                                              │
│   1654 │   │   │   │   super().invoke(ctx)                                   │
│   1655 │   │   │   │   sub_ctx = cmd.make_context(cmd_name, args, parent=ctx │
│   1656 │   │   │   │   with sub_ctx:                                         │
│ ❱ 1657 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub │
│   1658 │   │                                                                 │
│   1659 │   │   # In chain mode we create the contexts step by step, but afte │
│   1660 │   │   # base command has been invoked.  Because at that point we do │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/core.py:1404 in invoke  │
│                                                                              │
│   1401 │   │   │   echo(style(message, fg="red"), err=True)                  │
│   1402 │   │                                                                 │
│   1403 │   │   if self.callback is not None:                                 │
│ ❱ 1404 │   │   │   return ctx.invoke(self.callback, **ctx.params)            │
│   1405 │                                                                     │
│   1406 │   def shell_complete(self, ctx: Context, incomplete: str) -> t.List │
│   1407 │   │   """Return a list of completions for the incomplete value. Loo │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/core.py:760 in invoke   │
│                                                                              │
│    757 │   │                                                                 │
│    758 │   │   with augment_usage_errors(__self):                            │
│    759 │   │   │   with ctx:                                                 │
│ ❱  760 │   │   │   │   return __callback(*args, **kwargs)                    │
│    761 │                                                                     │
│    762 │   def forward(                                                      │
│    763 │   │   __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # no │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/click/decorators.py:26 in     │
│ new_func                                                                     │
│                                                                              │
│    23 │   """                                                                │
│    24 │                                                                      │
│    25 │   def new_func(*args, **kwargs):  # type: ignore                     │
│ ❱  26 │   │   return f(get_current_context(), *args, **kwargs)               │
│    27 │                                                                      │
│    28 │   return update_wrapper(t.cast(F, new_func), f)                      │
│    29                                                                        │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/dtcli/clear.py:107 in clear   │
│                                                                              │
│   104 │                                                                      │
│   105 │   # Find number of files in common directory and size.               │
│   106 │   console.print(f"\nSearching for files for {dataset} {scope}...\n") │
│ ❱ 107 │   common_path = find_dataset_common_path(scope, dataset, site, verbo │
│   108 │   if common_path:                                                    │
│   109 │   │   common_path = (directory + common_path).replace("//", "/")     │
│   110 │   if not common_path:                                                │
│                                                                              │
│ /opt/pysetup/.venv/lib/python3.8/site-packages/dtcli/src/functions.py:418 in │
│ find_dataset_common_path                                                     │
│                                                                              │
│   415 │   │   file_paths = [f.replace("cadc:CHIMEFRB/", "") for f in file_ur │
│   416 │   │                                                                  │
│   417 │   │   common_path = os.path.commonprefix(file_paths).replace("//", " │
│ ❱ 418 │   │   if common_path[-1] != "/":                                     │
│   419 │   │   │   common_path = "/".join(common_path.split("/")[:-1])        │
│   420 │                                                                      │
│   421 │   else:                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
IndexError: string index out of range
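
A note for anyone reproducing this from a notebook or a pipeline step: os.system only returns the exit status, so the traceback above is easy to lose. Below is a minimal sketch that captures the output with subprocess instead, assuming only the command line already shown in this report. The helper names build_clear_command and run_clear are hypothetical and are not part of datatrail-cli.

```python
import subprocess
from typing import List, Tuple


def build_clear_command(scope: str, dataset: str) -> List[str]:
    """Argument list mirroring the shell command used in this report."""
    return ["datatrail", "clear", scope, dataset, "-vvv", "-f"]


def run_clear(scope: str, dataset: str) -> Tuple[int, str]:
    """Run the command and return (exit_code, combined stdout/stderr)."""
    try:
        proc = subprocess.run(
            build_clear_command(scope, dataset),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep the traceback with the log lines
            text=True,
        )
    except FileNotFoundError:
        # The datatrail executable is not on PATH in this environment.
        return 127, "datatrail executable not found"
    return proc.returncode, proc.stdout
```

Calling run_clear("chime.event.baseband.raw", "214414053") would execute the real command, so it is deliberately not invoked in the block itself. An uncaught Python exception such as the one above normally yields a non-zero exit code, which the caller can check alongside the captured text.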

Expected behavior: Raw baseband data temporarily staged at /arc/projects/chime_frb/data/baseband/raw/YYYY/MM/DD/astro_<event_id>/ should be cleared once the singlebeam.h5 file is produced, and yet for event 214414053 and others like it the staged data remains.

MWSammons commented 1 month ago

@rmck1 can you copy in the contents of the file at ~/.datatrail/config.yaml, i.e. in your home directory?

rmck1 commented 1 month ago

Here are the contents of the file: ~/.datatrail/config.yaml

root_mounts:
  canfar: /arc/projects/chime_frb/
  chime: /
  gbo: /
  hco: /
  kko: /
  local: ./
server: https://frb.chimenet.ca/datatrail
site: canfar
vospace_certfile: /arc/home/Mckinven/.ssl/cadcproxy.pem

MWSammons commented 1 month ago

Yeah, so the issue is that the POST request to find the relevant filepaths returns filepaths with a mix of data/chime/baseband/raw/... and cadc:CHIMEFRB/data/chime/baseband/raw/... prefixes. Normally, code exists to account for this by simply finding and removing that prefix.

However, in this case, for some reason I don't understand, the result gives something like this ...

   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_771.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_773.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_784.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_776.h5',
   'cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_775.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_7.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_757.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_926.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_939.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_948.h5',
   'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_949.h5',

... which upon close inspection shows a different prefix, cadc:CHIMEFRB//data/chime/baseband/raw..., i.e. it has a double forward slash, so the find and replace catches only part of the prefix, leaving the following mix of filepaths ...

 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_302.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_309.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_311.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_312.h5',
 '/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_319.h5',
 '/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_283.h5',
 '/data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_316.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_296.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_297.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_303.h5',
 'data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_314.h5',

... with forward slashes at the start of some. As a result there is no common filepath across this set, so the search returns common_path='', and indexing common_path[-1] then raises the IndexError: string index out of range seen in the traceback. To fix this we can simply search for and replace cadc:CHIMEFRB// first, before finding and replacing the original cadc:CHIMEFRB/, but I'm relatively confounded as to why this happens in the first place, and what is different about these particular events.
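
The chain described above can be reproduced in isolation. This is a minimal sketch, not the dtcli code itself: it takes two of the file names listed in this thread, applies the single-slash replacement from functions.py, and then applies the replacement order suggested here. The helper name strip_prefix is hypothetical.

```python
import os

# One bare path and one double-slash URI, both taken from the listing above.
file_uris = [
    "data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_725.h5",
    "cadc:CHIMEFRB//data/chime/baseband/raw/2022/03/06/astro_214414053/baseband_214414053_811.h5",
]

# Current behaviour: only "cadc:CHIMEFRB/" is removed, so the second path
# keeps a leading "/" and no longer shares a first character with the first.
current = [f.replace("cadc:CHIMEFRB/", "") for f in file_uris]
common_current = os.path.commonprefix(current).replace("//", "/")

try:
    common_current[-1]  # same indexing as functions.py line 418
    failure = None
except IndexError as err:
    failure = str(err)


def strip_prefix(uri: str) -> str:
    """Remove the double-slash prefix first, then the single-slash one."""
    return uri.replace("cadc:CHIMEFRB//", "").replace("cadc:CHIMEFRB/", "")


fixed = [strip_prefix(f) for f in file_uris]
common_fixed = os.path.commonprefix(fixed)
if common_fixed[-1] != "/":
    # commonprefix works per character, so trim back to the last directory.
    common_fixed = "/".join(common_fixed.split("/")[:-1])
```

With these inputs common_current is the empty string and the indexing fails, while common_fixed ends up as the astro_214414053 directory.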

tjzegmott commented 1 month ago

I think that in this case, it was due to a change in how we wanted to store the file path at Minoc in the Datatrail database. The old way was to include cadc:CHIMEFRB/ at the beginning; the new way is to have the file path start with data/... and add the storage root.

The fix here is to edit the database with a script that updates the files starting with cadc:CHIMEFRB to the new format, i.e. removes cadc:CHIMEFRB from the file name.
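
As a rough illustration of what such a script would have to compute, here is a dry-run sketch that works on a plain list of stored names. It makes no assumptions about the Datatrail database schema or API, which are not shown in this thread; to_new_format and plan_renames are hypothetical names.

```python
from typing import Dict, List

OLD_PREFIX = "cadc:CHIMEFRB"


def to_new_format(name: str) -> str:
    """Drop the old prefix and however many slashes followed it."""
    if name.startswith(OLD_PREFIX):
        return name[len(OLD_PREFIX):].lstrip("/")
    return name


def plan_renames(names: List[str]) -> Dict[str, str]:
    """Map each old-format name to its new form; untouched names are omitted."""
    return {n: to_new_format(n) for n in names if to_new_format(n) != n}
```

Reviewing the output of plan_renames before writing anything back would show how many entries carry one slash versus two.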

MWSammons commented 1 month ago

No, that's not the issue. The existing scripts already find and replace cadc:CHIMEFRB/ with nothing, i.e. they remove that prefix from the filename, so those differences are handled. The issue is that some are cadc:CHIMEFRB/data... and some are cadc:CHIMEFRB//data..., so there is a discrepancy even once the prefix has been removed, depending on whether there are 1 or 2 forward slashes. So I guess the solution is that this edge case also needs to be checked?

tjzegmott commented 1 month ago

Right, I understand, but ultimately it comes from the fact that the data in the database isn't consistent. If you don't want to touch the data in the database, you can reorganise the logic that you mention (I think it's these lines) with something like the following:

    if dataset_locations["file_replica_locations"].get("minoc"):  # type: ignore
        file_uris = dataset_locations["file_replica_locations"]["minoc"]  # type: ignore
-       file_paths = [f.replace("cadc:CHIMEFRB/", "") for f in file_uris]
+       file_paths = [f.replace("//", "/").replace("cadc:CHIMEFRB/", "") for f in file_uris]

-       common_path = os.path.commonprefix(file_paths).replace("//", "/")
+       common_path = os.path.commonprefix(file_paths)

Either way should address this issue.
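
For reference, a self-contained sketch of the reorganised logic from the diff above, wrapped in a function so it can be tried outside dtcli. The function name common_directory and the choice to return None for an empty prefix are assumptions added here; the real find_dataset_common_path may handle that case differently.

```python
import os
from typing import List, Optional


def common_directory(file_uris: List[str]) -> Optional[str]:
    """Return the directory shared by all URIs, or None if there is none."""
    # Collapse "//" first so both prefix variants become "cadc:CHIMEFRB/".
    file_paths = [
        f.replace("//", "/").replace("cadc:CHIMEFRB/", "") for f in file_uris
    ]
    common_path = os.path.commonprefix(file_paths)
    if not common_path:
        # Assumption: report "nothing in common" instead of indexing "".
        return None
    if not common_path.endswith("/"):
        # commonprefix is character based; trim to the enclosing directory.
        common_path = "/".join(common_path.split("/")[:-1])
    return common_path
```

Using endswith rather than common_path[-1] means an empty prefix can never raise, even if the explicit check is later removed.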

MWSammons commented 1 month ago

Right, that makes sense. I guess if this is going to keep happening then handling it in the logic is better than triaging the database. If it's due to a change in the past then I'd expect it to be a one-off, and therefore it's simpler to just change the database, but then I don't know why it would differ between files within a single event?

MWSammons commented 1 month ago

@tjzegmott did the changes you mention already get implemented? Seems like the new code is already in there, if so we can close this issue

tjzegmott commented 1 month ago

I didn't create any commits and I don't see the changes in the code. At least not on the main branch. I will create the commit and PR now.