askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.
https://vast-survey.org/vast-pipeline/
MIT License

Dask/Pandas regression: IndexError raised during association #640

Closed: ajstewart closed this issue 2 years ago

ajstewart commented 2 years ago

Same issue as #631; however, this time it is appearing locally for me, in the parallel association. Setup:

I found it because the local tests were failing, e.g.:

======================================================================
ERROR: setUpClass (vast_pipeline.tests.test_regression.test_epoch_parallel_add_image.BasicEpochParallelAddTwoImageTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/management/commands/runpipeline.py", line 340, in run_pipe
    pipeline.process_pipeline(p_run)
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/pipeline/main.py", line 195, in process_pipeline
    sources_df = parallel_association(
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/pipeline/association.py", line 1583, in parallel_association
    dd.from_pandas(images_df, n_cpu)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/base.py", line 290, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/base.py", line 573, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/multiprocessing.py", line 220, in get
    result = get_async(
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/local.py", line 506, in get_async
    raise_exception(exc, tb)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/local.py", line 314, in reraise
    raise exc
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/local.py", line 219, in execute_task
    result = _execute_task(task, data)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/optimization.py", line 969, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
    return func(*args, **kwargs)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/dataframe/core.py", line 6259, in apply_and_enforce
    df = func(*args, **kwargs)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/dask/dataframe/groupby.py", line 170, in _groupby_slice_apply
    return g.apply(func, *args, **kwargs)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1414, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1455, in _python_apply_general
    values, mutated = self.grouper.apply(f, data, self.axis)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 776, in apply
    f(data.iloc[:0])
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1388, in f
    return func(g, *args, **kwargs)
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/pipeline/association.py", line 1114, in association
    skyreg_group = images_df['skyreg_group'].iloc[0]
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1520, in _getitem_axis
    self._validate_integer(key, axis)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1452, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/django/test/testcases.py", line 1201, in setUpClass
    cls.setUpTestData()
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/tests/test_regression/test_epoch_parallel_add_image.py", line 353, in setUpTestData
    call_command('runpipeline', self.compare_run)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/django/core/management/__init__.py", line 181, in call_command
    return command.execute(*args, **defaults)
  File "/Users/adam/anaconda3/envs/vast-pipeline-dev-3.8/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/management/commands/runpipeline.py", line 442, in handle
    _ = run_pipe(
  File "/Users/adam/GitHub/vast-pipeline/vast_pipeline/management/commands/runpipeline.py", line 349, in run_pipe
    raise CommandError(f'Processing error:\n{e}')
django.core.management.base.CommandError: Processing error:
single positional indexer is out-of-bounds
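Reading the traceback, the failure seems to come from pandas' `BaseGrouper.apply` (pandas/core/groupby/ops.py, line 776). In the pandas version used here, it calls the applied function once on `data.iloc[:0]` when the partition it receives contains no groups. `association` then reads `images_df['skyreg_group'].iloc[0]` from that empty frame and raises. A plausible trigger, not confirmed here, is that `dd.from_pandas(images_df, n_cpu)` followed by the groupby shuffle leaves some partitions without any sky-region group. The sketch below reproduces the indexing failure on an empty slice and shows one possible guard. It assumes that returning an empty result for empty input is acceptable. `association_sketch`, `unguarded_first_group`, the `n_images` column and the toy data are hypothetical and are not part of vast-pipeline.

```python
import pandas as pd


def unguarded_first_group(images_df: pd.DataFrame):
    # Hypothetical helper mirroring the failing line in association.py:
    # on an empty frame, .iloc[0] raises IndexError.
    return images_df['skyreg_group'].iloc[0]


def association_sketch(images_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the per-group association function."""
    if images_df.empty:
        # Guard: the applied function may be called on an empty slice
        # (data.iloc[:0]) when a partition holds no groups, so return an
        # empty result instead of indexing into it.
        return pd.DataFrame(columns=['skyreg_group', 'n_images'])
    skyreg_group = images_df['skyreg_group'].iloc[0]
    return pd.DataFrame(
        {'skyreg_group': [skyreg_group], 'n_images': [len(images_df)]}
    )


# Hypothetical toy input with two sky-region groups.
images_df = pd.DataFrame({'skyreg_group': [1, 1, 2], 'image': ['a', 'b', 'c']})
empty_slice = images_df.iloc[:0]  # the kind of input the dummy call receives

try:
    unguarded_first_group(empty_slice)
except IndexError as exc:
    print('unguarded call failed:', exc)

print(association_sketch(empty_slice))
# Normal per-group behaviour is unchanged by the guard.
for _, group in images_df.groupby('skyreg_group'):
    print(association_sketch(group))
```

Guarding inside the applied function keeps the fix independent of how many partitions dask creates and of whether a given pandas version performs the dummy call. In a real fix, the empty result would also need to match the `meta` passed to dask's `apply`.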