OSOceanAcoustics / echodataflow

Orchestrated sonar data processing workflow
https://echodataflow.readthedocs.io/en/latest/
MIT License
Grouping Issue with TRANSECT_FILE_REGEX #74

Closed Sohambutala closed 4 months ago

Sohambutala commented 4 months ago

Description

The current TRANSECT_FILE_REGEX (r"x(?P\d+)") keys each file only on the digits that follow the leading x. Files with distinct filenames but the same transect number are therefore placed in a single group. This affects files such as:

x0001_0_wt_20170625_164600_f0003
x0001_1_ot_20170625_175136_f0004
x0001_2_wt_20170625_190753_f0003
x0001_3_ot_20170625_195927_f0004

Expected Behavior

Each file should be placed in its own group based on its filename, so that the four files above produce four distinct groups.

Actual Behavior

Files are being grouped into a single group despite having distinct filenames.
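
The collapse can be reproduced outside echodataflow with a few lines of Python. This is a minimal sketch, not the library's code: the capture-group name `transect_num` is an assumption (the group name is not legible in the regex quoted above), and `group_by_transect` is a hypothetical helper written for this illustration.

```python
import re
from collections import defaultdict

# Assumed form of the pattern; the capture-group name is a guess.
TRANSECT_FILE_REGEX = r"x(?P<transect_num>\d+)"

FILENAMES = [
    "x0001_0_wt_20170625_164600_f0003",
    "x0001_1_ot_20170625_175136_f0004",
    "x0001_2_wt_20170625_190753_f0003",
    "x0001_3_ot_20170625_195927_f0004",
]


def group_by_transect(filenames):
    """Group filenames by the digits captured right after the leading 'x'."""
    groups = defaultdict(list)
    for name in filenames:
        match = re.match(TRANSECT_FILE_REGEX, name)
        # Fall back to the full filename when the pattern does not match.
        key = match.group("transect_num") if match else name
        groups[key].append(name)
    return dict(groups)


groups = group_by_transect(FILENAMES)
print(len(groups))   # 1
print(list(groups))  # ['0001']
```

All four names begin with `x0001`, so the only captured key is `0001` and every file lands in the same group.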

Steps to Reproduce

  1. Rename some raw files to the filenames listed above.
  2. Zip all four files.
  3. Run any flow on the zip file.
  4. Observe that only one group is created.

Possible Solution / Suggestion

Add a flag in Datastore.yaml to change this regex or potentially bypass it to use the filename as a whole for the group name.

leewujung commented 4 months ago

> Add a flag in Datastore.yaml to change this regex or potentially bypass it to use the filename as a whole for the group name.

I prefer the latter solution, bypassing the regex and using the filenames for grouping, since people often want to group their files in different ways. The x here stands for transects, but not all datasets have the concept of transects. For example, mooring data such as those from the OOI have no transects, and the grouping can be based on days or weeks.

Sohambutala commented 4 months ago

A grouping_regex option has been added to define a regex pattern that derives group names directly from filenames.
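
How the option is wired into echodataflow is not shown in this thread, so the following is only a sketch of the idea. `derive_group_name` is a hypothetical helper, and both the fallback to the full filename and the example pattern `r"x\d+_\d+"` are assumptions chosen for illustration.

```python
import re


def derive_group_name(filename, grouping_regex=None):
    """Return a group name for a filename.

    Without a pattern, the whole filename is the group name. With a
    pattern, the matched text is used, and the filename is the fallback
    when nothing matches.
    """
    if grouping_regex is None:
        return filename
    match = re.search(grouping_regex, filename)
    return match.group(0) if match else filename


filenames = [
    "x0001_0_wt_20170625_164600_f0003",
    "x0001_1_ot_20170625_175136_f0004",
    "x0001_2_wt_20170625_190753_f0003",
    "x0001_3_ot_20170625_195927_f0004",
]

# Hypothetical pattern: transect number plus the index that follows it.
names = [derive_group_name(f, r"x\d+_\d+") for f in filenames]
print(names)  # ['x0001_0', 'x0001_1', 'x0001_2', 'x0001_3']
```

With this pattern each of the four files from the report gets its own group name. A pattern that matches only the date field would instead group files by day, which fits the mooring case mentioned earlier in the thread.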

In addition: