artefactual-sdps / preprocessing-base

Enduro preprocessing child workflow base repository

Create a BagIt bag at the end of preprocessing #4

Closed: djjuhasz closed this 1 month ago

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 78.57143%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 34.92%. Comparing base (08f010d) to head (32226b6).

| Files | Patch % | Lines |
|---|---|---|
| cmd/worker/workercmd/cmd.go | 0.00% | 4 Missing :warning: |
| internal/workflow/preprocessing.go | 89.47% | 1 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main       #4      +/-   ##
==========================================
+ Coverage   29.94%   34.92%   +4.98%
==========================================
  Files           5        5
  Lines         167      189      +22
==========================================
+ Hits           50       66      +16
- Misses        115      120       +5
- Partials        2        3       +1
```


djjuhasz commented 1 month ago

@jraddaoui I'm not sure if we want to add the "CreateBagActivity" here, or just add it to preprocessing-sfa and preprocessing-moma?

P.S. I haven't tested this in dev yet; I'll do that tomorrow.

jraddaoui commented 1 month ago

Thanks @djjuhasz! I kind of like having it here as an example, even if not all child workflows will need it. It also gives us the chance to improve the readme and the requirements to make it work with Enduro. I'll follow up on that once this is merged.

djjuhasz commented 1 month ago

I tested the preprocessing workflow, and bagging works... kind of. :(

The first time a transfer is submitted, a bag is created in place. If the same directory is submitted again, then bagging fails with an error:

`CreateBagActivity: create bag: mkdir /home/preprocessing/shared/small/data: file exists`

This error occurs whenever a transfer with the same "relative path" (e.g. "small") is submitted more than once, because we replace the source transfer contents with a BagIt bag at the same path. When the same transfer path is submitted a second time, we try to re-bag the bag created by the first run. :(

I think we are going to need to create a unique working directory for each workflow instance to prevent errors on duplicate submissions and to prevent failed preprocessing workflows from leaving the source transfer in a partially transformed state.

djjuhasz commented 1 month ago

> The first time a transfer is submitted, a bag is created in place. If the same directory is submitted again, then bagging fails with an error:
>
> `CreateBagActivity: create bag: mkdir /home/preprocessing/shared/small/data: file exists`

The above error doesn't occur when preprocessing is run as a child workflow of the Enduro processing workflow - Enduro creates a unique local transfer directory on each workflow run.

jraddaoui commented 1 month ago

> The first time a transfer is submitted, a bag is created in place. If the same directory is submitted again, then bagging fails with an error:
>
> `CreateBagActivity: create bag: mkdir /home/preprocessing/shared/small/data: file exists`
>
> The above error doesn't occur when preprocessing is run as a child workflow of the Enduro processing workflow - Enduro creates a unique local transfer directory on each workflow run.

Are you using the Tilt UI submit button? Should we just delete before copying here?

djjuhasz commented 1 month ago

> Are you using the Tilt UI submit button? Should we just delete before copying here?

@jraddaoui yes, I was using the Tilt UI submit button (x2) which caused the error. Deleting any existing transfers before copying in the submitted transfer sounds good to me.