artefactual-sdps / enduro

A tool to support ingest and automation in digital preservation workflows
https://enduro.readthedocs.io/
Apache License 2.0
4 stars 3 forks source link

Implement CreatePreservationTasks with CreateBulk #1048

Open sevein opened 1 month ago

sevein commented 1 month ago

This PR explores a new pattern for handling bulk database operations using Go 1.23's range-over-function feature. It introduces a new CreatePreservationTasks method in the persistence package, which accepts a generator of preservation tasks (iter.Seq[*datatype.PreservationTask]). The method utilizes the batch function to group tasks into manageable batches and leverages CreateBulk to perform efficient bulk inserts in a single INSERT operation.

To view the main idea of this proposal in isolation, visit: https://go.dev/play/p/y7tYvD7NO28.

In future PRs, this new function could be used in areas like CreateAIPActivity and JobTracker to improve bulk task creation.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 75.64103% with 19 lines in your changes missing coverage. Please review.

Project coverage is 54.64%. Comparing base (c0afd76) to head (25f1c7e).

Files with missing lines Patch % Lines
internal/persistence/telemetry.go 0.00% 10 Missing :warning:
...ternal/persistence/ent/client/preservation_task.go 88.88% 3 Missing and 3 partials :warning:
internal/persistence/ent/client/batch.go 78.57% 2 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1048 +/- ## ========================================== - Coverage 54.68% 54.64% -0.05% ========================================== Files 104 105 +1 Lines 7631 7683 +52 ========================================== + Hits 4173 4198 +25 - Misses 3201 3221 +20 - Partials 257 264 +7 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


🚨 Try these New Features:

jraddaoui commented 3 weeks ago

Sorry for the late update @sevein. I took a quick look a while ago, really nice and good learnings! I think @djjuhasz is also looking at it and will do the actual code review.

djjuhasz commented 3 weeks ago

@djjuhasz yes, I've been looking at it and will provide feedback. I'm having a hard time wrapping my head around the iterator code flow.

sevein commented 3 weeks ago

I've been looking at it and will provide feedback. I'm having a hard time wrapping my head around the iterator code flow.

The syntax is a bit confusing :cry:... I found the docstring of the compiler's rangefunc package somewhat useful: https://github.com/golang/go/blob/master/src/cmd/compile/internal/rangefunc/rewrite.go.

There are a couple of areas where I'm unsure if I'm doing things correctly:

I haven't found yet an implementation like this in other codebases, I should probably look around more. I assume other poeple are also trying to solve the problem of ublk inserts with iterators? A project that is slightly related is https://github.com/achille-roussel/sqlrange.