ben-denham / labtech

Easily run experiment permutations with multi-processing and caching
https://ben-denham.github.io/labtech/
GNU General Public License v3.0

MLFlow integration for SQLite backend #23

Closed: nathanjmcdougall closed this issue 4 months ago

nathanjmcdougall commented 4 months ago

When using a SQLite backend for MLFlow, tasks that finish at around the same time sometimes hit DB locking errors when MLFlow writes via a .log call.

Maybe the fix is just to add guidance that a SQLite backend is not supported.

Alternatively, I think the fix might just be to retry the write repeatedly until a maximum timeout is reached.
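A minimal sketch of that retry idea, assuming the write can be wrapped in a zero-argument callable. `retry_on_lock` is a hypothetical helper, not part of labtech or MLFlow. It catches `sqlite3.OperationalError`; errors raised through MLFlow's SQLAlchemy-based store may arrive wrapped in a different exception type, so the `except` clause would need adapting in practice.

```python
import random
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_lock(
    fn: Callable[[], T],
    max_wait: float = 10.0,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
) -> T:
    """Call fn(), retrying on "database is locked" errors for up to max_wait seconds."""
    deadline = time.monotonic() + max_wait
    delay = base_delay
    while True:
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            remaining = deadline - time.monotonic()
            # Re-raise anything that is not a lock error, or once the timeout is used up.
            if "database is locked" not in str(exc) or remaining <= 0:
                raise
            # Exponential backoff with random jitter, so processes that collided
            # once are unlikely to retry at exactly the same moment again.
            time.sleep(min(delay * (1 + random.random()), remaining))
            delay = min(delay * 2, max_delay)
```

Hypothetical usage would wrap each logging call, e.g. `retry_on_lock(lambda: mlflow.log_metric("loss", 0.1))`.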

To test this case, you can set the environment variable MLFLOW_TRACKING_URI=sqlite:///... to point to a SQLite file on disk. Creating a large number of very fast-to-run tasks (e.g. ones that just return a constant) should then reproduce the locking.
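The underlying SQLite failure can also be reproduced without MLFlow or labtech. A minimal standard-library sketch: one connection holds an open write transaction while a second connection, with a zero busy timeout, tries to write. This assumes the MLFlow error comes from two task processes writing at the same moment; it illustrates the mechanism and is not a trace of MLFlow's own queries. `try_concurrent_write` is a hypothetical name.

```python
import os
import sqlite3
import tempfile


def try_concurrent_write(path: str) -> str:
    """Return the error seen by a second writer while a first writer holds a transaction."""
    # isolation_level=None: autocommit mode, so transactions are controlled explicitly.
    # timeout=0: fail immediately on a lock instead of waiting (the default waits 5 s).
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS metrics (key TEXT, value REAL)")
        # BEGIN IMMEDIATE takes the write lock and holds it until COMMIT.
        first.execute("BEGIN IMMEDIATE")
        first.execute("INSERT INTO metrics VALUES ('loss', 0.1)")
        try:
            second.execute("INSERT INTO metrics VALUES ('loss', 0.2)")
            return "no error"
        except sqlite3.OperationalError as exc:
            return str(exc)
    finally:
        if first.in_transaction:
            first.execute("COMMIT")
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_concurrent_write(os.path.join(tmp, "mlflow.db")))
```

Raising `timeout` makes the second writer wait for the lock rather than fail immediately, which is SQLite's built-in form of the retry-until-timeout idea.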

ben-denham commented 4 months ago

Thanks for catching that @nathanjmcdougall. It seems to me that to make this work we'd need to somehow intercept the .log() calls from within each process, which would require a fair bit of hacking of mlflow. Given that mlflow doesn't seem to handle the sqlite backend nicely from multiple threads/processes, I think we should just document it as unsupported for now.

I've added documentation about mlflow backend support in this commit: https://github.com/ben-denham/labtech/commit/16d4656b92bf42c90231727b09a4247c21cc226c

nathanjmcdougall commented 4 months ago

@ben-denham Thanks.

Rather than intercepting the .log() calls, one possible approach could be to have mlflow_start_log and mlflow_end_log methods on a Task class. Labtech would manage these method calls: it would call them only before the task starts and after it has completed, and in the original process rather than in the subprocess. Labtech could then ensure the DB is not locked before doing so.
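A rough sketch of how that hook design might look. `HookedTask` and `run_tasks` are hypothetical names for illustration, not labtech's API, and a `ThreadPoolExecutor` stands in for labtech's worker processes so the example stays self-contained. The key property is that `run()` executes in workers while both hooks are called only from the coordinating thread, one at a time, so their MLFlow writes can never overlap with each other.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from typing import Any, Dict, List


class HookedTask:
    """Hypothetical task base class with coordinator-side logging hooks."""

    def run(self) -> Any:
        """The actual work; executed in a worker."""
        raise NotImplementedError

    def mlflow_start_log(self) -> None:
        """Called in the coordinator before the task is submitted."""

    def mlflow_end_log(self, result: Any) -> None:
        """Called in the coordinator after the task's result is available."""


def run_tasks(tasks: List[HookedTask], executor: Executor) -> List[Any]:
    """Run tasks in the executor, calling the logging hooks serially in this thread."""
    for task in tasks:
        task.mlflow_start_log()
    futures = {executor.submit(task.run): task for task in tasks}
    results: Dict[int, Any] = {}
    for future in as_completed(futures):
        task = futures[future]
        result = future.result()
        # Only this thread ever calls the hooks, so no two log writes overlap.
        task.mlflow_end_log(result)
        results[id(task)] = result
    # Return results in the same order as the input tasks.
    return [results[id(task)] for task in tasks]


if __name__ == "__main__":
    class Constant(HookedTask):
        def __init__(self, value: int) -> None:
            self.value = value

        def run(self) -> int:
            return self.value

        def mlflow_end_log(self, result: Any) -> None:
            # In real use this might call mlflow.log_metric(...).
            print(f"task {self.value} finished with {result}")

    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_tasks([Constant(i) for i in range(3)], pool))
```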

I don't want to give the impression that this is anything more than an idea; I think it's hardly worth the effort to implement at this stage in labtech's development.

ben-denham commented 4 months ago

Ah, so the user could define mlflow_start_log and mlflow_end_log to perform whatever log calls they want? That could work, though would restrict any mid-process logging the user wants to do.

But I agree, let's leave it as is for now and come back to this later if it becomes a bigger issue.

nathanjmcdougall commented 4 months ago

Yes, that's the idea.

(In reply by email to Dr Ben Denham's comment of Sun, 19 May 2024, 3:15 pm.)