Closed nathanjmcdougall closed 6 months ago
Thanks for catching that @nathanjmcdougall. It seems to me that to make this work we'd need to somehow intercept the .log() calls from within each process, which would require a fair bit of hacking of mlflow. Given that mlflow doesn't seem to handle the SQLite backend nicely from multiple threads/processes, I think we should just document it as unsupported for now.
I've added documentation about mlflow backend support in this commit: https://github.com/ben-denham/labtech/commit/16d4656b92bf42c90231727b09a4247c21cc226c
@ben-denham Thanks.
Rather than intercepting the .log() calls, one possible approach could be to have mlflow_start_log and mlflow_end_log methods on a Task class. Calls to these methods would be managed by labtech. Labtech would only call them before the task starts and after it has completed, and in the original process rather than in the subprocess. Labtech could ensure the DB is not locked before doing so.
I don't want to give the impression that this is anything more than an idea; I think implementing this is hardly worth the effort at this stage in labtech's development.
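To make the idea concrete, here is a minimal sketch of how it could look. It is only an illustration: the hook signatures (a log callable, plus the result for the end hook), the ConstantTask class, and the run_tasks helper are all hypothetical and are not part of labtech or mlflow. A thread pool and a list-appending log function stand in for subprocesses and real mlflow calls so that the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ConstantTask:
    """Hypothetical task; only the hook names come from the proposal."""
    value: int

    def run(self):
        # Work done by the worker; it performs no logging itself.
        return self.value

    def mlflow_start_log(self, log):
        # Assumed signature: `log` stands in for whatever mlflow calls the user wants.
        log("param", "value", self.value)

    def mlflow_end_log(self, log, result):
        log("metric", "result", result)


def run_tasks(tasks, executor, log):
    """Run tasks on the executor, calling the logging hooks only from the caller."""
    futures = []
    for task in tasks:
        task.mlflow_start_log(log)  # before the task starts, in the original process
        futures.append(executor.submit(task.run))
    results = []
    for task, future in zip(tasks, futures):
        result = future.result()
        task.mlflow_end_log(log, result)  # after completion, in the original process
        results.append(result)
    return results


records = []
with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_tasks(
        [ConstantTask(1), ConstantTask(2)],
        pool,
        lambda *entry: records.append(entry),
    )
```

Because every log call happens in the calling process, there is only ever one writer to the backend, at the cost of not being able to log from inside run.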
Ah, so the user could define mlflow_start_log and mlflow_end_log to perform whatever log calls they want? That could work, though it would restrict any mid-process logging the user wants to do.
But I agree, let's leave it as is for now and come back to this later if it becomes a bigger issue.
Yes, that's the idea.
When using a SQLite backend for MLflow, tasks that finish around the same time sometimes encounter DB locking issues when MLflow is writing with a .log command. Maybe the fix is just to add guidance that a SQLite backend is not supported.
Or I think the fix might just be to repeat attempts until a maximum timeout is reached.
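A rough sketch of what retrying until a maximum timeout could look like, using only the standard library. The helper name retry_on_locked and its parameters are made up for illustration, and it assumes the failure surfaces as sqlite3.OperationalError with "locked" in the message; through MLflow the error may well arrive wrapped in a different exception type, so the except clause would need adjusting.

```python
import sqlite3
import time


def retry_on_locked(operation, max_wait_seconds=5.0, initial_delay_seconds=0.05):
    """Call operation() until it succeeds or max_wait_seconds has elapsed."""
    deadline = time.monotonic() + max_wait_seconds
    delay = initial_delay_seconds
    while True:
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Only retry lock contention; re-raise any other error immediately.
            if "locked" not in str(exc):
                raise
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # timeout reached: surface the last locking error
            time.sleep(min(delay, remaining))
            delay *= 2  # back off so competing writers spread out
```

Each log call would then be wrapped, e.g. retry_on_locked(lambda: some_log_call()), where some_log_call is a placeholder for whatever write is being attempted.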
To test this case, you can set the environment variable MLFLOW_TRACKING_URI=sqlite:///... to point to a SQLite file on disk. If you create a large number of very fast-to-run tasks (e.g. returning a constant), that should reproduce the locking.
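The underlying SQLite behaviour can also be seen in isolation, without MLflow or labtech. The sketch below is not the MLflow code path; it just opens two plain sqlite3 connections to the same file and shows that a second writer is refused while the first holds a write transaction. The function name and the metrics table are made up for the example.

```python
import os
import sqlite3
import tempfile


def second_writer_error(db_path):
    """Return the error message a second writer gets while a first one holds the lock."""
    # isolation_level=None puts the connections in autocommit mode so the
    # transaction boundaries below are explicit.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0 makes the second connection fail immediately instead of waiting.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS metrics (key TEXT, value REAL)")
        first.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        first.execute("INSERT INTO metrics VALUES ('loss', 0.1)")
        try:
            second.execute("INSERT INTO metrics VALUES ('loss', 0.2)")
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        # Closing rolls back the first connection's open transaction.
        first.close()
        second.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    message = second_writer_error(os.path.join(tmp_dir, "example.db"))
print(message)
```

With many fast tasks finishing together, several processes end up in the position of the second connection here, which is consistent with the locking errors described above.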