Open janosh opened 1 month ago
Hi @janosh, indeed there is no way of setting the metadata in the flow document, thanks for reporting that.
I am not 100% convinced by the solution though. I can see a few potential minor issues:
1) The fact that jobflow's Flow object does not have a metadata attribute could make this a bit hackish.
2) There may be confusion between this metadata attribute and the update_metadata method in Flow. Someone calling this method on a Flow may be tricked into thinking that the Flow metadata is also set.
3) In principle a user could run code like this:
flow1 = Maker1().make()
flow1.metadata = {"a": 1}
flow2 = Maker2().make(flow1.output)
flow = Flow([flow1, flow2])
submit_flow(flow)
and no Flow metadata will be added to the DB. Since only the top-level Flow is added to the DB, it would be ambiguous what to do with the metadata set on flow1.
4) If jobflow introduces a metadata attribute to Flow in the future, this may break something.
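Point 3 can be illustrated without jobflow at all. The sketch below uses a hypothetical stand-in class, FakeFlow (not jobflow's Flow), and two hypothetical helpers; it only assumes that metadata is read from the outermost flow, as described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeFlow:
    """Hypothetical stand-in for a flow; NOT jobflow's Flow class."""

    children: list = field(default_factory=list)
    metadata: Optional[dict] = None


def top_level_metadata(flow: FakeFlow) -> dict:
    """Read metadata from the outermost flow only, ignoring nested flows."""
    return dict(flow.metadata or {})


def nested_metadata(flow: FakeFlow) -> list:
    """Collect metadata set on nested flows, which a top-level lookup misses."""
    found = []
    for child in flow.children:
        if isinstance(child, FakeFlow):
            if child.metadata:
                found.append(child.metadata)
            found.extend(nested_metadata(child))
    return found


flow1 = FakeFlow(metadata={"a": 1})  # metadata set on an inner flow
flow2 = FakeFlow()
flow = FakeFlow(children=[flow1, flow2])  # outer flow carries no metadata

print(top_level_metadata(flow))  # the inner {"a": 1} is not picked up
print(nested_metadata(flow))  # metadata that would need an explicit policy
```

Whether nested metadata should be dropped, merged or rejected is exactly the open question; the sketch does not pick a policy.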
Adding the metadata attribute to Flow now could help, although it would leave point 3 open and point 2 could still be tricky. At least the behaviour of update_metadata could then be documented with respect to the Flow's metadata.
An alternative solution could be to pass a metadata (or flow_metadata, to be more explicit) argument to submit_flow. This would solve the points above, but would probably feel a bit more clunky.
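A minimal sketch of what that alternative might look like, using a hypothetical submit_flow_sketch function rather than jobflow-remote's actual submit_flow. The flow_metadata keyword and the "metadata" key in the resulting document are assumptions for illustration only.

```python
from typing import Optional


def submit_flow_sketch(flow_doc: dict, flow_metadata: Optional[dict] = None) -> dict:
    """Hypothetical stand-in for a submit function; NOT jobflow-remote's submit_flow.

    Returns the document that would be inserted into the flows collection.
    """
    doc = dict(flow_doc)  # copy so the caller's dict is not mutated
    doc["metadata"] = dict(flow_metadata or {})
    return doc


doc = submit_flow_sketch({"name": "relax"}, flow_metadata={"formula": "NaCl"})
print(doc)
```

Because the metadata is passed at submission time, it can only ever refer to the top-level flow, so the nested-flow ambiguity of point 3 does not arise.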
What do you think?
Thanks @janosh for opening this issue. Indeed, as @gpetretto mentioned, it is currently not possible. Concerning point 1., I propose to include @utf in the discussion. Maybe there is a need (or at least a wish) to have flow metadata. Not sure how (and how easy it would be) this could be added to jobflow itself in the first place and "passed down" to jobflow-remote. The tricky point is that in jobflow, the Flow exists at definition time but not anymore at execution (nor in the database). I think this was done in order to avoid duplication of outputs of jobs in outputs of flows (if they existed). If there is a strong push towards that, maybe we could have a call altogether to discuss options ?
I am not sure that it would be particularly tricky to handle this in jobflow and jobflow-remote. Adding the metadata attribute to Flow should not pose particular problems, except that the behaviour of update_metadata should be clarified. The fact that the Flow stops existing after it is stored in the jobflow-remote DB is not really a problem, since that would be enough to add the metadata to the DB in the way suggested by @janosh.
I assumed that @janosh's request was only to ease querying of the Flows in jobflow-remote's DB, not to add that metadata to the outputs. Is this correct?
thanks for the quick replies!
@gpetretto 1 - 4 are excellent points and should be handled intuitively and without pitfalls. i should have formulated my issue more like an RFC (which is what this is now anyway 😄).
i tried update_metadata first and was mostly expecting that to be reflected in the submitted flow documents in the database. my hacky solution was step 2 after that didn't work
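a rough sketch of the surprise described above, using hypothetical stand-in classes (FakeJob and FakeFlow, not jobflow's own). it assumes, as reported in this thread, that update_metadata only touches the jobs inside the flow and stores nothing on the flow itself.

```python
from dataclasses import dataclass, field


@dataclass
class FakeJob:
    """Hypothetical stand-in for a job; NOT jobflow's Job class."""

    metadata: dict = field(default_factory=dict)


@dataclass
class FakeFlow:
    """Hypothetical stand-in for a flow; NOT jobflow's Flow class."""

    jobs: list = field(default_factory=list)

    def update_metadata(self, update: dict) -> None:
        """Apply the update to every job; the flow itself stores nothing."""
        for job in self.jobs:
            job.metadata.update(update)


flow = FakeFlow(jobs=[FakeJob(), FakeJob()])
flow.update_metadata({"formula": "NaCl"})

print([job.metadata for job in flow.jobs])  # every job received the update
print(hasattr(flow, "metadata"))  # the flow has no metadata of its own
```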
> I assumed that @janosh's request was only to ease querying of the Flows in jobflow-remote's DB, not to add that metadata to the outputs. Is this correct?
that's correct. though in principle, i think both are useful. but adding metadata to the output seems like a pure jobflow feature and not something jf-remote needs to worry about
> maybe we could have a call altogether to discuss options
@davidwaroquiers i could imagine @utf would prefer to discuss on GitHub but happy to do a call and to flex to your schedules if i'm mistaken!
There currently appears to be no way of adding metadata like material IDs, formulas, structure provenance and the like to documents added into the flows collection.
i think it would make sense to e.g. look for a metadata attribute on a jobflow.Flow and if found, add that to the flow_doc prior to DB insertion in JobController.add_flow: https://github.com/Matgenix/jobflow-remote/blob/967e7c512f230105b1a82c2227fb101d8d4acb3d/src/jobflow_remote/jobs/jobcontroller.py#L2591
happy to submit a PR for this if there's interest
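a minimal sketch of the idea, not the actual code in JobController.add_flow. the helper name attach_flow_metadata, the "metadata" key in flow_doc and the example ID are all assumptions, and a SimpleNamespace stands in for a real jobflow.Flow.

```python
from types import SimpleNamespace


def attach_flow_metadata(flow, flow_doc: dict) -> dict:
    """Copy flow.metadata into flow_doc if the attribute exists and is a dict."""
    metadata = getattr(flow, "metadata", None)
    if isinstance(metadata, dict):
        # Return a new dict so the original flow_doc is left untouched.
        flow_doc = {**flow_doc, "metadata": dict(metadata)}
    return flow_doc


# Stand-ins for flows with and without a metadata attribute (hypothetical values).
with_meta = SimpleNamespace(metadata={"material_id": "hypothetical-id-1"})
without_meta = SimpleNamespace()

print(attach_flow_metadata(with_meta, {"name": "demo"}))
print(attach_flow_metadata(without_meta, {"name": "demo"}))
```

using getattr with a default keeps flows without the attribute working unchanged, which is the main appeal of this approach.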