broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
997 stars, 361 forks

Store call status permanently? #2338

Closed: Horneth closed this issue 7 years ago

Horneth commented 7 years ago

Currently, the only trace of a call's end status after the workflow has completed is in the metadata. This makes it impossible for other endpoints (e.g., the call caching diff endpoint) to relay this information without reading from metadata. We might want to investigate storing this information somewhere permanent: it is currently available in the jobstore, but only for the duration of the workflow.

geoffjentry commented 7 years ago

I think the issue is actually reversed. The call cache diff endpoint should be reading from metadata, which means the missing piece is that the call cache hashes should be in the metadata store.

We say that the metadata repository is the collection of every meaningful event that has occurred in the system, and that it allows downstream clients to shape that information to suit their needs. That's why all user-facing "give me information about XYZ" endpoints read from there. I think this endpoint should be the same.
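A minimal sketch of that read-side idea, assuming a simplified, hypothetical event shape: `MetadataEvent`, the `executionStatus` key name, and `latest_call_statuses` are illustrative only, not Cromwell's actual schema or API. Instead of reading a permanent status table, an endpoint derives each call's final status by folding over the metadata events for a workflow.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class MetadataEvent:
    # Hypothetical, simplified metadata event; not Cromwell's real schema.
    workflow_id: str
    call_fqn: str      # fully qualified call name, e.g. "wf.task_a"
    shard: int         # scatter index, -1 for an unscattered call
    key: str           # illustrative key name such as "executionStatus"
    value: str
    timestamp: float


def latest_call_statuses(
    events: Iterable[MetadataEvent], workflow_id: str
) -> Dict[Tuple[str, int], str]:
    """Project the event log into the most recent status of each call."""
    latest: Dict[Tuple[str, int], Tuple[float, str]] = {}
    for e in events:
        # Only status events for the requested workflow matter for this view.
        if e.workflow_id != workflow_id or e.key != "executionStatus":
            continue
        call = (e.call_fqn, e.shard)
        # Keep the newest event, so out-of-order arrival cannot regress a status.
        if call not in latest or e.timestamp >= latest[call][0]:
            latest[call] = (e.timestamp, e.value)
    return {call: status for call, (_, status) in latest.items()}
```

Because the view is computed from events, no extra engine-side table has to outlive the workflow; the trade-off is that every fact an endpoint needs must first be written to metadata.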

cjllanwarne commented 7 years ago

Interesting point!

I like the philosophy of CC tables being engine-only, and of queries being completely and solely calculable from metadata. It would probably mean forwarding all CC hashes, toggles of "allowResultReuse", failures to copy results, etc. to the metadata.
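Once the hashes are forwarded to metadata, a call cache diff can be computed entirely on the read side. A minimal sketch, assuming each call's metadata is available as flattened `(key, value)` pairs and that hashes live under a hypothetical `callCaching:hashes:` key prefix; `HASH_PREFIX`, `hashes_from_metadata` and `diff_hashes` are illustrative names, not Cromwell's API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical key prefix under which a call's cache hashes are forwarded to metadata.
HASH_PREFIX = "callCaching:hashes:"


def hashes_from_metadata(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect hash name -> hash value from one call's flattened metadata entries."""
    return {
        key[len(HASH_PREFIX):]: value
        for key, value in entries
        if key.startswith(HASH_PREFIX)
    }


def diff_hashes(
    a: Dict[str, str], b: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (hash name, value in A, value in B) for each hash missing on one side or differing."""
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]
```

For example, two calls that share a command template hash but differ in one input hash produce a single entry for that input; a hash present on only one side appears with `None` for the other.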

geoffjentry commented 7 years ago

Yeah, it'd be annoying, but we did claim we were going with the CQRS model.

mcovarr commented 7 years ago

Call cache diffing did end up being written against metadata, so what this ticket asks for is no longer needed.