NDCMS / lobster

A userspace workflow management tool for harnessing non-dedicated resources for high-throughput workloads.
MIT License

Lobster Tries to Stream Gridpacks with XRootD #629

Open klannon opened 6 years ago

klannon commented 6 years ago

As encountered by @Andrew42, when running with MultiProductionDataset, Lobster blithely decides that it should stream gridpack files, even though CMSSW doesn't know how to do that. This leads to the gridpack file being passed into the config as root://deepthrought.crc.nd.edu://.... A workaround is to disable streaming, but if we are doing multistage production (i.e. GEN-SIM+DIGI-RECO+MiniAOD), that means none of the steps can stream inputs, since disable_input_streaming is a global parameter of StorageConfiguration. It would be nice to have finer-grained control over XRootD streaming so that we could stream some input files but not others.

I can think of two options for accomplishing this:

Although I like the thought of being more flexible, I'm leaning towards the "Quick and Dirty" solution. I suppose another response would be that nothing's broken, so don't fix it: it's a feature, not a bug. Input welcome, especially from @annawoodard and @matz-e!

annawoodard commented 6 years ago

One alternative quick and dirty approach that would be more flexible than pattern matching but similarly simple would be to make disable_input_streaming a property of the Workflow (passed as an argument in the constructor) instead of the StorageConfiguration (so not re-engineering everything, just moving that one property). Then, instead of setting parameters['disable streaming'] where it is set now, you would set it in Workflow.adjust. Note that if you go that route, it would probably make sense to also make disable_stage_in_acceleration a property of the workflow.
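
To make the idea concrete, here is a rough sketch of what the per-workflow flag could look like. The classes below are simplified, hypothetical stand-ins and not Lobster's actual code: the real Workflow constructor and Workflow.adjust take other arguments, and the fallback to the storage-wide setting when the workflow leaves the flag unset is an assumption of this sketch, not something proposed above.

```python
# Hypothetical, simplified stand-ins for illustration only;
# these are NOT Lobster's real classes or signatures.


class StorageConfiguration:
    def __init__(self, disable_input_streaming=False):
        # Storage-wide default, used when a workflow does not override it.
        self.disable_input_streaming = disable_input_streaming


class Workflow:
    def __init__(self, label, disable_input_streaming=None):
        self.label = label
        # None means "inherit the storage-wide setting" (assumption of this sketch).
        self.disable_input_streaming = disable_input_streaming

    def adjust(self, parameters, storage):
        """Fill in the task parameters that depend on this workflow."""
        flag = self.disable_input_streaming
        if flag is None:
            flag = storage.disable_input_streaming
        parameters['disable streaming'] = flag
        return parameters


storage = StorageConfiguration()

# The gridpack-consuming step opts out of streaming; later steps keep it.
gen = Workflow('gen_sim', disable_input_streaming=True)
reco = Workflow('digi_reco')

gen_params = gen.adjust({}, storage)
reco_params = reco.adjust({}, storage)
print(gen_params)
print(reco_params)
```

With something along these lines, only the step that reads the gridpack would turn streaming off, and the DIGI-RECO and MiniAOD steps could still stream their inputs over XRootD.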

I think that would completely solve this specific problem. So the next question would be: what are the other use cases of the bigger re-engineering approach, and are they worth the development effort?

klannon commented 6 years ago

@annawoodard: I like that suggestion. That's what I'll plan to do, unless I run into a problem when I start working out the implementation. Regarding the more expansive solution, no one is asking for this. The only use case I can dream up is one where, in a single Lobster project, you'd be coordinating a multistage/multisite production where, for example, you want to store GEN-SIM at ND, DIGI-RECO at the LPC CAF, and mini-AOD/nano-AOD at UVa, or something crazy like that. I think we can safely defer any idea of doing that until someone actually asks whether such a thing would be feasible.