dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

MS: implement MongoDB clean up for the MSOutput service #9611

Open amaltaro opened 4 years ago

amaltaro commented 4 years ago

Impact of the new feature
ReqMgr2MS, MongoDB

Is your feature request related to a problem? Please describe.
This is part of the output data placement migration to WMCore, see: https://github.com/dmwm/WMCore/wiki/ReqMgr2-MicroService-Output

Describe the solution you'd like
Because we cannot only add documents and never clean them up...

Implement a database cleanup logic. First thing that comes to my mind is, deletion of any data belonging to archived workflows.

Describe alternatives you've considered
Alternatively, allow a grace period for archived workflows and only delete their data after it expires.
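As a rough sketch of the grace-period idea, the filter below selects documents of archived workflows whose last update is older than a cutoff. The field names `RequestStatus` and `LastUpdate` (assumed to hold epoch seconds) and the helper name are assumptions made for illustration and are not taken from the MSOutput schema; the three status names are the archived states used by ReqMgr2.

```python
import time

# Archived states in ReqMgr2. Whether MSOutput documents carry a status
# field at all is an assumption of this sketch.
ARCHIVED_STATES = ["normal-archived", "aborted-archived", "rejected-archived"]


def build_grace_period_filter(grace_days, now=None):
    """Return a MongoDB filter for archived workflows past the grace period."""
    now = time.time() if now is None else now
    # Anything last updated before this timestamp is outside the grace period.
    cutoff = int(now - grace_days * 24 * 3600)
    return {
        "RequestStatus": {"$in": ARCHIVED_STATES},  # hypothetical field name
        "LastUpdate": {"$lt": cutoff},              # hypothetical field name
    }
```

The resulting dictionary could be passed to a MongoDB delete call, so the grace period stays a single tunable number rather than logic spread across the service.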

Additional context
None

vkuznet commented 4 years ago

For clean-up you should use the MongoDB delete API, see https://docs.mongodb.com/manual/tutorial/remove-documents/

The delete API removes all documents that match the provided query.
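A minimal sketch of such a query-based delete, assuming the documents are keyed by a `RequestName` field (an assumption, as is the helper name). It relies only on `delete_many`, which pymongo collections provide and which returns a result object with a `deleted_count` attribute.

```python
def delete_workflow_docs(collection, workflow_names):
    """Delete the documents of the given workflows and return how many were removed.

    `collection` is expected to be a pymongo Collection, or any object
    offering a compatible `delete_many` method.
    """
    names = list(workflow_names)
    if not names:
        # Nothing to delete; skip the round trip to the database.
        return 0
    query = {"RequestName": {"$in": names}}  # field name is an assumption
    result = collection.delete_many(query)
    return result.deleted_count
```

Restricting the query to an explicit list of workflow names keeps the delete narrow; the caller decides which workflows qualify (for example, archived ones).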

amaltaro commented 3 years ago

Given that document granularity is at the workflow level (1 for each workflow going through this microservice), I guess there won't be any need to perform any clean up in the many years to come. I'm in favor of closing this issue. @todor-ivanov what are your thoughts?

todor-ivanov commented 3 years ago

Hi @amaltaro, I believe this sentence gives the answer:

Implement a database cleanup logic. First thing that comes to my mind is, deletion of any data belonging to archived workflows.

That basically describes an interface between MSRuleCleaner and MSOutput. It is indeed a good candidate for a separate issue under the MSRuleCleaner/MSArchival service.

amaltaro commented 3 years ago

I think this would make our MSOutput records completely useless, since they would live in the database only for a couple of days. I'm in favor of not cleaning anything before Run3 is over (unless we get ourselves into storage problems, which I doubt will be the case).

todor-ivanov commented 3 years ago

That is a valid point. I agree with you. Let's see how fast the database grows.
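One way to keep an eye on the growth would be to sample the collection statistics periodically. The sketch below assumes a pymongo `Database` object and a server that still accepts the `collStats` command; the helper name and the keys of the returned dictionary are made up for illustration.

```python
def collection_size_report(db, collection_name):
    """Return document count and sizes (in bytes) as reported by collStats."""
    # pymongo forwards this as {"collstats": collection_name} to the server.
    stats = db.command("collstats", collection_name)
    return {
        "count": stats.get("count", 0),
        "dataSizeBytes": stats.get("size", 0),
        "storageSizeBytes": stats.get("storageSize", 0),
    }
```

Recording this report at regular intervals would show whether the one-document-per-workflow granularity ever leads to storage pressure.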