DUNE / data-mgmt-ops

request for rule to move hist files to persistent #666

Open hschellman opened 3 days ago

hschellman commented 3 days ago

Once we get the root-tuple-virtual data tier, I can provide a query that will select the files that should go to FNAL for merging. I will update this request when that is available.

hschellman commented 2 days ago

OK, here is the metacat query. I don't know how this translates into a rucio rule; can you describe that a bit when you respond to this ticket? Thanks, Heidi

I can make a dataset if that helps. I'm not certain how one updates it, though, as new files come in.

metacat query "files where created_timestamp > 2024-07-01 and core.data_tier=root-tuple-virtual and core.file_type=detector and core.run_type=hd-protodune"

hschellman commented 2 days ago

Actually - let's be more specific

metacat query "files where created_timestamp > 2024-07-01 and core.data_tier=root-tuple-virtual and core.file_type=detector and core.run_type=hd-protodune and dune.campaign=hd-protodune-reco-keepup-v0"

dougbenjamin commented 2 days ago

Hi Heidi,

It is best to move datasets. A metacat query like the one you showed means that I have to go through all the files, identify their parent datasets, and then create the rules associated with those parent datasets. I will look at it in the morning.

Regards, Doug
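
(For reference, a rough sketch of the per-file lookup described above; the file name, dataset name, and destination RSE below are placeholders for illustration, not the actual values for this request.)

# Sketch only: for a file returned by the query, list its parent dataset(s) in rucio ...
rucio list-parent-dids hd-protodune-det-reco:EXAMPLE_FILE.root
# ... then create one replication rule per unique parent dataset.
rucio add-rule PARENT_SCOPE:PARENT_DATASET 1 DESTINATION_RSE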

hschellman commented 2 days ago

I can make a metacat dataset; one question is how we grow it as new files come in.

And how would I convey that information to you? We don't want all of the output from those jobs, just the hist files, and I think they are likely commingled at the moment.

So our first request should be that files that need to be merged go into their own dataset; datasets need to be granular at the data_tier level to avoid commingling things.
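
(For illustration only: a sketch of creating a dataset for just the hist files and re-populating it from the same query as new files are declared. The dataset name is made up, and the "dataset create" / "dataset add-files" subcommands and their options are assumptions; check the metacat CLI documentation for the actual syntax.)

# Sketch only: one dataset per data_tier that needs merging.
metacat dataset create usertests:hd-protodune-keepup-hists-mergeme
# Hypothetical: populate the dataset from the query, and re-run this periodically
# as new keep-up files are declared.
metacat dataset add-files usertests:hd-protodune-keepup-hists-mergeme -q "files where created_timestamp > 2024-07-01 and core.data_tier=root-tuple-virtual and core.file_type=detector and core.run_type=hd-protodune and dune.campaign=hd-protodune-reco-keepup-v0"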

hschellman commented 1 day ago

I have found 2 datasets in metacat which seem to correspond to rucio datasets:

hd-protodune-det-reco:calcuttj_keepup_cal_cal_062024_ntuple_2382
hd-protodune-det-reco:calcuttj_keepup_cal_cal_062024_ntuple_2381

It would be good to get metadata added to datasets so these are easier to find. I found these by brute force, checking all of the files that matched the query to make certain I missed none. All one needs to do is add a tag to the dataset metadata that says something like "mergeme".
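
(For illustration only: the "dataset update" subcommand, the dune.merge key, and the dataset-level query syntax below are assumptions, not verified metacat usage; the point is just a dataset tag that a periodic job could search for.)

# Sketch only: tag a dataset as needing merging ...
metacat dataset update -m '{"dune.merge": "mergeme"}' hd-protodune-det-reco:calcuttj_keepup_cal_cal_062024_ntuple_2382
# ... then find everything so tagged with a dataset-level query.
metacat query "datasets hd-protodune-det-reco:* having dune.merge=mergeme"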

These are still root-tuple; no non-trivial files have been created since Jake changed to the virtual data_tier.
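
(For reference, a hedged sketch of what the requested move could look like once these two datasets are confirmed as the right ones; the destination RSE name FNAL_DCACHE_PERSISTENT is a placeholder for whatever the persistent dCache endpoint is actually called.)

# Sketch only: one rule per dataset, one replica at the persistent destination.
rucio add-rule hd-protodune-det-reco:calcuttj_keepup_cal_cal_062024_ntuple_2382 1 FNAL_DCACHE_PERSISTENT
rucio add-rule hd-protodune-det-reco:calcuttj_keepup_cal_cal_062024_ntuple_2381 1 FNAL_DCACHE_PERSISTENT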

StevenCTimm commented 3 hours ago

The right way to do this would have been to query justIN for the produced workflows; that would have told you what the dataset names were in metacat and rucio.