XENONnT / straxen

Streaming analysis for XENON
BSD 3-Clause "New" or "Revised" License

Give `RunDB` an option to find files in storage #1244

Closed dachengx closed 1 year ago

dachengx commented 1 year ago

Give `RunDB` an option to look for files in storage, rather than only in the database, in its `find_several` function.

What does the code in this PR do / what does it improve?

Historically, `RunDB` knows where all the files are located on the servers and assumes they are all available. But if we have multiple partitions, `RunDB` can return runs that are recorded in the database but are not actually available.

This happens a lot when running `strax.Context.select_runs`, in the `strax.StorageFrontend.find_several` function. So this PR gives `RunDB` an option to run `strax.StorageFrontend.find_several` directly.
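The difference between trusting the database and checking storage can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRunDB`, its fields, and the placeholder keys are hypothetical and are not the straxen implementation; only the idea (database records versus what is actually readable) comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRunDB:
    """Hypothetical stand-in for a run-database frontend (not the straxen class)."""

    # Keys the database *claims* are stored (placeholder names, not real runs).
    db_records: set = field(default_factory=set)
    # Keys that are actually readable from the current machine.
    readable: set = field(default_factory=set)
    # Toy version of the option discussed in this PR.
    find_in_storage: bool = False

    def find_several(self, keys):
        """Return one boolean per key: does this frontend report it as available?"""
        if self.find_in_storage:
            # Check every key against storage instead of trusting the database.
            return [k in self.db_records and k in self.readable for k in keys]
        # Default behavior in this toy model: trust the database records.
        return [k in self.db_records for k in keys]


# "run_b-peaklets" is recorded in the database but lives on an unreachable partition.
frontend = ToyRunDB(
    db_records={"run_a-peaklets", "run_b-peaklets"},
    readable={"run_a-peaklets"},
)
keys = ["run_a-peaklets", "run_b-peaklets"]
trusting = frontend.find_several(keys)

frontend.find_in_storage = True
checked = frontend.find_several(keys)
```

In this toy model, `trusting` reports both keys as available, while `checked` drops the key that cannot be read from the current machine.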


coveralls commented 1 year ago

coverage: 93.548% (+0.05%) from 93.502% when pulling 69df71e9f14a7434da0343f5d4903d0172ce46a0 on check_metadata_rundb into 3258746206f995d49fe28db3c4d82bc65160c092 on master.

JYangQi00 commented 1 year ago

Thanks Dacheng. I tried `st.select_runs` with `_find_in_storage=False` and with `_find_in_storage=True`, but I'm having a bit of trouble understanding the output. I'd like to find some runs that `st.select_runs` returns with `_find_in_storage=False` but that cannot be loaded. I would then expect these runs not to show up when `st.select_runs` is called with `_find_in_storage=True`. Is my interpretation of this fix correct?

However, instead I find that when using `_find_in_storage=False`, I do not get any extra runs compared to when `_find_in_storage=True`. I also specified that peak basics should be available when using `select_runs`. Is there an example of a run+data_type that cannot be loaded despite the output of `select_runs` saying that it can? I knew there were some in the past but I haven't found any this time around.
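One way to make this comparison explicit is to take the run names from each `select_runs` result and compute the set difference. The helper below is a minimal sketch: its name is hypothetical, it works on plain lists of run names, and how the names are extracted from the `select_runs` output is left to the caller. The run IDs in the example are placeholders.

```python
def runs_only_listed_by_database(runs_db_only, runs_storage_checked):
    """Return runs listed without the storage check but missing with it."""
    return sorted(set(runs_db_only) - set(runs_storage_checked))


# Placeholder run names, standing in for the two select_runs results.
without_check = ["000001", "000002", "000003"]
with_check = ["000001", "000003"]

missing = runs_only_listed_by_database(without_check, with_check)
```

An empty result would mean that every run listed by the database was also found in storage from the machine where the check ran.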

dachengx commented 1 year ago

> However, instead I find that when using `_find_in_storage=False`, I do not get any extra runs compared to when `_find_in_storage=True`. I also specified that peak basics should be available when using `select_runs`. Is there an example of a run+data_type that cannot be loaded despite the output of `select_runs` saying that it can? I knew there were some in the past but I haven't found any this time around.

For example, `050034-peaklets-ui5hguaz2k` in the offline context, but you need to be on a Midway compute node, such as the one behind a Jupyter notebook.