Closed fschlueter closed 5 months ago
If you feel this is an important feature (as opposed to the consumer looping over datasets), then it can be implemented straightforwardly. Do you have a suggestion for a preferred interface for this? One option would be to define a MultiDataset( ) that accepts either tuples of run/event numbers or directories to look in, or should we further overload Dataset?
The intention eventually is to have a stable API surface here.
I think it would be an important feature! If we just use mattak in one or two NuRadioMC modules we could loop there over datasets, however, if we want to enable any user to quickly process some data via mattak we should make it as convenient as possible.
What comes to my mind right now would be the following: You specify a directory and Mattak parses through all subdirectories and gets all events in those. Additional what would be nice to have is to filter for a specific station id (and maybe even specify a range of runs although that might be unnecessary).
And I actually would overload Dataset if possible.
Another idea: we could allow to directly pass a list of file names or/and specify which file to take in a given directory.
The problem with this is that full datasets require multiples files passed per run (since generating a combined file for full datasets would take twice the space for no reason).
On 2/27/23 11:30, Felix Schlüter wrote:
Another idea: we could allow to directly pass a list of file names or/and specify which file to take in a given directory.
— Reply to this email directly, view it on GitHub https://github.com/RNO-G/mattak/issues/7#issuecomment-1446747817, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGHWKJA4SN5ZENG3SXSSPDWZTQDZANCNFSM6AAAAAAVGAYTQU. You are receiving this because you were mentioned.Message ID: @.***>
I see, but we have these combined files which are "standalone", correct? And I guess that would also allow to read files like the one created by zack. But I guess if it is to inversive it is not worth it.
So one thing that can clean this up which I've already implemented but not merged is that if you pass a file instead of a directory to data_dir, it could treat it as a combined.root file (since it's easy to check if a path is a file or a directory). Would that help?
I think if a .root file is passed explicitly it is fine to treat it as combined.root
... if they are not combined one might anyway want to store individual files in a directory (or have access to a cluster where the data is stored in the usual directory structure)
I would go for that option
Alright, let me see if I can brew something together
This branch attempted to do this, among a few other things : https://github.com/rno-g/mattak/tree/add-dataset-opts
It allows treating directory as a file and also setting a preferred file name (like if you has a Felix.root in each directory you wanted to load as a combined fire). Looks like will require remerging. I'm going to be on a plane most of the day today so probably won't get to it today.
On April 28, 2023 7:16:13 AM CDT, "Felix Schlüter" @.***> wrote:
Alright, let me see if I can brew something together
-- Reply to this email directly or view it on GitHub: https://github.com/RNO-G/mattak/issues/7#issuecomment-1527479152 You are receiving this because you were mentioned.
Message ID: @.***>
Alright, should I have a look than?
PS: I why not every station should write their own files :)
As a reminder for us. Once the interface is improved to take file paths we can remove the dirty hack in NuRadioReco implemented with this PR https://github.com/nu-radio/NuRadioMC/pull/567
This issue can probably be closed now after #25 was merged. I will reopen it if it turns out we want an even more flexible interface
Hi @cozzyd,
the title of the issue is more my feeling/opinion. The interface should reflect what "use cases" we will have and they probably change/extend when mattak is integrated into NuRadioMC. So, some brainstorming might be useful. What could be the use cases and how do we want to serve those.
Let me summarise what I think the current status is (so you can correct me if necessary) and make some suggestion what to extend.
Right now you can specify a station (id), run (id) and directory (optional). If station and run are not zero the directory is considered a root directory and the files with in directory/station/run are read-in. If station and run zero, mattak looks for
waveforms.root
orcombined.root
(in this order) in the specified directory (not in any subdirectory).So, what is missing is the possibility to, at once, read in several stations/runs. Do you think that can be easily implemented in mattak or should that be implemented in the NuRadioMC module?