CHIMEFRB / datatrail-cli

CHIME/FRB Data Management CLI
https://chimefrb.github.io/datatrail-cli/
MIT License

[FEATURE] datatrail command to show all event numbers registered with a scope #50

Closed kaitshin closed 10 months ago

kaitshin commented 11 months ago

Is your feature request related to a problem? Please describe. Currently, to find which events have baseband data on minoc, I have to run a manual script that loops over events and calls a function based on `datatrail ps` to see if there is data for each event. It'd be great if there were a single command, or even an accessible database, that has that info, just for event number IDs.

Describe the solution you'd like: Something like `datatrail ls chime.event.baseband.raw --event-ids` to print out a list of event IDs to the terminal, or print it to a file. (It should ideally work with other scopes like kko.event.baseband.raw, gbo.event.baseband.raw, chime.event.intensity.raw, etc.)

Describe alternatives you've considered N/A

Additional context Given how many people I've shared my hacky little script with, I'm sure multiple people would really benefit from this feature!

zpleunis commented 11 months ago

This would be really helpful for making the repeater weather reports as well! Every week I currently query datatrail ps for every event ID that is associated with a repeater and for which I have not yet recorded whether baseband data is saved to disk.

jacobpwillis commented 11 months ago

This would be helpful with VLBI analysis and make it much more convenient to find events while GBO baseband data is being transferred to minoc.

MWSammons commented 11 months ago

@kaitshin @zpleunis @jacobpwillis, I'm not sure it's as broad as what Kaitlyn is describing, but datatrail-cli can be used to find all datasets that belong to a given larger dataset. As detailed incompletely here (there's a missing command), we can use datatrail to move down the datatrail tree.

datatrail ls

prints out all the possible scopes. Choosing a scope, e.g. chime.event.baseband.raw,

datatrail ls chime.event.baseband.raw

will print out all the larger datasets associated with a given scope. Choosing a larger dataset, e.g. classified.FRB,

datatrail ls chime.event.baseband.raw classified.FRB

will print out all the datasets associated with a given larger dataset and scope. In this case I believe it returns the event_IDs of all classified FRBs with baseband data captured at CHIME. To write the resulting list to a file we can use

datatrail ls chime.event.baseband.raw classified.FRB --write

which will generate an auto-named CSV file in the current working directory with the event_IDs inside (they do have " " around them, which is a little annoying but could easily be stripped out). What isn't clear is what some of the larger datasets you'll see in the second step actually are or contain. Naively, I imagine an FRB can belong to multiple larger datasets, so there will be some general larger datasets and some specific ones; e.g. all datasets associated with a repeating FRB might be found both within a larger dataset dedicated to that FRB and within one dedicated to all FRBs in general. However, I'm not 100% sure how this functions; perhaps @tjzegmott can say more.
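The quoted IDs in the written file could be stripped with a few lines of Python. This is only a minimal sketch: it assumes the CSV holds event IDs separated by commas or newlines, each optionally wrapped in quotes, which is not confirmed here. `parse_event_ids` is a hypothetical helper, not part of datatrail-cli, and the sample IDs are made up.

```python
def parse_event_ids(text):
    """Return integer event IDs found in CSV text.

    Quotes and surrounding whitespace are stripped; cells that are not
    purely numeric (e.g. a header row) are skipped.
    """
    event_ids = []
    for line in text.splitlines():
        for cell in line.split(","):
            token = cell.strip().strip('"').strip("'").strip()
            if token.isdigit():
                event_ids.append(int(token))
    return event_ids


# Made-up sample standing in for the contents of the written file
sample = '"12345678"\n"23456789"\n"34567890"\n'
print(parse_event_ids(sample))
```

For a real file, the text would come from reading the auto-named CSV, e.g. `parse_event_ids(open(path).read())`, with `path` pointing at whatever name datatrail generated.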

zpleunis commented 10 months ago

Thanks @MWSammons et al., this is great and much quicker than my previous implementation!

I implemented it through the Python interface, without saving the file to disk in between.

This checks which of an array of event IDs (eventids, as ints) exist in datatrail:


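import numpy as np
from dtcli.src import functions

scope = "chime.event.baseband.raw"
dataset = "classified.FRB"

# eventids (array of integer event IDs to check) is assumed to be defined already.
# Event IDs that datatrail lists under this scope and larger dataset.
bb_datatrail = np.array(
    functions.list(scope, dataset)["datasets"]).astype(int)

# Event IDs from eventids that have baseband data registered in datatrail.
bb_eventids = np.intersect1d(eventids, bb_datatrail)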
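
To try the intersection step on its own, here is a sketch with made-up event IDs in place of the datatrail response. It assumes the listed IDs come back as strings that convert cleanly to integers; `events_with_baseband` is a hypothetical helper name, not something datatrail-cli provides.

```python
import numpy as np


def events_with_baseband(eventids, datatrail_ids):
    """Return the sorted, unique event IDs present in both inputs."""
    wanted = np.asarray(eventids, dtype=int)
    # Listed IDs are assumed to be numeric strings, so convert them to ints
    available = np.asarray(datatrail_ids).astype(int)
    return np.intersect1d(wanted, available)


# Made-up IDs: the list of strings mimics what the listing is assumed to return
eventids = [101, 102, 103, 104]
datatrail_ids = ["102", "104", "999"]
print(events_with_baseband(eventids, datatrail_ids))
```

`np.intersect1d` returns sorted, unique values, so duplicate IDs in either input do no harm.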