Closed LukasNickel closed 3 years ago
I'm still learning to work with pytables/h5py, so some things can probably be implemented more efficiently.
`split_data` and `apply_cuts` should work now, but there is no support for chunkwise reading.
For the apply scripts it should work, but for now I would advise just avoiding it altogether.
Tests are not failing, I will test the results next.
Addressed most comments and fixed some bugs.
ToDo:
- `read_cta` and `read_simple` function respectively
- `merge_cta_files` script or accept a list of files for training.
- `n_events` might only contain events of telescope 1 or similar.

Apart from the `apply_cuts` issue, everything should now be functional with cta files.
@LukasNickel Would be nice if we could have it for next week. Can you resolve conflicts? I will make a review.
Addressed most things and also fixed some minor bugs, most notably missing user attributes. ToDo:
- `datamodel_version`. It's a bit weird because the config objects contain the columns to be read and are not themselves associated with an actual file.

I tried this locally and had a look at the code again. Some more comments:
- The apply cuts script is very slow and inefficient on the CTA data, especially the check for surviving events. Testing each and every obs id / event id pair individually is expensive, even when using a set.
- Apply cuts fails when the output file already exists. It should overwrite the output file.
- Loading data is also very slow, so slow that it didn't finish for the merged dl1 LST file and I had to kill it. It was stuck inside the pandas merge of the dataframe with the pointing table. This merge shouldn't even be necessary, since you need to interpolate the pointing, not merge it.
- The test is now failing because the dropping of columns before applying the model does not work in the case of the apply dxdy script.
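
A minimal sketch of the two performance points above, not the code in this PR. It assumes that `obs_id` and `event_id` both fit into 32 bits and that the pointing timestamps are sorted in increasing order; the helper names (`pair_key`, `surviving_mask`, `interpolate_pointing`) are hypothetical. The membership test is done in one vectorized `np.isin` call instead of a per-event lookup, and the pointing is linearly interpolated to the event times with `np.interp` instead of being merged.

```python
import numpy as np


def pair_key(obs_ids, event_ids):
    """Pack (obs_id, event_id) into one uint64 key per row.

    Assumption: both ids are non-negative and fit into 32 bits.
    """
    obs = np.asarray(obs_ids, dtype=np.uint64)
    evt = np.asarray(event_ids, dtype=np.uint64)
    return (obs << np.uint64(32)) | evt


def surviving_mask(obs_ids, event_ids, kept_obs_ids, kept_event_ids):
    """Boolean mask of rows whose (obs_id, event_id) pair survived the cuts."""
    all_keys = pair_key(obs_ids, event_ids)
    kept_keys = pair_key(kept_obs_ids, kept_event_ids)
    # One vectorized membership test for all rows at once.
    return np.isin(all_keys, kept_keys)


def interpolate_pointing(event_times, pointing_times, pointing_values):
    """Linearly interpolate a pointing coordinate to the event times.

    Assumption: pointing_times is sorted in increasing order.
    """
    return np.interp(event_times, pointing_times, pointing_values)
```

Angles that wrap around (azimuth, for example) would need something like `np.unwrap` before interpolating; that is left out here.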
So, this is kinda huge and also not really finished. Try it out, find bugs, whatever. There are some things that we could consider refactoring, like putting some code from the apply scripts into functions in the IO part (writing predictions, for example). The IO code is somewhat messy, because you need to perform some merges on different dataframes to get all of the information, and chunkwise reading is a bit tricky.
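
Chunkwise reading of a single table could look roughly like the sketch below. `iter_chunks` is a hypothetical helper, not a function in this PR; it only assumes that the table supports `len()` and slicing, which holds for NumPy arrays and for h5py datasets. The per-chunk merges with the other tables are the tricky part and are not shown.

```python
import numpy as np


def iter_chunks(table, chunksize):
    """Yield (start, stop, rows) for consecutive slices of a table.

    `table` can be anything that supports len() and slicing.
    """
    n_rows = len(table)
    for start in range(0, n_rows, chunksize):
        stop = min(start + chunksize, n_rows)
        # Only the rows of the current chunk are loaded here.
        yield start, stop, table[start:stop]


# Example with an in-memory structured array standing in for a table.
events = np.zeros(10, dtype=[("obs_id", "u4"), ("event_id", "u4")])
events["event_id"] = np.arange(10)

for start, stop, chunk in iter_chunks(events, chunksize=4):
    print(start, stop, len(chunk))
```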
An overview of the status:
- `/dl2` group. I have not checked the resulting predictions!
- `data_format` (options `simple` and `CTA`) replaces most of the `event_key`, `telescope_event_key`, `has_multiple_telescopes`, `coordinate_transformation` stuff. If there is a use case for e.g. flat DataFrames with CTA data, please report and we can get the `coordinate_transformation` key back in.
- `runs`, `array_events`, and `telescope_events` table is removed. Only flat dataframes or CTA dl1 files. Unit tests and examples are adapted accordingly.