I'm not 100% sure if that works, but it appears to me that you can select the index via the config parameter dict "channels":
self.all_channels: dict = self._config.get("channels", {})
It should be straightforward to create two instances with different configs.
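A rough sketch of what I have in mind (the exact config layout and the processor constructor are assumptions on my part, and the channel names are placeholders):

```python
# Sketch only: assumes a SedProcessor-like class that takes a config dict and a
# loader that builds its channel list from config["channels"], as in
# self._config.get("channels", {}). Channel names below are placeholders.
from copy import deepcopy

base_config = {
    "channels": {
        "dldPosX": {"format": "per_electron"},
        "dldPosY": {"format": "per_electron"},
        "gmdBda": {"format": "per_pulse"},
    },
}

# Per-electron processor config: keep all channels.
electron_config = deepcopy(base_config)

# Per-pulse processor config: keep only the per-pulse channels used for normalization.
pulse_config = deepcopy(base_config)
pulse_config["channels"] = {
    name: spec
    for name, spec in base_config["channels"].items()
    if spec["format"] == "per_pulse"
}

# processor_electrons = SedProcessor(config=electron_config)  # hypothetical call
# processor_pulses = SedProcessor(config=pulse_config)
```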
@yacremann I believe you would want the old ddMicrobunches table, right? If I am not mistaken, this can be obtained from the current dataframe by dropping the electron index and keeping only the per-pulse channels, as @rettigl correctly suggests. I can look into providing simpler access to generating this, as it is indeed useful for normalizing the FEL intensity, but also delay stage positions (less crucial with the current laser system).
I was also thinking about this in the context of lab experiments. In principle, such a normalization histogram could be derived from a per-electron dataframe if every electron has a timestamp, which could be generated for the FLASH data and is already implemented for our data.
A timestamp is already included with the FLASH data. However, there is also a column called pulseId that tracks the id of the pulse within a train. This can easily be used to filter out a table indexed only on pulses, i.e. FEL shots.
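For example, something along these lines (just a sketch; only pulseId is given above, trainId and the per-pulse channel names are my assumptions):

```python
# Sketch: reduce a per-electron dataframe to one row per FEL pulse.
# Assumes FLASH-style columns "trainId" and "pulseId" plus some per-pulse
# channels (names are placeholders); df can be a pandas or dask dataframe.
import dask.dataframe as dd

def per_pulse_table(df: dd.DataFrame) -> dd.DataFrame:
    """Keep only per-pulse columns and one entry per (trainId, pulseId)."""
    per_pulse_cols = ["trainId", "pulseId", "gmdBda", "delayStage"]  # placeholders
    return df[per_pulse_cols].drop_duplicates(subset=["trainId", "pulseId"])
```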
We can provide a notebook with said idea of filtering, or it can be part of the workflow within the context of the workflow manager (which makes more sense).
A notebook or example of the exact use case for the required feature is always greatly appreciated!
If I remember correctly, the main use of the per-pulse dataframe was binning 1D traces in parallel to the final binning, to be used as normalization arrays. This should be easy to implement in the workflow-manager/processor class.
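Something like the following sketch (column names and data are placeholders; the per-pulse table would in practice come from a filtering step like the one above):

```python
# Sketch: bin a per-pulse quantity along the same axis as the main binning,
# to use as a 1D normalization array. Column names and data are placeholders.
import numpy as np
import pandas as pd

# Placeholder per-pulse table (in practice, e.g. the filtered table from above,
# pulled into memory with .compute() if it is a dask dataframe).
pulse_df = pd.DataFrame({
    "delayStage": np.random.uniform(-1.0, 1.0, 10_000),
    "gmdBda": np.random.uniform(20.0, 80.0, 10_000),  # pulse-energy monitor
})

delay_bins = np.linspace(-1.0, 1.0, 101)  # same delay axis as the main binning

# Number of FEL pulses per delay bin (normalize by number of shots):
pulses_per_bin, _ = np.histogram(pulse_df["delayStage"], bins=delay_bins)

# Integrated pulse energy per delay bin (normalize by FEL intensity):
energy_per_bin, _ = np.histogram(
    pulse_df["delayStage"], bins=delay_bins, weights=pulse_df["gmdBda"]
)
```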
I have thought about this a bit. In principle, maintaining two dataframes seems a bit redundant and not a good design choice. On the other hand, deriving e.g. a "time per electron" column out of timestamps or bunch numbers probably creates a substantial amount of overhead, so maybe the former approach might perform much better...
I am also not sure that generating a per-pulse dataframe from the per-electron dataframe is efficient. In addition, it would be necessary to remove duplicates, which is likely slow. Also, this is a functionality which ideally should not require the user to access the ("private") dataframe directly; I think it should be accessible from the processor class.
About normalization: sometimes we will need to normalize just against the number of FEL pulses, and sometimes it will be necessary to normalize against pulse-energy measurement devices (the normalization is then also done per bin).
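As a small sketch of that per-bin step (placeholder arrays on a common bin axis, guarding against empty bins):

```python
# Sketch: per-bin normalization, e.g. dividing the binned signal by a binned
# pulse-energy trace. Arrays are placeholders sharing the same bin axis.
import numpy as np

binned_signal = np.random.poisson(100, size=100).astype(float)  # placeholder
binned_pulse_energy = np.random.uniform(0.5, 1.5, size=100)     # placeholder

normalized = np.divide(
    binned_signal,
    binned_pulse_energy,
    out=np.zeros_like(binned_signal),
    where=binned_pulse_energy > 0,  # leave bins without pulses/intensity at zero
)
```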
From this point of view, I still think the most flexible and simplest solution will be to use two config files and generate a per-electron and a per-pulse processor.
There is an additional reason why just indexing out a new table is not perfect: if there is an FEL pulse which did not generate any electrons on the detector, it will not show up in the table organized by detected electrons. I agree that this is often a small error, but still...
Normalization may not be essential in a laboratory setup, but is very important at the FEL.
The easiest would be to add a way to tell the reader that we want a processor with a table indexed by FEL pulse (instead of by electron). This processor can then be used for normalization.
PR #116 implements normalization histograms either from a timed (aka per-shot) dataframe, or alternatively from a timestamp column. Regarding the last point you raise: even shots that don't produce electrons are properly normalized for if timestamps are used, because the next electron will simply get assigned a longer time.
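For illustration only (this is not the actual PR #116 code), a minimal sketch of the timestamp-based idea, assuming a per-electron dataframe with a timeStamp column and a scan-axis column (names are placeholders):

```python
# Sketch of timestamp-based normalization: attribute to each bin the
# acquisition time it received, so that shots without electrons still count.
import numpy as np
import pandas as pd

def time_per_bin(df: pd.DataFrame, axis: str, bins: np.ndarray) -> np.ndarray:
    """Histogram of acquisition time (s) spent in each bin of `axis`."""
    df = df.sort_values("timeStamp")
    # Time elapsed since the previous electron; shots without electrons simply
    # make the next electron's interval longer, as described above.
    dt = df["timeStamp"].diff().fillna(0.0).to_numpy()
    hist, _ = np.histogram(df[axis].to_numpy(), bins=bins, weights=dt)
    return hist
```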
@zainsohail04 This could now be implemented for the FLASH loader in PR #116.
We need a possibility to normalize the binned signal by the number of FEL pulses or by the intensity monitor of the free-electron laser.
Preferred solution: please add a way to specify that the SED processor can be operated either in the per-electron mode (standard) or in a new "per-pulse" mode. This way, we could instantiate a second processor in per-pulse mode, which can be used for normalization.
We also considered including an additional per-pulse dataframe in SED, but this would significantly alter the structure of the code and would reduce performance when normalization is not needed.