I'm not 100% sure if that works, but it appears to me that you can select the index via the config parameter dict "channels":
self.all_channels: dict = self._config.get("channels", {})
It should be straightforward to create two instances with different configs.
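A rough sketch of what I have in mind (the exact config layout and the processor constructor are assumptions on my part, and the channel names are placeholders):

```python
# Sketch only: assumes a SedProcessor-like class that takes a config dict and a
# loader that builds its channel list from config["channels"], as in
# self._config.get("channels", {}). Channel names below are placeholders.
from copy import deepcopy

base_config = {
    "channels": {
        "dldPosX": {"format": "per_electron"},
        "dldPosY": {"format": "per_electron"},
        "gmdBda": {"format": "per_pulse"},
    },
}

# Per-electron processor config: keep all channels.
electron_config = deepcopy(base_config)

# Per-pulse processor config: keep only the per-pulse channels used for normalization.
pulse_config = deepcopy(base_config)
pulse_config["channels"] = {
    name: spec
    for name, spec in base_config["channels"].items()
    if spec["format"] == "per_pulse"
}

# processor_electrons = SedProcessor(config=electron_config)  # hypothetical call
# processor_pulses = SedProcessor(config=pulse_config)
```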
@yacremann I believe you would want the old ddMicrobunches table, right? If I am not mistaken, this can be obtained from the current dataframe by dropping the electron index and keeping only the per-pulse channels, as @rettigl correctly suggests. I can look into providing simpler access to generating this, as it is indeed useful for normalizing the FEL intensity, but also delay stage positions (less crucial with the current laser system).
I was also thinking about this in the context of lab experiments. In principle, such a normalization histogram could be derived from a per-electron dataframe if every electron has a timestamp, which could be generated for the FLASH data and is already implemented for our data.
A timestamp is already included with the FLASH data. However, there is also a column called pulseId that tracks the id of the pulse within a train. This can easily be used to filter out a table indexed only on pulses, i.e. FEL shots.
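For example, something along these lines (just a sketch; only pulseId is given above, trainId and the per-pulse channel names are my assumptions):

```python
# Sketch: reduce a per-electron dataframe to one row per FEL pulse.
# Assumes FLASH-style columns "trainId" and "pulseId" plus some per-pulse
# channels (names are placeholders); df can be a pandas or dask dataframe.
import dask.dataframe as dd

def per_pulse_table(df: dd.DataFrame) -> dd.DataFrame:
    """Keep only per-pulse columns and one entry per (trainId, pulseId)."""
    per_pulse_cols = ["trainId", "pulseId", "gmdBda", "delayStage"]  # placeholders
    return df[per_pulse_cols].drop_duplicates(subset=["trainId", "pulseId"])
```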
We can provide a notebook with said idea of filtering, or it can be part of the workflow within the context of the workflow manager (which makes more sense).
A notebook or example of the exact use case for the required feature is always greatly appreciated!
If I remember correctly, the main use of the per-pulse dataframe was binning 1D traces in parallel to the final binning, to be used as normalization arrays. This should be easy to implement in the workflow-manager/processor class.
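Something like the following sketch (column names and data are placeholders; the per-pulse table would in practice come from a filtering step like the one above):

```python
# Sketch: bin a per-pulse quantity along the same axis as the main binning,
# to use as a 1D normalization array. Column names and data are placeholders.
import numpy as np
import pandas as pd

# Placeholder per-pulse table (in practice, e.g. the filtered table from above,
# pulled into memory with .compute() if it is a dask dataframe).
pulse_df = pd.DataFrame({
    "delayStage": np.random.uniform(-1.0, 1.0, 10_000),
    "gmdBda": np.random.uniform(20.0, 80.0, 10_000),  # pulse-energy monitor
})

delay_bins = np.linspace(-1.0, 1.0, 101)  # same delay axis as the main binning

# Number of FEL pulses per delay bin (normalize by number of shots):
pulses_per_bin, _ = np.histogram(pulse_df["delayStage"], bins=delay_bins)

# Integrated pulse energy per delay bin (normalize by FEL intensity):
energy_per_bin, _ = np.histogram(
    pulse_df["delayStage"], bins=delay_bins, weights=pulse_df["gmdBda"]
)
```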
I have thought about this a bit. In principle, maintaining two dataframes seems a bit redundant and not a good design choice. On the other hand, deriving e.g. a "time per electron" column out of timestamps or bunch numbers probably creates a substantial amount of overhead, so maybe the former approach might perform much better...
I am also not sure that generating a per-pulse dataframe from the per-electron dataframe is efficient. In addition, it would be necessary to remove duplicates, which is likely slow. Also, this is a functionality which ideally should not require the user to access the ("private") dataframe directly; I think it should be accessible from the processor class.
About normalization: sometimes we will need to normalize just against the number of FEL pulses, and sometimes it will be necessary to normalize against pulse-energy measurement devices (the normalization is then also done per bin).
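As a small sketch of that per-bin step (placeholder arrays on a common bin axis, guarding against empty bins):

```python
# Sketch: per-bin normalization, e.g. dividing the binned signal by a binned
# pulse-energy trace. Arrays are placeholders sharing the same bin axis.
import numpy as np

binned_signal = np.random.poisson(100, size=100).astype(float)  # placeholder
binned_pulse_energy = np.random.uniform(0.5, 1.5, size=100)     # placeholder

normalized = np.divide(
    binned_signal,
    binned_pulse_energy,
    out=np.zeros_like(binned_signal),
    where=binned_pulse_energy > 0,  # leave bins without pulses/intensity at zero
)
```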
From this point of view, I still think the most flexible and simplest solution will be to use two config files and generate a per-electron and a per-pulse processor.
There is an additional reason why just indexing out a new table is not perfect: if there is an FEL pulse which did not generate any electrons on the detector, it will not show up in the table organized by detected electrons. I agree that this is often a small error, but still...
Normalization may not be essential in a laboratory setup, but is very important at the FEL.
The easiest would be to add a way to tell the reader that we want a processor with a table indexed by FEL pulse (instead of by electron). This processor can then be used for normalization.
PR #116 implements normalization histograms either from a timed (aka per-shot) dataframe, or alternatively from a timestamp column. Regarding the last point you raise: even shots that don't produce electrons are properly normalized for if timestamps are used, because the next electron will simply get assigned a longer time.
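For illustration only (this is not the actual PR #116 code), a minimal sketch of the timestamp-based idea, assuming a per-electron dataframe with a timeStamp column and a scan-axis column (names are placeholders):

```python
# Sketch of timestamp-based normalization: attribute to each bin the
# acquisition time it received, so that shots without electrons still count.
import numpy as np
import pandas as pd

def time_per_bin(df: pd.DataFrame, axis: str, bins: np.ndarray) -> np.ndarray:
    """Histogram of acquisition time (s) spent in each bin of `axis`."""
    df = df.sort_values("timeStamp")
    # Time elapsed since the previous electron; shots without electrons simply
    # make the next electron's interval longer, as described above.
    dt = df["timeStamp"].diff().fillna(0.0).to_numpy()
    hist, _ = np.histogram(df[axis].to_numpy(), bins=bins, weights=dt)
    return hist
```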
@zainsohail04 This could now be implemented for the FLASH loader in PR #116.
We need a possibility to normalize the binned signal by the number of FEL pulses or by the intensity monitor of the free-electron laser.
Preferred solution: please add a way to specify that the SED processor can be operated either in the per-electron mode (standard) or in a new "per-pulse" mode. This way, we could instantiate a second processor in per-pulse mode, which can be used for normalization.
We also considered including an additional per-pulse dataframe in SED, but this would significantly alter the structure of the code and would reduce performance when normalization is not needed.