Improper handling of empty chunk

mhliu0001 commented 8 months ago

In https://github.com/XENONnT/WFSim/blob/master/wfsim/strax_interface.py#L969, if a chunk has no data (which happens when chunk_size or event_rate is set too small), WFSim creates a chunk whose start and end times are both zero. This causes an exception when strax checks the continuity of the chunks:

Simulating Raw Records:  15%|█▌        | 14/93 [00:11<01:00,  1.31it/s]
Traceback (most recent call last):
  File "/home/mhliu0001/ambe_simulation/./mc_chain/run_wfsim.py", line 238, in <module>
    st.make(args.run_id, 'raw_records')
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/context.py", line 1426, in make
    for _ in self.get_iter(run_ids[0], targets,
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/context.py", line 1336, in get_iter
    generator.throw(e)
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/processor.py", line 302, in iter
    raise exc.with_traceback(traceback)
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/processor.py", line 255, in iter
    yield from final_generator
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/mailbox.py", line 447, in _read
    self.kill_from_exception(e)
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/mailbox.py", line 213, in kill_from_exception
    raise e
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/mailbox.py", line 444, in _read
    yield res
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/context.py", line 1307, in get_iter
    for n_chunks, result in enumerate(strax.continuity_check(generator), 1):
  File "/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.9/site-packages/strax/chunk.py", line 318, in continuity_check
    raise ValueError("Data is not continuous. "
ValueError: Data is not continuous. Chunk [mc_1.raw_records: 0sec 0 ns - 0sec 0 ns, 0 items, -1.0 MB/s] should have started at 215099900000
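To make the failure mode concrete, here is a minimal sketch of the kind of check that trips. It assumes, as the error message suggests, that strax requires each chunk to start exactly where the previous one ended. The `Chunk` dataclass and `continuity_check` function are simplified stand-ins written for illustration, not strax's actual `Chunk` class or its `continuity_check` implementation, and the times are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for strax.Chunk; only the time range (ns) matters here."""
    start: int
    end: int
    n_items: int = 0


def continuity_check(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield chunks unchanged, raising if one does not start where the previous one ended."""
    last_end: Optional[int] = None
    for chunk in chunks:
        if last_end is not None and chunk.start != last_end:
            raise ValueError(
                f"Data is not continuous. Chunk [{chunk.start} - {chunk.end} ns, "
                f"{chunk.n_items} items] should have started at {last_end}")
        last_end = chunk.end
        yield chunk


# Illustrative times: a regular chunk followed by a start=end=0 placeholder,
# mimicking what WFSim emits when a chunk holds no simulated data.
chunks = [Chunk(start=214_000_000_000, end=215_099_900_000, n_items=12),
          Chunk(start=0, end=0)]
try:
    for _ in continuity_check(chunks):
        pass
except ValueError as err:
    print(err)
```

Because the placeholder starts at 0 instead of at the previous chunk's end, the check raises on the first empty chunk, which is the same pattern as the error above.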

mhliu0001 commented 8 months ago

Maybe related to #385

ramirezdiego commented 8 months ago

Hi @mhliu0001, the short version is:

This has been a known issue since WFSim was created, and so far we have only been able to work around it by choosing the chunk size according to the particle rate.
Solving this issue definitively requires fixing the chunking at the epix level and changing how strax handles it (@WenzDaniel can explain better), and it is one of the reasons why fuse (https://github.com/XENONnT/fuse) exists. We do not plan to fix WFSim at this point; instead, we aim to port all the physics functionality to fuse and validate it there.
It is not directly related to #385: the issue there is that the epix data is empty (something we have also solved in fuse), while the problem here is that the data, once brought into a continuous time format, is split into unequal chunks of sometimes overlapping events from G4 (which were simulated independently).
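As a rough guide to the workaround of choosing the chunk size according to the particle rate, the sketch below estimates how likely a chunk is to be empty. It assumes event times follow a Poisson process, which may not match how WFSim or epix actually place events in time, and the names used (`event_rate_hz`, `chunk_size_s`, and the helper functions) are hypothetical, not WFSim configuration options. Under that assumption a chunk of length T at rate r is empty with probability exp(-rT), so T >= -ln(p)/r keeps that probability at or below p.

```python
import math


def empty_chunk_probability(event_rate_hz: float, chunk_size_s: float) -> float:
    """P(a chunk contains no events), assuming Poisson-distributed event times."""
    return math.exp(-event_rate_hz * chunk_size_s)


def any_empty_probability(p_empty: float, n_chunks: int) -> float:
    """P(at least one of n chunks is empty); disjoint Poisson intervals are independent."""
    return 1.0 - (1.0 - p_empty) ** n_chunks


def min_chunk_size(event_rate_hz: float, max_p_empty: float) -> float:
    """Shortest chunk (s) with P(empty) <= max_p_empty, from solving exp(-r*T) = p for T."""
    return -math.log(max_p_empty) / event_rate_hz


rate_hz = 1.0   # illustrative event rate, not a recommended value
n_chunks = 93   # illustrative number of chunks in a run

for size_s in (0.5, 2.0, 10.0):
    p = empty_chunk_probability(rate_hz, size_s)
    p_run = any_empty_probability(p, n_chunks)
    print(f"chunk {size_s:>4} s: P(empty)={p:.2e}, P(any of {n_chunks} empty)={p_run:.2e}")
print(f"chunk size for P(empty) <= 1e-6 per chunk: {min_chunk_size(rate_hz, 1e-6):.1f} s")
```

Since a single empty chunk is enough to trigger the error, the per-run probability from `any_empty_probability` is the number to watch for long runs with many chunks.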

mhliu0001 commented 8 months ago

Hi @ramirezdiego, thanks for the quick reply! I am raising this issue because I think there is a simple fix for this bug: delete the line https://github.com/XENONnT/WFSim/blob/master/wfsim/strax_interface.py#L969 and any other code that creates a chunk with start=end=0. Specifically:

for data_type in self.provides:
    if 'nv' in data_type:
        if exist_nveto_result:
            chunk[data_type] = self.chunk(start=self.sim_nv.chunk_time_pre,
                                          end=self.sim_nv.chunk_time,
                                          data=result_nv[data_type.strip('_nv')],
                                          data_type=data_type)
        # If nv is not one of the targets just return an empty chunk
        # If there is TPC event, set TPC time for the start and end
        else:
            dummy_dtype = self._truth_dtype if 'truth' in data_type else strax.raw_record_dtype()
            if exist_tpc_result:
                chunk[data_type] = self.chunk(start=self.sim.chunk_time_pre,
                                              end=self.sim.chunk_time,
                                              data=np.array([], dtype=dummy_dtype),
                                              data_type=data_type)
            else:
                chunk[data_type] = self.chunk(start=0, end=0, data=np.array([], dtype=dummy_dtype), #here
                                              data_type=data_type)
    else:
        if exist_tpc_result:
            chunk[data_type] = self.chunk(start=self.sim.chunk_time_pre,
                                          end=self.sim.chunk_time,
                                          data=result[data_type],
                                          data_type=data_type)
        else:
            dummy_dtype = self._truth_dtype if 'truth' in data_type else strax.raw_record_dtype()
            if exist_nveto_result:
                chunk[data_type] = self.chunk(start=self.sim_nv.chunk_time_pre,
                                              end=self.sim_nv.chunk_time,
                                              data=np.array([], dtype=dummy_dtype),
                                              data_type=data_type)
            else:
                chunk[data_type] = self.chunk(start=0, end=0, data=np.array([], dtype=dummy_dtype), #here
                                              data_type=data_type)

The two lines marked with #here are actually the root of this problem. I don't know why they were added in the first place; maybe someone knows. If these lines are redundant, we can simply delete them and fix this.
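For comparison with deleting the lines, one alternative is to keep an empty chunk but give it a valid time range. The sketch below post-processes a stream of chunks: each start=end=0 placeholder is dropped and, once the next chunk with data arrives, a single empty chunk spanning the gap is emitted in its place. Everything here (`Chunk`, `fill_gaps`, `is_continuous`) is hypothetical illustration code, not WFSim or strax API, and it assumes the chunks with data are already in time order and non-overlapping.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for strax.Chunk; times in ns."""
    start: int
    end: int
    n_items: int = 0

    @property
    def is_placeholder(self) -> bool:
        # Matches the start=end=0, zero-item chunks created on the lines marked #here
        return self.start == 0 and self.end == 0 and self.n_items == 0


def fill_gaps(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Drop start=end=0 placeholders; emit one correctly timed empty chunk per gap."""
    last_end: Optional[int] = None
    pending_gap = False
    for chunk in chunks:
        if chunk.is_placeholder:
            # The gap's end is only known once the next chunk with data arrives
            pending_gap = True
            continue
        if pending_gap and last_end is not None and chunk.start > last_end:
            yield Chunk(start=last_end, end=chunk.start)  # empty, but continuous
        pending_gap = False
        last_end = chunk.end
        yield chunk


def is_continuous(chunks: List[Chunk]) -> bool:
    """True if every chunk starts exactly where the previous one ended."""
    return all(a.end == b.start for a, b in zip(chunks, chunks[1:]))


raw = [Chunk(0, 100, 5), Chunk(0, 0), Chunk(200, 300, 3)]
fixed = list(fill_gaps(raw))
print([(c.start, c.end, c.n_items) for c in fixed])  # [(0, 100, 5), (100, 200, 0), (200, 300, 3)]
print(is_continuous(fixed))                           # True
print(is_continuous([c for c in raw if not c.is_placeholder]))  # False
```

Whether simply deleting the placeholder lines is enough depends on whether the following chunk starts where the last emitted one ended; in this toy input it does not, which is why dropping alone fails `is_continuous`. Placeholders at the very start or end of a run are dropped by this sketch, so the run's time range could shrink; that trade-off would need checking against WFSim's actual chunk times.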

I am still investigating this problem, but if this is the case, would it make sense to fix it before our full-chain simulation for SR1?

XENONnT / WFSim

Improper handling of empty chunk #428