SNEWS2 / SNEWS_Publishing_Tools

Publishing Tool for SNEWS
BSD 3-Clause "New" or "Revised" License

Float timeseries #98

Closed KaraMelih closed 3 months ago

KaraMelih commented 4 months ago

Aimed at fixing https://github.com/SNEWS2/SNEWS_Publishing_Tools/issues/87 ~also fixes https://github.com/SNEWS2/SNEWS_Publishing_Tools/issues/97~ (moved this fix to https://github.com/SNEWS2/SNEWS_Publishing_Tools/pull/99)

Previously, we expected ISO-formattable time strings for all entries in the timing_series.
However, the resulting message can easily become too large. As suggested by others, this PR implements a float conversion.

It first checks if all the values are strings or floats/integers.

Right now, that logic is missing. I do not know if we should always force the initial neutrino time too, or whether Time Tier users should fetch it from the Coincidence Tier.

I imagine a timing series starting with 0 is not very useful if we do not know what the 0 refers to.

KaraMelih commented 4 months ago

Reverted the firedrill fix (ee7f062), as I added that in a minor patch

KaraMelih commented 4 months ago

@sybenzvi @mcolomermolla

I want to ask about the expected information in the Timing Tier. This PR now allows times relative to the first neutrino time with nanosecond precision. Under this logic, the first value is always zero and the rest are relative to it.

In this case, do we also want to force the initial "neutrino_time" to appear as a string (that corresponds to the zeroth time) or do we expect that people looking at the TimeTier will fetch this from the CoincidenceTier ?

Right now, if you pass

neutrino_time = "2024-01-01T12:00:00.00000000"
timing_series = [0, 119781135000, 119881135000, 179781135000, 248890124000]

the neutrino_times will be parsed and sent under CoincidenceTier message (timing series ignored) and
the timing_series will be parsed and sent under TimingTier message (neutrino time ignored).

Is this what we want, or do we want to also always see the neutrino_time in the timing tier too?

KaraMelih commented 3 months ago

I added 'neutrino_time' as a required field for the Time Tier messages.

Now, it also allows either a list of strings with individual neutrino times, or a list of integers giving the relative times from the initial neutrino time with nanosecond precision.

In the former case, it computes these relative times from the initial neutrino time itself and creates a list of relative times. So the submitted message always contains the initial neutrino time together with a list of integer relative times.
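For reference, here is a minimal sketch of that string-to-relative-offset conversion. The helper name `to_relative_ns` is hypothetical (not the actual snews_pt implementation); it assumes ISO-formattable strings and uses exact integer arithmetic on the timedelta fields to avoid float rounding at nanosecond scale:

```python
from datetime import datetime

def to_relative_ns(times_iso):
    """Convert ISO-format neutrino times to integer nanosecond
    offsets relative to the first entry (illustrative helper)."""
    stamps = [datetime.fromisoformat(t) for t in times_iso]
    t0 = stamps[0]

    def ns(td):
        # exact integer arithmetic: days/seconds/microseconds -> ns
        return (td.days * 86400 + td.seconds) * 10**9 + td.microseconds * 1000

    return [ns(t - t0) for t in stamps]

times = ["2024-01-01T12:00:00.000000",
         "2024-01-01T12:00:00.000120",
         "2024-01-01T12:00:01.500000"]
print(to_relative_ns(times))  # [0, 120000, 1500000000]
```

Note that `datetime` only stores microseconds, so sub-microsecond precision would need a different representation than this sketch uses.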

habig commented 3 months ago

Have we been able to reproduce #87? If there's an unexpected hard limit on message size, we need to take that to scimma as a bug, since the original specs were "sure, you guys can move anything around, including big skymaps".

KaraMelih commented 3 months ago

I just tried, and this fails:

import numpy as np
from snews_pt.messages import SNEWSMessageBuilder

numbers = np.linspace(1e9, 9e9, 100000).astype(int)

tims = SNEWSMessageBuilder(detector_name='XENONnT',
                           neutrino_time='2012-06-09T15:31:08.109876',
                           timing_series=numbers.tolist(),
                           machine_time='2012-06-09T15:30:00.009876',
                           firedrill_mode=False, is_test=True)
tims.send_messages()

Here is the error it raises:

KafkaException: KafkaError{code=MSG_SIZE_TOO_LARGE,val=10,str="Unable to produce message: Broker: Message size too large"}

It was fine with 10k integers; it crashes at 100k.

habig commented 3 months ago

Great, thanks. If we convert that to bytes (since we're sending ASCII, one number is a lot of characters), what does that look like? I've got the attention of the scimma devs at the moment.
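As a rough back-of-the-envelope check (a sketch, not part of snews_pt), we can compare the JSON/ASCII encoding of the failing 100k-entry series with a packed binary encoding:

```python
import json
import numpy as np

# the same 100k-entry series that exceeded the broker limit
numbers = np.linspace(1e9, 9e9, 100_000).astype(np.int64)

ascii_size = len(json.dumps(numbers.tolist()))  # digits + separators
packed_size = len(numbers.tobytes())            # 8 bytes per int64

print(ascii_size, packed_size)  # roughly 1.2 MB as JSON text vs 0.8 MB packed
```

So raw int64 packing alone only buys about a third; since the offsets are monotonic, delta-encoding consecutive differences into smaller integers could shrink the payload much further.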

habig commented 3 months ago

They're not opposed to discussing it on slack, but suggested their ticketing system, https://support.scimma.org/

justinvasel commented 3 months ago

See https://github.com/SNEWS2/SNEWS_Publishing_Tools/issues/87#issuecomment-2105083319 for a potential solution.

habig commented 3 months ago

The scimma people report that the default max message size is 1 MB. They're leery of increasing it, to head off future scaling problems, and are working on a large-file offload service. But now that we know this, we can plan for it, with efficiencies like this PR helping. We could also imagine daisy-chaining things together, like SMS messages that go over 140 characters.
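Sketching that daisy-chain idea (hypothetical: the 1 MB cap comes from the scimma report above, but the `part`/`n_parts` field names and the splitter itself are made up for illustration):

```python
import json

MAX_BYTES = 1_000_000  # default broker cap reported by scimma

def chunk_series(series, max_bytes=MAX_BYTES):
    """Split a long timing_series into pieces whose JSON encoding
    stays under max_bytes, tagging each with a sequence number so
    the receiver can reassemble them in order."""
    chunks, current = [], []
    for value in series:
        candidate = current + [value]
        if current and len(json.dumps(candidate)) > max_bytes:
            chunks.append(current)   # close the full chunk
            current = [value]        # start the next one
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [{"part": i, "n_parts": len(chunks), "timing_series": c}
            for i, c in enumerate(chunks)]
```

Re-encoding with `json.dumps` on every append is O(n²) for long series; a real implementation would estimate sizes incrementally. The important bit is the reassembly contract: a part index plus a total count on each sub-message.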