fair-acc / gnuradio4

Prototype implementations for a more compile-time efficient flowgraph API

Tag handling in decimator blocks can cause an endless loop #341

Closed: daniestevez closed this issue 3 months ago

daniestevez commented 4 months ago

See the tag-propagation-decim-issue tag in gr4-packet-modem for the exact version of the code that replicates this issue. The code is to be built with

cmake -D CMAKE_CXX_FLAGS=-DTRACE ..

to get trace prints.

I have implemented a Pack Bits block that is somewhat similar to the Pack K Bits GR3 block. In its simplest form, it packs the LSBs of two input uint8_t items into a single output uint8_t that carries data in its two LSBs. The block has a decimation of 2 in this case.

I have implemented the block by using a gr::ResamplingRatio<> annotation (which defaults to isConst = false) and then setting numerator = 1 and denominator = 2 in the block constructor.
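
For reference, here is a minimal sketch of roughly what such a block looks like. The gr::ResamplingRatio<> annotation and the numerator/denominator settings are as described above; the port declarations, span types, return values, and bit order are assumptions about the gr4 Block API rather than the exact code (reflection/registration boilerplate is also omitted):

// Sketch only: exact gr4 port/span/Status types are assumptions.
#include <gnuradio-4.0/Block.hpp>
#include <cstddef>
#include <cstdint>
#include <span>

struct PackBits : gr::Block<PackBits, gr::ResamplingRatio<>> {
    gr::PortIn<uint8_t>  in;
    gr::PortOut<uint8_t> out;

    PackBits() {
        numerator   = 1; // one output item ...
        denominator = 2; // ... per two input items, i.e. decimation by 2
    }

    gr::work::Status processBulk(std::span<const uint8_t> inSpan, std::span<uint8_t> outSpan) {
        for (std::size_t i = 0; i < outSpan.size(); ++i) {
            // pack the LSBs of two consecutive input items into the two LSBs of one
            // output item (the bit order shown here is illustrative)
            outSpan[i] = static_cast<uint8_t>(((inSpan[2 * i] & 1U) << 1) | (inSpan[2 * i + 1] & 1U));
        }
        return gr::work::Status::OK;
    }
};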

I have a simple flowgraph to test this block. It works well when input tags appear on even-indexed items. The PackBits::processBulk() function is called with appropriately sized input and output spans, and automatic tag propagation makes the tags appear in the output in the correct indices. For instance, with tags at index 0 and index 2 I get:

VectorSource::processBulk(outSpan.size() = 65536)
VectorSource::processBulk returning DONE
PackBits::processBulk(inSpan.size() = 2, outSpan.size() = 1), to_publish = 1
VectorSink::processBulk(inSpan.size() = 1)
VectorSink received tag (0, { zero: null }) at index = 0
PackBits::processBulk(inSpan.size() = 12, outSpan.size() = 6), to_publish = 6
VectorSink::processBulk(inSpan.size() = 6)
VectorSink received tag (0, { two: null }) at index = 1
vector sink contains 7 items
vector sink items:
1 3 1 0 2 3 1 
vector sink tags:
index = 0, map = { zero: null }
index = 1, map = { two: null }
Suite 'global': all tests passed (3 asserts in 0 tests)

Note that the tags appear in the output at indices 0 and 1 as expected. (Also note that the input vector has length 15, but the scheduler cleverly decides that nothing can be done with the last item of this vector because PackBits needs at least 2 items to do something, so it correctly terminates the flowgraph execution).

However, if I put tags on odd-indexed samples, the scheduler enters an infinite loop, using 100% CPU of one core. For instance, with a single tag on index 1, I only get:

VectorSource::processBulk(outSpan.size() = 65536)
VectorSource::processBulk returning DONE

and then an infinite loop.

wirew0rm commented 4 months ago

Hey @daniestevez, thanks for the detailed report. This is due to the way the conflict between "needs at least 2 samples to process due to the denominator" and "tags are always on the first sample supplied to processBulk" is resolved.

Since it's related, I'll resolve it in #340; basic outline:

I'll keep you updated on the progress. Also thanks to @ivan-cukic for the initial triaging.

daniestevez commented 4 months ago

@wirew0rm thanks for the update!

Let me check whether I understood the outline you propose by going through a worked example. This is actually related to https://github.com/gnuradio/gnuradio/issues/7075, so propagating tags in a decimator block in the "least surprising way" (for lack of a consensus on what is the "correct" way) is not something we always get right in GR3.

Say we have a block Integrator(2) that adds pairs of samples together. Say we present to this block the input items [0, 1, 2, 3, 4, 5, 6, 7], and tags {{0, {{"a", pmtv::pmt_null()}}}, {1, {{"b", pmtv::pmt_null()}}}, {2, {{"c", pmtv::pmt_null()}}}, {4, {{"d", pmtv::pmt_null()}}}}. Say that the Integrator(2) doesn't do any tag management and relies completely on default tag forwarding by the runtime.

According to what you said, I would expect that the following happens in each processBulk() call.

  1. The block is called with input items [0, 1] and corresponding tags {{0, {{"a", pmtv::pmt_null()}}}, {1, {{"b", pmtv::pmt_null()}}}}. The runtime only propagates the tag on the first item, so the output is the item [1] and has the tag {{"a", pmtv::pmt_null()}}.
  2. The block is called with input items [2, 3]. Now the tag {{"b", pmtv::pmt_null()}} is present at the first item, since it has been "carried over" from the previous call. The tag {{"c", pmtv::pmt_null()}} is also present at the first item, so these two tags get combined as {{"b", pmtv::pmt_null()}, {"c", pmtv::pmt_null()}}. The block publishes the output item [5] and the runtime attaches this combined tag to this output.
  3. The block is called with input items [4, 5, 6, 7] and the tag {{"d", pmtv::pmt_null()}} attached to the first of these items. It publishes the output [9, 13] with the tag {{"d", pmtv::pmt_null()}} attached to the item 9.

A block that looks at the output of this block would see the items [1, 5, 9, 13] and tags {{0, {{"a", pmtv::pmt_null()}}}, {1, {{"b", pmtv::pmt_null()}, {"c", pmtv::pmt_null()}}}, {2, {{"d", pmtv::pmt_null()}}}}. The effect of the Integrator(2) on the tag indices is in practice ceil(index / 2.0).
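
A small self-contained illustration of that index mapping (plain C++, no gr4 APIs), using the tag indices from the example above:

// Plain C++ illustration: with decimation N = 2, the carry-over scheme described
// above maps an input tag index i to output index ceil(i / 2.0).
#include <cmath>
#include <cstdio>

int main() {
    const int   N            = 2;
    const int   inputIndex[] = {0, 1, 2, 4};            // indices of tags "a", "b", "c", "d"
    const char* name[]       = {"a", "b", "c", "d"};
    for (int k = 0; k < 4; ++k) {
        const int outputIndex = static_cast<int>(std::ceil(inputIndex[k] / static_cast<double>(N)));
        std::printf("tag %s: input index %d -> output index %d\n", name[k], inputIndex[k], outputIndex);
    }
    // prints 0 -> 0, 1 -> 1, 2 -> 1, 4 -> 2: tags "b" and "c" merge on output item 1,
    // matching the worked example above
}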

Is this understanding correct?

As you can see in https://github.com/gnuradio/gnuradio/issues/7075, in my opinion the "least surprising way" of propagating tags in this case would be to use floor(index / 2.0) rather than ceil(index / 2.0). This would be achieved by

We could add a flag to switch to forwarding all tags of a chunk to the current first sample instead of the first of the next chunk

So I think there is a good case for having this feature in the runtime. Otherwise, potentially several blocks would need to implement this behaviour on their own.

The specific case in which I encountered that GR3 issue (which serves to motivate why this behaviour is needed) was the following. I had a stream of items, each carrying a tag with the timestamp corresponding to that item. I wanted to integrate blocks of N of these items and associate with each integration the timestamp of the first item used in it (because this is the starting timestamp of the integration interval). I could do this if the tag indices are propagated as floor(index / N), but not if they are propagated as round(index / N) (which appears to be the case in GR3) or as ceil(index / N). If they are propagated as floor(index / N), then each output item contains the tags of every item that was considered in that integration (in GR3 an item can have multiple tags, which are not merged, so there isn't a problem with having multiple {"timestamp": ...} tags with different values on the same item). Then I could have a custom block that filters these tags, removing all of them except for the tag with the minimum timestamp on each item.
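
As a purely illustrative sketch of that use case (plain C++, no gr4 or pmtv APIs; N and the timestamps are made up):

// Plain C++ sketch: one timestamp tag per input item, integration over N items,
// floor(index / N) propagation, then keep only the minimum timestamp per output
// item (the start of the integration interval).
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

int main() {
    const int                 N          = 4;
    const std::vector<double> timestamps = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7};

    // collect all timestamps that floor-propagate onto each output item
    std::map<int, std::vector<double>> tagsPerOutputItem;
    for (int i = 0; i < static_cast<int>(timestamps.size()); ++i) {
        tagsPerOutputItem[i / N].push_back(timestamps[i]); // floor(index / N)
    }

    // the filtering block keeps only the minimum timestamp per output item
    for (const auto& [outputIndex, ts] : tagsPerOutputItem) {
        std::printf("output item %d: start timestamp %.1f\n", outputIndex,
                    *std::min_element(ts.begin(), ts.end()));
    }
}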

wirew0rm commented 4 months ago

Thanks for the detailed explanation and examples; it all looks correct to me.

Then I could have a custom block that filters out these tags, removing all of them except for the tag with minimum timestamp in each item.

But doesn't this yield the same effect as only propagating until the first sample and then only taking the latest value for a given key in the map? Basically, your expected approach takes the minimum and propagates all tags for each chunk, while the currently planned approach takes the maximum of all the leftover tags from the previous chunk and the tags on the first sample.
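
A small self-contained check of that equivalence (plain C++, N = 2, one monotonically increasing timestamp tag per input item; not gr4 code):

// Both schemes give the timestamp of the first item of each integration chunk:
//   A: floor(i / N) propagation of all tags, then keep the minimum timestamp;
//   B: carry leftover tags to the next chunk's first sample (ceil), latest value wins.
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

int main() {
    const int                 N    = 2;
    const std::vector<double> ts   = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}; // one tag per input item
    const int                 nOut = static_cast<int>(ts.size()) / N;

    for (int k = 0; k < nOut; ++k) {
        // A: tags from inputs k*N .. k*N + N - 1 land on output k; keep the minimum
        const double a = *std::min_element(ts.begin() + k * N, ts.begin() + (k + 1) * N);
        // B: tags from inputs (k - 1)*N + 1 .. k*N land on output k; the latest value
        //    for the "timestamp" key wins, i.e. the one at index k*N
        const double b = ts[k * N];
        assert(a == b);
        std::printf("output item %d: timestamp %.1f\n", k, a);
    }
}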

daniestevez commented 4 months ago

But doesn't this yield the same effect as only propagating until the first sample and then only taking the latest value for a given key in the map?

Yes, you're right. I hadn't realized this before. I'll keep thinking about this, but taking your remark into account, at the moment I don't think I have a strong case where floor(index / N) would be needed instead of ceil(index / N).