Mapping TPIs to TAPs and vice versa

benkrikler commented 10 years ago

Given a TAP, it is relatively simple to draw the TPI it was created from. We can store within the TAP the ID of the underlying TPI (I'd avoid pointers which are harder to persist).

The reverse operation, getting TAPs from a TPI, is not so simple as potentially several TAPs could come from one TPI (I'm thinking pile-up or multiple generators on the same channel). In this situation, how do we think of storing this information?

Is it ever going to be useful? I think so, but do you all agree?

I can think of several ways to do it, but I'm not sure what's best:

1. Extend the TPI class through inheritance (to avoid cluttering the TPI class itself, which is used in alcapana), and add a vector to each TPI to contain TAP IDs.
2. Add some mapping functionality to the event navigator. Then, when given a TPI, we can retrieve a list of TAPs.

Also important is the format of the TAP ID itself. For TPIs we're just using the offset within the list of TPIs for a given channel within each event. Can we do the same for the list of TAPs, or do we expect the TAP order to be more flexible?

litchfld commented 10 years ago

Executive summary: TAP => TPI good, TPI => TAP bad.

Backward mapping from TAP to TPI is simple and sane, although it probably only makes sense for debugging, since we may not save the TPIs in production runs.

But forward mapping... Off the top of my head, I don't think that has ever been done on other experiments that I've worked on. It's mostly a procedural thing: generally, objects are regarded as frozen once the algorithm that makes them completes, and can only be accessed in read-only mode. There are minimally dangerous ways round that in memory: you make the forward map mutable, so the rest of the object is still const protected. But that's not a complete fix, because if the object has been saved to disk you have to create a new, modified copy. You can see how this might result in a lot of overhead.
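As a rough illustration of the in-memory workaround, here is a minimal sketch assuming a hypothetical `Tpi` class (the real TPI class and its members are not shown in this thread). Only the forward list is `mutable`; the sample data stays const-protected. It does nothing for objects already written to disk.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TPI; names are illustrative only.
class Tpi {
public:
    explicit Tpi(std::vector<int> samples) : fSamples(std::move(samples)) {}

    // The raw data can only be read through a const object.
    const std::vector<int>& Samples() const { return fSamples; }

    // Declared const, yet it modifies fTapIds: this compiles only
    // because fTapIds is marked mutable below.
    void AddTapId(int id) const { fTapIds.push_back(id); }

    const std::vector<int>& TapIds() const { return fTapIds; }

private:
    std::vector<int> fSamples;         // frozen once the object is built
    mutable std::vector<int> fTapIds;  // forward references, filled in later
};
```

A `const Tpi` can then have TAP IDs attached after the fact, while any attempt to change its samples still fails to compile.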

Semantically, I'm also not convinced. The TPIs are interpretation-free representations of the data. The interpretations (TAPs) obviously should know about the data they are interpreting, but the data shouldn't care whether someone has made an interpretation of it or not.

One can always loop over every container of TAPs and build the list of forward references as needed. It's a little expensive, but you only do it if you ever need to, and if more than one of the lists is time ordered, then you can make it much faster.
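A minimal sketch of that on-demand loop, assuming a hypothetical `Tap` struct that carries the offset of its parent TPI, and TAP containers keyed by generator name. None of these names come from the AlcapDAQ code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a TAP: only the back-reference matters here.
struct Tap {
    int parentTpi;  // offset of the parent TPI in its channel's list
    double time;
};

// Where one TAP lives: which generator's container, and at what position.
struct TapLocation {
    std::string generator;
    std::size_t index;
};

using TapList = std::vector<Tap>;
using ForwardMap = std::map<int, std::vector<TapLocation>>;

// Walk every TAP container once and collect, for each TPI offset,
// the locations of all TAPs that point back to it.
ForwardMap BuildForwardMap(const std::map<std::string, TapList>& tapsByGenerator) {
    ForwardMap forward;
    for (const auto& entry : tapsByGenerator) {
        const TapList& taps = entry.second;
        for (std::size_t i = 0; i < taps.size(); ++i) {
            forward[taps[i].parentTpi].push_back({entry.first, i});
        }
    }
    return forward;
}
```

The map is built only when someone asks for it, so the TPIs themselves are never modified.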

litchfld commented 10 years ago

> Also important is the format of the TAP ID itself. For TPIs we're just using the offset within the list of TPIs for a given channel within each event. Can we do the same for the list of TAPs, or do we expect the TAP order to be more flexible?

I hope not! The TAP order will be fixed once it is handed to the EventNavigator. If you want to reorder them then you have to create a new container with the new ordering, and have references into that.

I think the ordering of the pulses is part of what the generator algorithms should consider as output. If the ordering in the container changes once it has been signed off by an algorithm, I would consider that a bug.

benkrikler commented 10 years ago

Ok. I think the main use I had in mind for this was the ExportPulse module (#52), which I think we can redesign. Other than that I can't see a reason to go from TPIs to TAPs within analysis, so what you say makes sense.

Similarly for TAP ordering not changing once the generator finishes, this also makes sense and is true to my original idea. No part of the code knows better than the generator as to the correct order of the TAPs.

One final question / comment and I'll close this: Should a TPI or TAP be aware of its ID? It would be useful for some interface design to be able to get a TAP and know its ID without knowing the list it's stored in.

Again, I can see several ways to do this, and I'm not sure what's best. Simplest might be to replace the typedef'd containers with derived versions. We then overload push_back or insert to also set the TPI / TAP's ID.
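A minimal sketch of that first idea, with one swap: rather than deriving from `std::vector` (which has no virtual destructor), it wraps one. `Pulse` and `PulseList` are hypothetical names, not the existing typedef'd containers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pulse type; id stays -1 until the pulse is stored in a list.
struct Pulse {
    int id = -1;
    double time = 0.0;
};

// Thin wrapper around std::vector that stamps each pulse with its
// offset in the list at the moment it is appended.
class PulseList {
public:
    void push_back(Pulse pulse) {
        pulse.id = static_cast<int>(fPulses.size());
        fPulses.push_back(pulse);
    }

    const Pulse& at(std::size_t i) const { return fPulses.at(i); }
    std::size_t size() const { return fPulses.size(); }

private:
    std::vector<Pulse> fPulses;
};
```

Because only append and read-only access are exposed, the stored IDs cannot drift out of step with the positions.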

An alternative might be to use a nominal timestamp based on the trigger time of the TPI. Within a source's list of pulses that's unique already and conceptually seems the most natural way to ID a pulse. The trouble is then moved into how you retrieve a pulse with a given ID.

benkrikler commented 10 years ago

I want to close this issue, so to make progress I'm going to try and answer my own question here. Cue the third person...

> Should a TPI or TAP be aware of its ID?

That's a great idea, Ben! It makes sense that we can inspect a given pulse and query it as to its origins.

> I can see several ways to do this, and I'm not sure what's best.

Personally I like your second option. The time of the pulse is immutable and independent of the container or the number of pulses, so it does feel more natural. It would also allow us to use TPIs without touching the class, which is nice because it was an online data format, so we don't want to have to add stuff in now.

If you do go down that route, how about replacing the vector behind the standard pulse list with a std::set, and defining a sort operation based on the time stamp of each pulse? Then, when you want a given time, it should be very quick to find the corresponding pulse. Otherwise, if we can guarantee the vector is time sorted, a binary search from the std algorithms (std::lower_bound, say) might be able to do the job just as well. Perhaps that second option is the better one...
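A minimal sketch of the sorted-vector variant, assuming a hypothetical `Pulse` struct whose trigger time is an integer tick count (an assumption, chosen so that equality comparison is exact).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical pulse type; the integer timestamp is an assumption.
struct Pulse {
    long triggerTime;  // nominal timestamp, e.g. in clock ticks
    double amplitude;
};

// Returns a pointer to the pulse with the given timestamp, or nullptr
// if there is none. Precondition: pulses is sorted by ascending triggerTime.
const Pulse* FindByTime(const std::vector<Pulse>& pulses, long time) {
    // Binary search for the first pulse whose time is not less than `time`.
    auto it = std::lower_bound(
        pulses.begin(), pulses.end(), time,
        [](const Pulse& p, long t) { return p.triggerTime < t; });
    if (it != pulses.end() && it->triggerTime == time) {
        return &*it;
    }
    return nullptr;
}
```

The lookup is O(log n), the same order as a `std::set`, but the pulses stay in a contiguous vector.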

AndrewEdmonds11 commented 10 years ago

I feel I should step in here and make a comment...

Would we not already know a pulse's origins if we've asked for a pulse created by a specific generator?

benkrikler commented 10 years ago

Yay, response!

I suppose I'm thinking about things in general, so I wasn't thinking we'd always be simultaneously told the source. What situation are you thinking about?

Even if we know the source though, how should we identify a pulse within that source's list of output pulses?

AndrewEdmonds11 commented 10 years ago

Why can't it just be the pulse's position in the list?

I'm also trying to think of what we would use it for. At the moment, I'm thinking that we ask the EventNavigator for a list of pulses (TPI, TAP or whatever) of a certain detector and using a certain generator. And so is this discussion about how we identify a specific pulse in a given list?

benkrikler commented 10 years ago

> At the moment, I'm thinking that we ask the EventNavigator for a list of pulses (TPI, TAP or whatever) of a certain detector and using a certain generator.

That's definitely going to be true for most things that access the various pulses, but there may be at least one or two cases where we want to look at individual pulses and don't care about the rest of the list, so I'm trying to work out how we handle that.

Here's one thing that's bothering me: each TAP should carry an ID of some kind for the TPI that created it, so how can we guarantee that this is filled in the generator? We can't assume there's 1 TAP per TPI, and so the vector indices won't in general align. If we use the timestamp, then there's no concern.

> And so is this discussion about how we identify a specific pulse in a given list?

Yup, that's the point.

AndrewEdmonds11 commented 10 years ago

Ok, I understand now. Yes, I think the timestamp is probably the best bet then. Does this mean we would have to store the timestamp in the TAP as well?

benkrikler commented 10 years ago

The timestamp of the TPI that created the TAP would want storing. But it's only used to identify the parent TPI, and shouldn't be treated as meaningful to the TAP I think.

litchfld commented 10 years ago

> One final question / comment and I'll close this: Should a TPI or TAP be aware of its ID? It would be useful for some interface design to be able to get a TAP and know its ID without knowing the list it's stored in.

OK, so at this point I think there are two different concepts floating about.

If a Pulse knows its own ID, then that is a kind of global identifier, and can be unique (a copy has a different identifier) or copied when the pulse is copied.

If a reference is held to a point in a list, that is not strictly an identifier of a pulse; it's a pointer into a particular location in a fixed datastream, and it doesn't even make sense to copy it.

The latter 'pointer back' is not strictly an ID; it's a way of finding the source quickly (quicker than re-analysing the data). It answers the question "where is X". The former is a true ID, but it may not provide for fast referencing. It answers the question "is Y == X".

One way of getting most of both features is to take advantage of the fact that an Int32_t is a bit bigger than what we need to index a container, so the additional bits can be used to hash the object data (or part of it). If you look in the specified place and the hash matches, then bingo!
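A minimal sketch of that packing, under stated assumptions: a 20-bit index (about a million pulses per list), 12 check bits, and a hash of the trigger time only. The bit split, the hash, and all names are illustrative and not taken from the AlcapDAQ code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pulse type; only the hashed field is shown.
struct Pulse {
    std::uint32_t triggerTime;
};

constexpr std::uint32_t kIndexBits = 20;  // assumed split: 20 index bits, 12 check bits
constexpr std::uint32_t kIndexMask = (1u << kIndexBits) - 1u;

// 12-bit check value derived from the pulse data (multiplicative hash,
// keeping the top 12 bits of the 32-bit product).
std::uint32_t CheckBits(const Pulse& p) {
    std::uint32_t h = p.triggerTime * 2654435761u;
    return h >> kIndexBits;
}

// Pack the position in the list (low bits) with the check value (high bits).
std::uint32_t MakeRef(std::uint32_t index, const Pulse& p) {
    return (CheckBits(p) << kIndexBits) | (index & kIndexMask);
}

// Look in the referenced slot; return the pulse only if its check bits match.
const Pulse* Resolve(const std::vector<Pulse>& pulses, std::uint32_t ref) {
    std::uint32_t index = ref & kIndexMask;
    if (index >= pulses.size()) {
        return nullptr;
    }
    const Pulse& candidate = pulses[index];
    return (CheckBits(candidate) == (ref >> kIndexBits)) ? &candidate : nullptr;
}
```

With only 12 check bits, a wrong pulse in the right slot still passes about once in 4096 cases, so this catches most stale references rather than all of them.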

litchfld commented 10 years ago

Other thoughts: The timestamp is not a unique identifier either... it still needs to have a source identifier, since the sources are generating timestamps in parallel.

Given that, I'm not sure this buys you much that isn't in the container+position model, and it has the cost of slightly slower lookup. (Obviously in container+position you still need to store the information about the location in the source container, not assume it's the same as in your own container. It's basically just an I/O-stable version of myArray[5].)
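A minimal sketch of the container+position model on the generator side, assuming hypothetical `Tpi` and `Tap` structs and a toy generator that emits one TAP per peak. The point is only that each TAP records the position of its parent in the source list, since its own position can differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; a real TPI holds samples, not pre-found peaks.
struct Tpi {
    std::vector<int> peakTimes;  // zero, one, or several peaks per TPI
};

struct Tap {
    std::size_t parentIndex;  // position of the parent in the TPI list
    int time;
};

// Toy generator: one TAP per peak, so the TAP list and the TPI list
// do not line up index-for-index in general.
std::vector<Tap> Generate(const std::vector<Tpi>& tpis) {
    std::vector<Tap> taps;
    for (std::size_t i = 0; i < tpis.size(); ++i) {
        for (int t : tpis[i].peakTimes) {
            taps.push_back({i, t});  // store the source position explicitly
        }
    }
    return taps;
}
```

In this sketch a TPI with two peaks yields two TAPs sharing one `parentIndex`, and a TPI with no peaks yields none, so the back-reference stays valid however many TAPs each TPI produces.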

benkrikler commented 10 years ago

With pull request #88 most of this has been implemented or at least answered, so I'm closing this issue. As always, feel free to add comments afterwards.

alcap-org / AlcapDAQ

Mapping TPIs to TAPs and vice versa #53