DraTeots commented 4 years ago

My use case is collecting some final statistics or scores of a Factory/Processor running in many threads. In particular, I have a smearing factory and would like to have a final report of how many particles gone through smearing, what types are blocked, etc. For this each copy of a factory gather its statistics and it should be merged in the end.

Generalizing it, I need something like the very first Init and the very last Finish. I will call them FirstInit and LastFinish

With Processors the diagram of calls:

MyProc::FirstInit()

MyProc-thread1::Init()   |
...                      |  Guaranteed to be sequential
MyProc-threadN::Init()   |

MyProc-thread1 ... MyProc-threadN    | Multithreaded processing

MyProc-thread1::Finish()   |
...                        |  Guaranteed to be sequential
MyProc-threadN::Finish()   |

MyProc::LastFinish() ??

So it is easy to achieve the FirstInit functionality, but how about LastFinish()?

For factories there is no Finish function, so one can only use the destructor.

MyFactory::FirstInit()

MyFactory-thread1::Init()   |
...                         |  Guaranteed to be sequential
MyFactory-threadN::Init()   |

MyFactory-thread1 ... MyProc-threadN    | Multithreaded processing

~MyFactory-thread1() ... ~MyFactory-threadN()    | Multithreaded death =)

MyFactory::LastFinish() ??

Since it is guaranteed that Services are created the first and destroyed the last, one can use a service to do the thing:

StatisticsService::Init()

MyFactory-thread1::Init()   |  Get<StatisticsService>
...                         |  Guaranteed to be sequential
MyFactory-threadN::Init()   |

MyFactory-thread1 ... MyProc-threadN    | Multithreaded processing

~MyFactory-thread1()  | StatisticsService <- Merge statistics
...                   | locks are used to sync with StatisticsService
~MyFactory-threadN()  | feel nervous about locks in a destructor

~StatisticsService()  | print the statistics

Here, I'm not so sure, that StatisticsService destruction is the last.

It should work, but I feel like using the wrong instruments to achieve the goal.

nathanwbrei commented 4 years ago

JEventProcessor::Init and JEventProcessor::Finish will only be called once, so they already give you your FirstInit and LastFinish. JFactory isn't capable of knowing that information. Remember, JEventProcessor interacts with the entire event stream, whereas JFactories do not. To use a fluid mechanics analogy, JFactories have a Lagrangian reference frame whereas JEventProcessors have an Eulerian one.

Side note: In practice JFactories 'see' multiple JEvents because we recycle JEvents (but even this is an assumption you really shouldn't make, because any user can change this in the settings) As a result, legacy JANA code likes to have JFactories accumulate state as well because people think it will be more efficient. This is used for calibrations and translation tables, so the state is usually blown away and recreated in JFactory::ChangeRun(). But be aware that a JFactory will only see a random subset of all emitted events.

DraTeots commented 4 years ago

Thanks for the explanation!

Still... How should I fill the final statistics? What are the right tools for that?

faustus123 commented 4 years ago

If I'm understanding Nathan's comment correctly I think the suggestion is to use a JEventProcessor for this. Something like:

Initialize statistics counters

MyProcessor::Init(){...}

Add statistics for all events

MyProcessor::Process(){ auto fac = jevent->GetFactory(); std::lock_guard lck(myMutex); stats += fac->GetEventStats(); }

Print results

MyProcessor::Finish(){...}

Note that this does require that MyProcessor have a mutex member that it locks. It's possible we could add a ProcessSequential method to JEventProcessor as well that locks an internal mutex automatically. The problem there is that we may have to lock/unlock it for every JEventProcessor, even ones that don't implement a ProcessSequential. Nathan may want to comment.

faustus123 commented 4 years ago

Thinking about it a little more, it would actually be easy to avoid a lock/unlock sequence for every event in JEventProcessors that don't implement a ProcessSequential method. Just have the default method set a private flag. We'd only suffer the lock/unlock for the first event and a if(bool) call for all others.

nathanwbrei commented 4 years ago

Yes, though in my current version the user just sets a flag in the ctor, which I find preferable because it is less magical. If we really wanted to have fun we can detect whether the method has been overridden by testing equality of function pointers, although we will have to be careful to make sure that the implementation works on all of the outdated compilers we are trying to support.

faustus123 commented 4 years ago

I looked into testing equality of function pointers a little and it does not seem to be part of a global standard. I saw there is some GCC extension that supports it that seems to be honored by a couple of other compilers (not MS). There is no guarantee that will be the case in the future though.

What I don't like about the user setting a flag is that it is very open to user error. It they set it or don't set it in a way that does not match whether they have implemented the virtual method, we will have no way of detecting the error and they will be extremely frustrated when trying to figure out why their code does not work. I prefer the "magical" way since it would do what the user expects without allowing for an inconsistent configuration.

nathanwbrei commented 4 years ago

Looks somewhat standardized to me: https://isocpp.org/wiki/faq/pointers-to-members

faustus123 commented 4 years ago

The article I read was on StackOverflow (see answer 2) https://stackoverflow.com/questions/4741271/ways-to-detect-whether-a-c-virtual-function-has-been-redefined-in-a-derived-cl

It claims that the .* syntax is a gcc extension as explained here: http://gcc.gnu.org/onlinedocs/gcc/Bound-member-functions.html

If this is standard, then great. That solves the issue. (Sans verification that it works across plugin boundaries.)

nathanwbrei commented 4 years ago

Read about and played around some more with pointers-to-member-functions. Slightly counterintuitively, it so happens that &Base::f == &Derived::f == &OtherDerived::f regardless of what overrides of f actually exist. Essentially a PMF is a symbol which resolves to an index in the vtable, so technically neither a pointer nor a function. This weirdness is probably necessary to avoid putting constraints on the vtable and the convention for passing the 'self' parameter. It looks like Clang, GCC, and ICC all support the extension needed to get at the function pointer behind the member pointer, so in practice maybe we could cast the PMF to void* like the StackOverflow people are doing. However, that is certainly not my first choice.

nathanwbrei commented 4 years ago

Here's a more exciting option:

#include <iostream>

// The thing the user is supposed to extend
// NOTE: JEventSource ought to be a Concept (from C++20), which obviously none of our compilers will support.
// For ergonomics we can make the base class be runtime-polymorphic but use static polymorphism behind the scenes.
class JEventSource {
public:
    virtual void process_parallel() {
        std::cout << "Base process_parallel" << std::endl;
    };
    virtual void process_sequential() {
        std::cout << "Base process_sequential" << std::endl;
    };
};

// The JEventSource backend. The logic for preserving invariants lives here.
// If the user declares their virtual functions final, the optimizer can devirtualize entirely.
template <typename ES>
class JAbstractEventSource {
public:
    ES* underlying;
    JAbstractEventSource(ES* es) : underlying(es) {};
    void do_process() {
        if (typeid(&ES::process_parallel) != typeid(&JEventSource::process_parallel)) {
            underlying->process_parallel();
        }
        if (typeid(&ES::process_sequential) != typeid(&JEventSource::process_sequential)) {
            underlying->process_sequential();
        }
    }
};

// A user-defined JEventSource. Note that they only implement process_parallel and the user is able to
class MyEventSource : public JEventSource {
public:
    void process_parallel() override {
        std::cout << "Derived process_parallel" << std::endl;
    }
};

class JApplication {
public:
    template<typename JEventSourceT>
    void run(JEventSourceT* es) {
        static_assert(std::is_base_of<JEventSource, JEventSourceT>::value);
        JAbstractEventSource<JEventSourceT> aes(es);
        aes.do_process();
    }
};

int main() {
    MyEventSource es;
    JApplication app;
    app.run(&es);
    return 0;
}

nathanwbrei commented 4 years ago

I still think we should just have the users set the 'has_process_sequential' flag by themselves. If it defaults to true, an uninformed user would encounter a minor performance hit with no loss of correctness. If an informed user went out of their way to turn off sequential processing when they shouldn't, they would get a correctly opened and correctly closed resource which happened to contain data from exactly zero events.

Furthermore, the uninformed users will be following our tutorials/sample code closely, and those would be telling them to use process_sequential() instead of using their own locks, so there wouldn't even be a performance hit.

faustus123 commented 4 years ago

At the risk of insulting myself (and Dmitry), you probably shouldn't give too much credit to the users to closely follow any documentation!

I think the most obvious failure mode will be someone seeing the ProcessSequential() virtual method in the header or reference documentation and implementing it without setting the flag. This will cost them time (and ultimately us in helping them figure out the error). Especially, as you say, since the code won't crash. It will just create an empty file.

I think our two choices should really be:

Change nothing, but document how a user should use a mutex in their code a'la my example above.
Add a ProcessSequential virtual method: 2a. Handle it automatically when the user implements it. or 2b. Always lock/unlock and don't try to avoid it.

Regarding 2a.: I took a look at your code above. I think I see how this works. First, I assume you meant JEventProcessor and not JEventSource. Doing it that way would mean changing the signature for the Run() and presumably the Add(JEventProcessor*) methods to be templates. This allows us to basically perform a compile time check without having to know about the subclass.

I’ll admit that it is clever and seems to give equivalent functionality to the other automatic methods we discussed. I’m not too excited about turning those methods into templates for reasons that will not be at all clear to the users. I put a lot of weight into having an API that is intuitively obvious to our user base and I this would stray from that.

Regarding 2b.: We should probably just do some performance tests. Maybe we're overthinking it.

nathanwbrei commented 4 years ago

A detailed example and writeup on how to use JMetadata to get metadata out of a JFactory and into a JEventProcessor now appears in pull request #77.

See examples/MetadataExample

JeffersonLab / JANA2

Invariant/last Finish of a Factory/Processor #74

Initialize statistics counters

Add statistics for all events

Print results