getProvider params confusing and needs better error handling

@brunoseivam and I both have been burned by the following:

virtual ChannelProvider::shared_pointer getProvider(std::string const & providerName) = 0;
//i.e. line 923 pvAccess.h for reference

Intuitively, we expect providerName to be the name of the service that client is trying to monitor. This would be fine if things failed peacefully(segfault is not fun and debugger did not give us much). Even if we keep the name, there should be a better way to catch this.

Can we have a bit more: did it in fact segfault?

Sorry for the confusion. Didn't help that the word "provider" is used without context in both the method name and the documentation. We should address that in general.

It builds without errors and does give segfault. Prior to my realization and fixing of the issue, this is what the implementation looked like: https://gist.github.com/arkilic/c4e442396a3e5dfa9b50 Once I clarified what driverName and providerName are based on some example, I could fix the problem. I shared this here because for developers who are new to v4 like me(there are a few of us in nsls2), it is sometimes hard to find what is wrong, so every bit of documentation, examples, and clarification helps. If you'd like more information, please let me know. I can reproduce this issue.

The issue is of course was of course you've called getProvider with the name of the service instead of the name of a registered provider.

Note the EPICS V4 code doesn't seg fault when this happens. The function can't find a registered provider with that name and returns a null pointer, which is what it's supposed to do. This is clearly documented in both the 4.4 release and latest code:

    /**
      * Get a shared instance of the provider with the specified name.
      * @param providerName The name of the provider.
      * @return The interface for the provider or null if the provider
      * is not known.
      */
      virtual ChannelProvider::shared_pointer getProvider(std::string const &
          providerName) = 0;

When you call a function through a null pointer, however, you will get a seg fault.

Intuitively, we expect providerName to be the name of the service that client is trying to monitor.

I don't think this is intuitive or expected at all, but it's hard to anticipate every mistake a programmer might make.

Bear in mind the function is called getProvider and returns a ChannelProvider. It's also in a class called ChannelProviderRegistry which is described in the comments as "Interface for locating channel providers". ChannelProviderRegistry has a function for registering channel providers and one which returns a list of names of registered providers. ChannelProvider has a function called getProviderName(), which is implemented in each concrete ChannelProvider .

It should be therefore reasonably clear from the context that the providerName and provider refers to the channel provider, but I guess we could say Get a shared instance of the channel provider... instead.

However I do think if you're familiar with basic V4 concepts it should be obvious that the parameter providerName is the name of the ChannelProvider. I realise that for a V4 beginner this might not be the case, but there has to be some onus on the programmer to do a little research. Channel provider and Channel are the two key concepts in pvAccess and they can be found described in a number of documents and V4 talks on the website as well as in the documentation in the pvAccess module and I would expect someone coming to V4 to spend a little bit of time reading them and to look at example code.

In the latest code the documentation is actually reasonably helpful. pvAccessOverview.html in the documentation directory says:

The methods for ChannelProviderRegistry are:

getProvider

Called by both client and services to get the channelProvider.
The providerName must be the name of a registered provider.
Most clients will use either pva or ca.
Most services will use pvDatabaseCPP, which implements provider local. A service that is also a client can also use local or pvaSrv. ...

There's also some information in the Developer Guide. In particular it mentions the two providers pva and ca. This document is still under development though.

There are references to providers on the website, including on the home page, which has link to this interoperability talk.

There have been a number of talks describing providers. A google search, e.g "pvaccess channel provider", will rapidly return documents from Marty and Matej. The first for me was this doc from Matej with an example (it has a slightly older API for the factory but the call of getProvider is the same)

// get a pvAccess client provider
ChannelProvider channelProvider =
    ChannelAccessFactory.getChannelAccess() 
.getProvider("pva");

There are also plenty of examples of using the function in the code - over 60 in the C++ code, plus over 30 of the equivalent Java function.

getProvider params confusing ... even if we keep the name.

I think the name is pretty reasonable, in fact probably the best name we could choose, but suggest an alternative if you can think of one. It's really hard to understand what name would be less confusing other than "channelProviderName". The latter is a little clunky and it should be clear that the provider is a channel provider given the class is ChannelProviderRegistry and the return type is ChannelProvider. If I was going to change something I would change the doxygen rather than the parameter name.

This would be fine if things failed peacefully.

The function does fail peacefully and there is full error handling - when the search fails it returns a null pointer, as documented. If you ignore this and dereference the pointer without checking it's null, you will get a seg fault.

I understand that some people might prefer an exception, but this is to some extent a matter of taste. This idiom of returning null is used extensively through V4, so in the first instance if you want to use V4 you'll need to adapt to it. We could consider replacing returning null pointers with throwing, but doing this piecemeal on a per-function basis is likely to lead to a lot of confusion and changing things wholesale Is a lot of work. The bottom line is it perfectly possible to use the existing code to write applications, you just need to use it correctly. This is the point of the developer guide.

Hi Dave,

I do feel Arman’s pain. In this case, in pvAccessOverview.html as you pointed out, there is an example: "Most clients will use either pva or ca.”

But it’s also true that we could, in our documentation, better predict confusion and fix it before it happens. Your own documentation is brilliant in this respect, but we have others that, you know, isn’t.

Especially, I’d like javadoc/Doxygen documentation to avoid reusing the words used in the function names unless not doing is really unavoidable or ungainly.

Cheers Greg

On Aug 10, 2015, at 3:17 PM, dhickin notifications@github.com wrote:

The issue is of course was of course you've called getProvider with the name of the service instead of the name of a registered provider.

Note the EPICS V4 code doesn't seg fault when this happens. The function can't find a registered provider with that name and returns a null pointer, which is what it's supposed to do. This is clearly documented in both the 4.4 release and latest code:
/**
  * Get a shared instance of the provider with the specified name.
  * @param providerName The name of the provider.
  * @return The interface for the provider or null if the provider
  * is not known.
  */
  virtual ChannelProvider::shared_pointer getProvider(std::string const &
      providerName) = 0;
When you call a function through a null pointer, however, you will get a seg fault.

Intuitively, we expect providerName to be the name of the service that client is trying to monitor.

I don't think this is intuitive or expected at all, but it's hard to anticipate every mistake a programmer might make.

Bear in mind the function is called getProvider and returns a ChannelProvider. It's also in a class called ChannelProviderRegistry which is described in the comments as "Interface for locating channel providers". ChannelProviderRegistry has a function for registering channel providers and one which returns a list of names of registered providers. ChannelProvider has a function called getProviderName(), which is implemented in each concrete ChannelProvider .

It should be therefore reasonably clear from the context that the providerName and provider refers to the channel provider, but I guess we could say Get a shared instance of the channel provider... instead.

However I do think if you're familiar with basic V4 concepts it should be obvious that the parameter providerName is the name of the ChannelProvider. I realise that for a V4 beginner this might not be the case, but there has to be some onus on the programmer to do a little research. Channel provider and Channel are the two key concepts in pvAccess and they can be found described in a number of documents and V4 talks on the website as well as in the documentation in the pvAccess module and I would expect someone coming to V4 to spend a little bit of time reading them and to look at example code.

In the latest code the documentation is actually reasonably helpful. pvAccessOverview.html in the documentation directory says:

The methods for ChannelProviderRegistry are:

getProvider Called by both client and services to get the channelProvider. The providerName must be the name of a registered provider. Most clients will use either pva or ca. Most services will use pvDatabaseCPP, which implements provider local. A service that is also a client can also use local or pvaSrv. ... There's also some information in the Developer Guide. In particular it mentions the two providers pva and ca. This document is still under development though.

There are references to providers on the website, including on the home page, which has link to this interoperability talk.

There have been a number of talks describing providers. A google search, e.g "pvaccess channel provider", will rapidly return documents from Marty and Matej. The first for me was this doc from Matej with an example (it has a slightly older API for the factory but the call of getProvider is the same)

// get a pvAccess client provider ChannelProvider channelProvider = ChannelAccessFactory.getChannelAccess() .getProvider("pva");

There are also plenty of examples of using the function in the code - over 60 in the C++ code, plus over 30 of the equivalent Java function.

getProvider params confusing ... even if we keep the name.

I think the name is pretty reasonable, in fact probably the best name we could choose, but suggest an alternative if you can think of one. It's really hard to understand what name would be less confusing other than "channelProviderName". The latter is a little clunky and it should be clear that the provider is a channel provider given the class is ChannelProviderRegistry and the return type is ChannelProvider. If I was going to change something I would change the doxygen rather than the parameter name.

This would be fine if things failed peacefully.

The function does fail peacefully and there is full error handling - when the search fails it returns a null pointer, as documented. If you ignore this and dereference the pointer without checking it's null, you will get a seg fault.

I understand that some people might prefer an exception, but this is to some extent a matter of taste. This idiom of returning null is used extensively through V4, so in the first instance if you want to use V4 you'll need to adapt to it. We could consider replacing returning null pointers with throwing, but doing this piecemeal on a per-function basis is likely to lead to a lot of confusion and changing things wholesale Is a lot of work. The bottom line is it perfectly possible to use the existing code to write applications, you just need to use it correctly. This is the point of the deloper guide.

— Reply to this email directly or view it on GitHub.

I don't have problem with the v4 design patterns, I already implemented majority of my service in v4 with tons of nullptr(s). As I said earlier, the documentation DID help me fix the issue but as you said I like exceptions. It was just a suggestion that would avoid further issues since nsls2 is 2/2 in terms of getting burned by the exact same thing.

@dhickin I think the confusion was about what a provider is. The documentation is scattered.

You pointed to two talks (with older API's) and the comments in a header file. If you go to the official EPICSv4 documentation page, you get a long list of links. I followed the docs for pvAccess, for example. The the latest version has question marks and the 4.4 version says THIS IS NOT WRITTEN in capital bold letters and recommends reading the Java (!) docs.

The Developer's Guide says:

channelProvider Code that provides access to a pvData data source. It creates channels where a channel provides access to pvData structure that has an associated channelName.

Here, I think it would be perfectly reasonable for a beginner to think that a server is a provider. Later on:

getProvider This gets the requested provider.Two providers are registered automatically: pva, which uses the pvAccess network protocol, and ca, which uses the Channel Access network protocol.

Now here it says that there are two providers that use pvAccess and Channel Access. So, is a provider a concrete source of PV's or a handle to a protocol that transports PV's?

There is also the pvAccess Overview that you linked to and provided an excerpt from. I couldn't find a link to this document on the Documentation and Literature page. The Channel Providers implemented by pvAccess section of the overview states:

Client Side

pva network protocal This connects the client to a server via the pva network protocol, which is a protocol for passing pvData objects. The protocol is described in: pvAccess Protocol Specification ca network protocol This connects the client to a server via the ca network protocal, i. e. it connects to an existing V3 IOC. This is client side only code. It transforms data between pvData and ca DBR data, The ca protocol is described in: Channel Access Reference Manual codec Matej please explain

The client side of pva also allows an arbitrary number of additional providers to register with it.

Which reinforces the idea that a provider is a mean to access a protocol. So one has to look at a few places to understand what is going on. In various places the v4 documentation seems to be an afterthought, a rough description of the API calls, and a lot of the documents are marked as 'Drafts'.

I think the takeaway is that v4 is a very complex piece of software. Being complex, however, doesn't mean it should be complicated. If a library seems overly difficult, people will ditch it in favor of an easier one. At SRI, for example, there was a presentation which mentioned a transport plugin for areaDetector that moved NDArrays between instances. The guy used ZeroMQ to implement it. I asked why not use v4, and he replied: because it is too complicated. It is hard to familiarize yourself with v4 if you don't actually dive into the code; ZeroMQ, on the other hand, has a book-like document that is actually fun to read. This contrast is more stark if you take into account that ZeroMQ is a transport layer, unaware of the contents of the messages whereas v4 has a specific datatype made to transport areaDetector arrays.

Hi Bruno,

Yeup, EPICS V4 is complicated for a beginner, and the documentation is incomplete.

But, 2 things.

First, it’s getting a lot better. The Developers Guide is good, and should be integrated with the Getting Started soon, and will form a good jumping off point.

Second, bear in mind EPICS V4 is a pretty tall tool stack. It goes from protocol to high level scientific data API. And it’s via an introspection interface

which is a complicated idea for beginners in itself - that also must provide enough programmer control for creating optimally performant applications.

I’ll add the pvAccessOverview to the literature page.

Lastly, by all means join up and help fix it. Please do contribute and edit documentation through the git pull-request model.

Cheers Greg

[1] http://epics-pvdata.sourceforge.net/informative/developerGuide/developerGuide.html

On Aug 11, 2015, at 2:18 PM, Bruno Martins notifications@github.com wrote:

@dhickin I think the confusion was about what a provider is. The documentation is scattered.

You pointed to two talks (with older API's) and the comments in a header file. If you go to the official EPICSv4 documentation page, you get a long list of links. I followed the docs for pvAccess, for example. The the latest version has question marks and the 4.4 version says THIS IS NOT WRITTEN in capital bold letters and recommends reading the Java (!) docs.

The Developer's Guide says:

channelProvider Code that provides access to a pvData data source. It creates channels where a channel provides access to pvData structure that has an associated channelName.

Here, I think it would be perfectly reasonable for a beginner to think that a server is a provider. Later on:

getProvider This gets the requested provider.Two providers are registered automatically: pva, which uses the pvAccess network protocol, and ca, which uses the Channel Access network protocol.

Now here it says that there are two providers that use pvAccess and Channel Access. So, is a provider a concrete source of PV's or a handle to a protocol that transports PV's?

There is also the pvAccess Overview that you linked to and provided an excerpt from. I couldn't find a link to this document on the Documentation and Literature page. The Channel Providers implemented by pvAccess section of the overview states:

Client Side

pva network protocal This connects the client to a server via the pva network protocol, which is a protocol for passing pvData objects. The protocol is described in: pvAccess Protocol Specification ca network protocol This connects the client to a server via the ca network protocal, i. e. it connects to an existing V3 IOC. This is client side only code. It transforms data between pvData and ca DBR data, The ca protocol is described in: Channel Access Reference Manual codec Matej please explain

The client side of pva also allows an arbitrary number of additional providers to register with it.

Which reinforces the idea that a provider is a mean to access a protocol. So one has to look at a few places to understand what is going on. In various places the v4 documentation seems to be an afterthought, a rough description of the API calls, and a lot of the documents are marked as 'Drafts'.

I think the takeaway is that v4 is a very complex piece of software. Being complex, however, doesn't mean it should be complicated. If a library seems overly difficult, people will ditch it in favor of an easier one. At SRI, for example, there was a presentation which mentioned a transport plugin for areaDetector that moved NDArrays between instances. The guy used ZeroMQ to implement it. I asked why not use v4, and he replied: because it is too complicated. It is hard to familiarize yourself with v4 if you don't actually dive into the code; ZeroMQ, on the other hand, has a book-like document that is actually fun to read. This contrast is more stark if you take into account that ZeroMQ is a transport layer, unaware of the contents of the messages whereas v4 has a specific datatype made to transport areaDetector arrays.

— Reply to this email directly or view it on GitHub.

Hi Bruno

You make some reasonable points.

First, the confusion in the meaning of pvAccess is a fair one. It's used for the pvAccess API as well as for the provider and network protocol. The usage is already fairly ingrained though and so not so easy to fix this. I try to distinguish the uses carefully - pvAccess API, pvAccess provider, pvAccess network protocol - to lessen the confusion. I will try to explicitly explain the uses of the term in the documentation.

You're right also that documentation is patchy and sometimes not written. However documentation is really hard. It's hard to get right and is very time-consuming. The situation is getting better but it has taken a huge effort to get to this point.

You mention ZeroMQ. Note that:

V4 has been largely developed by Marty and Matej, with contributions from a small number of developers to the V4 s/w modules and documentation, in addition to their other responsibilities. I think about there's about 16 of us in total who've committed to our repos. In contrast, the ZeroMQ core engine repo on GitHub by itself lists 95 authors and 25 credits. I make it that the GitHub repo has had commits from around 270 developers and the Acknowledgements to the guide lists 98 people. If you include all the other modules the number of developers is well over 500. Bear this in mind when you compare the projects. ZeroMQ will always have wider applicability and a larger base of contributors. However the flip-side is that EPICS is tailored to a specific audience of large experiment control and data acquisition engineers.
Comparing V4 and ZeroMQ is not comparing like with like. ZeroMQ is a socket-like library for async messaging. As the ZeroMQ documentation says ZeroMQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back.. With pvData and pvAccess you get a system of data types, serialisation, pvAccess requests, beacons, and standard data types, including enums, timestamps and alarms. Plus pvaSrv connects to an EPICS database. If you use ZeroMQ you still have to solve these problems. You can't download ZeroMQ and use it to get the alarm fields of an IOC database record. You can with V4.
pvAccess has been designed to allow ZeroMQ to be used for the transport - this is the point of the codec refactor.
I don't think it's really the case that V4 is too complicated or harder than ZeroMQ. Look at the HelloWorld example and tell me it's too complicated.
ZeroMQ, on the other hand, has a book-like document that is actually fun to read.Again a book like this requires a lot of effort. Whether you like the style is is matter of opinion. I think it's generally well-written and it's easy to read. However I probably won't be writing stuff like this in any V4 documentation:
We took a normal TCP socket, injected it with a mix of radioactive isotopes stolen from a secret Soviet atomic research project, bombarded it with 1950-era cosmic rays, and put it into the hands of a drug-addled comic book author with a badly-disguised fetish for bulging muscles clad in spandex.

@gregoryraymondwhite That's good to hear! A unified Developer's Guide with Getting Started, written in prose would be a great starting point.

@dhickin I agree with most of your points.

I know that ZeroMQ and v4 are two different beasts, but I brought it up to illustrate two points:

It has very good documentation. v4's documentation doesn't have to be a book or be cheesy, but should be clear and easy to read.
I heard from one person from the scientific community that chose ZeroMQ over v4 for a use case that falls squarely on v4's capabilities, because he thought v4 was complicated. Maybe there are others, and if we are to succeed in our push for v4, we need to address this.

Yes, writing documentation is hard. But I think it is as important as the code.

Resolution: better/more unified V4 documentation.

epics-base / pvAccessCPP

getProvider params confusing and needs better error handling #11