Closed dhickin closed 9 years ago
Seems pretty good to me, although I don't know the intricacies of the original code. Is it normal to modify past-dated versions of the documentation as well as the undated files?
The normal, correct procedure is to create a new dated version, often make several modifications, then at some point "publish" by copying to the dated version, but we haven’t always done this.
However, the API was merged, I think by mistake, and this pull request would replace it if merged. Also, the documentation was broken: getAs returns a reference, but pointer syntax is used and references are tested against null. So I took a pragmatic view and regarded this as a fix, without retaining the docs for the getAs API.
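To illustrate why the documented usage was broken, here is a minimal sketch (the class and function names are stand-ins for illustration, not the real pvDataCPP declarations): an API that returns a reference cannot signal "field not found" with NULL, since a C++ reference must always bind to a valid object, so a pointer-style `if (field == NULL)` test on its result is meaningless. A pointer or shared_ptr return, by contrast, can legitimately be null-tested.

```cpp
#include <memory>
#include <string>

// Stand-in for a pvData field class, for illustration only.
struct PVField
{
    std::string name;
    explicit PVField(const std::string& n) : name(n) {}
};
typedef std::shared_ptr<PVField> PVFieldPtr;

// A shared_ptr-returning lookup: "not found" is representable as a null
// pointer, so the caller can (and must) test the result before using it.
PVFieldPtr findField(const PVFieldPtr& field, const std::string& name)
{
    if (field && field->name == name)
        return field;
    return PVFieldPtr(); // null means not found
}
```
A reference-returning `getAs` has no such "not found" value available, which is why documenting it with null tests could never have worked.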
Hi Dave, I like the API very much.
Just a typo I think in your reply to Andrew on documentation publishing, you meant publish by copying the dated to the undated version [after first adjusting its "Previous Version" link].
then at some point "publish" by copying to the _un_dated version
For readers of the mail list, we do that so all versions of documentation are immediately available on the web. This is the normal practice of standards bodies, where not only the present version but many previous versions of implementations and their documentation must be readily available, and where documents relating to a project don’t necessarily track rigorously with implementations.
Greg
On Jul 13, 2015, at 11:06 AM, dhickin notifications@github.com wrote:
The normal, correct procedure is to create a new dated version, often make several modifications, then at some point "publish" by copying to the dated version, but we haven’t always done this.
However the API was merged I think by mistake and this pull request would replace it if merged. Also the documentation was broken as getAs returns a reference, but pointer syntax is used and references tested against null. So I took a pragmatic view and regarded this as a fix without retaining the docs for the getAs API.
As part of implementing the pull request, I looked into some aspects of performance, in particular the shared pointers.
@mdavidsaver was concerned about the cost of returning these, in particular when used to access StructureArrays.
It's a reasonable concern, as there is a cost associated with them, essentially because the reference counts are modified atomically.
I've tried comparing the performance of accessing subfields using shared pointers and using references, in a number of ways.
I've done some crude testing with the time command as well as using perf (and a bit of cachegrind).
For accessing a field of a structure, on one of our servers it takes about 50 ns to get the value field of a structure if it's the first field. If you return a shared pointer this rises to about 70 ns, an overhead of about 20 ns. Accessing a higher-index subfield obviously takes longer, as does accessing subfields deeper in the structure or ones with similar field names. In each case the overhead is of the order of 20 ns. On my desktop PC it's about 40 ns.
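A crude sketch of the kind of comparison described above (this is not the actual benchmark code from the pull request; the function names and structure are illustrative assumptions): time N copies of a shared_ptr, each of which bumps an atomic reference count up and then down, against N plain raw-pointer reads.

```cpp
#include <chrono>
#include <memory>

// Accumulator to discourage the compiler from optimising the loops away.
static long g_sink = 0;

// Average cost of copying a shared_ptr (atomic increment then decrement).
double nsPerSharedPtrCopy(long n)
{
    std::shared_ptr<int> sp(new int(42));
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        std::shared_ptr<int> copy(sp); // atomic refcount traffic here
        g_sink += *copy;
    }
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
}

// Average cost of the same access through a raw pointer: no refcounting.
double nsPerRawRead(long n)
{
    std::shared_ptr<int> sp(new int(42));
    int* rp = sp.get();
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i)
        g_sink += *rp;
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
}
```
The difference between the two averages approximates the per-access shared pointer overhead discussed above; absolute numbers will of course vary with hardware and compiler.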
Of course in a more realistic example and on a multi-threaded system it will be a bit more complicated and the impact may be greater.
For accessing a structure I tried setting the values of an NTAttribute storing a double value. This takes under 1 μs (about 880 ns), of which about 100 ns is due to shared pointer overhead. Creating a new attribute and setting and storing a new value took about 3 μs. So, for example, adding 10 attributes to images at a rate of 100 frames/s (reusing attributes) will take around 1 ms of each second, with 100 μs of that being shared pointer overhead.
I reimplemented the channel archiver service using getAs and (just testing the RPC service request function, i.e. removing the pvAccess part) there was no measurable difference in this case (at most 1% overhead).
The performance impact of shared pointers doesn't seem significant in most cases, but may be in some high-performance applications with lots of getSubField accesses. The question of whether the getSubField[T] functions should return a shared pointer seems orthogonal to whether they should throw, so if getSubField returns one, so should getSubFieldT.
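The paired API shape being discussed can be sketched as follows (the signatures and the toy lookup are illustrative assumptions, not the actual pvDataCPP declarations): the non-throwing lookup returns a possibly-null shared_ptr that the caller tests, while the throwing variant getSubFieldT wraps it and never returns null, so both consistently hand back shared pointers.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct PVField {}; // stand-in for the real field class
typedef std::shared_ptr<PVField> PVFieldPtr;

// Non-throwing lookup: returns null when the field is not found.
PVFieldPtr getSubField(const std::string& name)
{
    if (name == "value")                 // toy lookup for illustration
        return PVFieldPtr(new PVField());
    return PVFieldPtr();
}

// Throwing variant: same return type, but "not found" becomes an exception,
// so the result can be used without a null check.
PVFieldPtr getSubFieldT(const std::string& name)
{
    PVFieldPtr f(getSubField(name));
    if (!f)
        throw std::runtime_error("Failed to find field: " + name);
    return f;                            // never null
}
```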
We may need to address the performance cost of shared pointers, but I suspect there are other places which we could target and improve the performance more.
Some brief comments on the code.
I appreciate the desire to avoid constructing error messages which will simply be discarded in favor of NULL. The ugliness of 'bool throws' is hidden from the public API, as such ugliness should be.
As a general rule I like to avoid std::ostringstream in inline method definitions. When these methods are inlined they contribute to the notorious C++ object code bloat, and at best result in longer compile times.
I still hold that having both template and non-template methods with the same name is bad design, and would like to see the non-template version deprecated. However, I consider this a separate issue.
All in all I don't have any significant objections to this change.
We may need to address the performance cost of shared pointers, but I suspect there are other places which we could target and improve the performance more.
Would you publish your benchmark code?
As previously stated, I suspect that overhead in sub-field lookup will dominate. Of course this will also be a harder problem to solve.
API proposal to replace the getAs introduced in https://github.com/epics-base/pvDataCPP/pull/4 with a throwing version of getSubField, called getSubFieldT, as per the action item assigned to me in the EPICS V4 working group meeting 20150630.
Summary of changes:
Each of these has been done as a separate commit.
(Note the bug in getAs implementation was fixed in master first.)