arbor-sim / arbor

The Arbor multi-compartment neural network simulation library.
https://arbor-sim.org
BSD 3-Clause "New" or "Revised" License
110 stars 61 forks source link

Improved object printers (__str__, operator<<(ostream& os, const Type& type)) #952

Open akuesters opened 4 years ago

akuesters commented 4 years ago

In order to verify models, it's helpful to query object for a bit of feedback on their state/initialization. For that, we have printers in place, but they would benefit from some revision: some don't print anything but their name, C++ and Python feedback on the same object don't match. Therefore, I'd like to go over all of them:

Implementing __repr__ should be done once we're able to fully represent objects (arblang?) and specifically test that it works.

Update: While __repr__ is supposed to be evalable, it does not have to recreate the object. According to the docs, the current <arbor.blah> are OK, if a bit useless.

Helveg commented 4 years ago

Should they be? __repr__ serves to generate a string that can represent the object when passed to eval(), while __str__ simply converts it to a string. This is best illustrated by this example:

>>> repr('r')
"'r'"
>>> str('r')
'r'
brenthuisman commented 3 years ago

Are there any cases in which we want a evalable string representation? Off hand I can only think of cable_cell and components, label_dict. I'd be in favor of not implementing/removing __repr__ for all types except those.

brenthuisman commented 3 years ago

We should not unify __repr__ and __str__, because they are meant for different purposes: the former must provide a full description of the object such that it can be recreated in full if passed to eval(). __str__ is a way to show some info about the object instance. After our groupmeet on march 16, I'm updating this issue with the following:

Helveg commented 3 years ago

Important note: Make it clear that it is still a print of an object, and make them unique even when their values used to inform you are the same! You can use Python's id function and follow the root objects class' __str__ implementation to print strings like:

>>> print(r1)
<arbor.recipe with 15 cells at 0x0045e44>
>>> print(r2)
<arbor.recipe with 15 cells at 0x0ae9800>

(Also interesting: type(obj).__name__, type(obj).__qualname__, type(obj).__module__)

Otherwise it gets very confusing very quickly when debugging

brenthuisman commented 3 years ago

Good point, I hadn't thought of that, even though I've run into this countless times ;) To avoid duplication, I'm probably going to pass &obj to __str__.

brenthuisman commented 3 years ago

We have arbor/util/strprintf.hpp and python/strprintf.hpp. The latter seems an expansion of the former. I think it would be nice to remove this duplication.

Even nicer would be to use a library like https://github.com/fmtlib/fmt. Is there anything not covered in this lib? strprintf.hpp is a bit underdocumented and I'm struggling to find out what uses it has. f"{}" like functionality, but maybe more.

halfflat commented 3 years ago

Historically, we haven't used fmt because it was an external dependency which we thought we didn't need, and besides, we'll essentially get it for free with C++20. That said, as far as external dependencies go, I understand that fmt is exceptionally well behaved.

Another part of the problem though is that string formatting is something done very rarely in the Arbor core library — and if certain parties get their way (ahem), will become even more rare. It's predominantly used for string representations of objects that are in turn used in tracing or debugging contexts.

The open issue on the topic (#514) highlights the problem of dependencies. On one hand, duplication should be eschewed, but on the other, we don't want unnecessary coupling between Arbor components.

Some of the formatting stuff included in Arbor proper is there because it came with code from other projects that was glommed in early on. We almost certainly wouldn't do things that way if we were to add that code today.

In short, solving this cleanly is tricky, and also has been low priority, and consequently has not been properly addressed.

For Python, I think the best bet is to ignore what the C++ lib does or doesn't do for now, because there is no real consistency in the C++ side, and it's not yet clear how to best achieve it. And then a good set of Python representations might inform what the C++ lib should do in the future.

bcumming commented 3 years ago

I am in favor of adding fmt as a dependency. It is a very solid library, small footprint, and certainly better than our roll-your-own solution.

halfflat commented 3 years ago

Re fmt though, let's move that discussion though to a different issue (such as #514). I can see that fmt is good, but I'm less convinced that we need it.

brenthuisman commented 3 years ago

@halfflat Keeping C++ printers separate from Python printers would go against the plan to wrap the C++ printer for Python wherever possible, right? The idea that the PR addressing this issue is going to remove the inconsistencies.

Also, since the PR will touch every class, addressing #514 by introducing libfmt in one go would make sense to me?

halfflat commented 3 years ago

We might not even have a standard C++ printed representation for these objects, and if we do, their purpose could be different to that of the Python __str__. It is not plain to me that the Python output should wrap what the C++ emits to e.g. a std::ostream. And I don't even know how this might or should be affected by locale or encoding.

I don't want the Python __str__ strings to drive a bunch of possibly invasive changes in the C++ lib when they could be their own thing. We can always revisit this in the future: it would be more net work, but in the meantime, the Python lib would be producing useful output.

thorstenhater commented 3 years ago

@fmt ... I am using it basically for every project I start (along with spdlog). It does not quite reach f"{}" levels, but it's pretty amazing (eg no f"{x + 42:>10.2f}"). Especially modcc would be profitting.

brenthuisman commented 3 years ago

@halfflat Then there has been a misunderstanding: I thought we had agreed to precisely this in the meeting. A quick example:

https://github.com/arbor-sim/arbor/pull/1434

What are the arguments for having these separate?

halfflat commented 3 years ago

You might note the question mark in those minutes — it was not agreed to!

halfflat commented 3 years ago

And the reasons are exactly those mentioned here, and raised in the meeting: we don't have consistent representations C++ side, and different conversions are used for different purposes. Let's please sort that out first, before cojoining the C++ and Python formatting.

halfflat commented 3 years ago

For Python there is also the (IMO very reasonable) option of using the Python print() as exposed by pybind11.

brenthuisman commented 3 years ago

The sentence with question mark was a summary, and I do not recall opposition. But let's move on to actual blockers.

Not all of the API is consistent between languages, certainly. But a lot is. The shims on the schedulers for instance don't change the functionality, so also not the feedback a user might want.

What exactly should be sorted out you think?

halfflat commented 3 years ago

I'm claiming that text representations on the C++ side is, essentially, a mess, and shouldn't form a basis for the Python representations without some sort of reform at the least. Also: the C++ forms and Python forms do not align 1-to-1, regardless. Using fmt or not for the C++ side is a separate issue.

I propose that in the interests of getting a better Python interface, we just write the __str__ content we want to see, and also propose that we look at using the Python print(), as exposed by pybind11, to help implement that and avoid the duplication of strprintf code.

halfflat commented 3 years ago

To elaborate on the C++ side, there is a mix of representations: some are used just for debugging; some are used to provide a summary of an object state; some are roundtrippable. These representations are mostly (all?) given by overloads of std::ostream::operartor<<, but for types that are aliases for containers of other objects, e.g. mcable_list we are providing an overload for a type (std::vector) that is not even our own.

To sort this out properly would require, in my view, that these different purposes are teased apart, and that we don't implement them all as operator<< overloads. We could consider, for example, that we include debug/trace representations as something that is only enabled when we configure the library so. But we should also question whether we want all these string and iostream representations floating about in the core library at all.

brenthuisman commented 3 years ago

In my view and experience, a piece of scientific software for simulating a model is essentially always used for one offs: until the final run every run is a test. As such, I expect I may need to query any object at any time, so I like to not have to switch to a debug build for these basics. Which does not mean a debug/trace build could provide much more feedback if the user really needs to dig in, but very often that is not required. Basics are verifying that parameters are set, and, as @Helveg points out, an indication the object is properly initialized and where, to help verify that I have not accidentally mis-initialized/assigned a (number of) objects. Also in C++ land, this kind of feedback is helpful.

A verbose flag (see other issue) would provide runtime feedback such as how many spikes this poisson generator actually generates, but that's left out of scope here.

So, in principle, I consider being able to print or stream an object for some basic feedback core functionality, and as such permissible in the core library. How we actually implement that (stream operator, a custom print function, a print/format/parse method on all classes) I don't have a strong preference about, but the purpose is, in my mind, clear: simple to use methods for scientific users to perform quick checks that provide feedback while building larger simulations.

The libfmt discussion is related tangentially: you're right that going over all objects is a bit of work, which is why I'd like to agree on using something that I'm sure is what we'll eventually be using anyway to avoid have to redo it later.

brenthuisman commented 3 years ago

I see enthusiasm for the idea, and I want to start on it. The main counter by @halfflat is that operator<< may be used for other purposes entirely than providing a summary. If I summarized that correctly, I propose that, like how Python separates these concerns by having both __repr__ and __str__, we come up with a similar convention for our C++ API.

Proposal: __repr__ : operator<< __str__ : str()

I will leave any operator<< as is, and use libfmt when writing any str() for objects in include/, which will be used by defining __str__ wherever possible.

thorstenhater commented 3 years ago

I am a bit late to the party, but here my 5c.

  1. Having the ability to dump object state is certainly useful for debugging, tracing/logging, and for the python interface.
  2. Using << is probably not a good idea for the reasons cited above.
  3. If we start working on this, we should solve it properly.

For me, that implies coming up with an API that separates the different purposes and applying it consistently over all of Arbor. I'll give an example here

template<typename T> sd::string print_debug(const T&) { ... } // Recursively print members
template<typename T> sd::string print_trace(const T&) { ... }   // Print object id (name and 0xthis?)
template<typename T> sd::string print_sexp(const T&) { ... }    // Recursively print sexp members
template<typename T> sd::string print_json(const T&) { ... }    // ...
brenthuisman commented 3 years ago

I looked into that, but I had to conclude you can only print out e.g. structs that way with the aid of a reflection library, which seems a too heavy a method. Maybe my research was insufficient? Of course, I can write template specializations, but then diverging the printer from the object might become too likely.

thorstenhater commented 3 years ago

Yes, reflection would be nice and handy, but we will have to wait until (at least) '23 with that. The danger of divergence is there, correct, but that would be true with operator<<, too. We can keep object and printing close by and overload on an ad-hoc basis, if needed, copy from std::to_string for an example.

halfflat commented 3 years ago

So, based on this issue and a convo with Brent, have made #1451. With that covering the C++ side, I propose that we repurpose this issue to focus on the Python requirements, which are much less contentious, and which can inform C++ representations as or if we provide them.