Open akuesters opened 4 years ago
Should they be? __repr__
serves to generate a string that can represent the object when passed to eval()
, while __str__
simply converts it to a string. This is best illustrated by this example:
>>> repr('r')
"'r'"
>>> str('r')
'r'
Are there any cases in which we want a eval
able string representation? Off hand I can only think of cable_cell
and components, label_dict
. I'd be in favor of not implementing/removing __repr__
for all types except those.
We should not unify __repr__
and __str__
, because they are meant for different purposes: the former must provide a full description of the object such that it can be recreated in full if passed to eval()
. __str__
is a way to show some info about the object instance. After our groupmeet on march 16, I'm updating this issue with the following:
__str__
implementations and insure they print something informative about their class.ostream& operator<<(ostream& os, const Type& type)
for all Arbor classes and structs?Important note: Make it clear that it is still a print of an object, and make them unique even when their values used to inform you are the same! You can use Python's id
function and follow the root object
s class' __str__
implementation to print strings like:
>>> print(r1)
<arbor.recipe with 15 cells at 0x0045e44>
>>> print(r2)
<arbor.recipe with 15 cells at 0x0ae9800>
(Also interesting: type(obj).__name__
, type(obj).__qualname__
, type(obj).__module__
)
Otherwise it gets very confusing very quickly when debugging
Good point, I hadn't thought of that, even though I've run into this countless times ;) To avoid duplication, I'm probably going to pass &obj
to __str__
.
We have arbor/util/strprintf.hpp
and python/strprintf.hpp
. The latter seems an expansion of the former. I think it would be nice to remove this duplication.
Even nicer would be to use a library like https://github.com/fmtlib/fmt. Is there anything not covered in this lib? strprintf.hpp
is a bit underdocumented and I'm struggling to find out what uses it has. f"{}"
like functionality, but maybe more.
Historically, we haven't used fmt
because it was an external dependency which we thought we didn't need, and besides, we'll essentially get it for free with C++20. That said, as far as external dependencies go, I understand that fmt
is exceptionally well behaved.
Another part of the problem though is that string formatting is something done very rarely in the Arbor core library — and if certain parties get their way (ahem), will become even more rare. It's predominantly used for string representations of objects that are in turn used in tracing or debugging contexts.
The open issue on the topic (#514) highlights the problem of dependencies. On one hand, duplication should be eschewed, but on the other, we don't want unnecessary coupling between Arbor components.
Some of the formatting stuff included in Arbor proper is there because it came with code from other projects that was glommed in early on. We almost certainly wouldn't do things that way if we were to add that code today.
In short, solving this cleanly is tricky, and also has been low priority, and consequently has not been properly addressed.
For Python, I think the best bet is to ignore what the C++ lib does or doesn't do for now, because there is no real consistency in the C++ side, and it's not yet clear how to best achieve it. And then a good set of Python representations might inform what the C++ lib should do in the future.
I am in favor of adding fmt
as a dependency. It is a very solid library, small footprint, and certainly better than our roll-your-own solution.
Re fmt
though, let's move that discussion though to a different issue (such as #514). I can see that fmt
is good, but I'm less convinced that we need it.
@halfflat Keeping C++ printers separate from Python printers would go against the plan to wrap the C++ printer for Python wherever possible, right? The idea that the PR addressing this issue is going to remove the inconsistencies.
Also, since the PR will touch every class, addressing #514 by introducing libfmt in one go would make sense to me?
We might not even have a standard C++ printed representation for these objects, and if we do, their purpose could be different to that of the Python __str__
. It is not plain to me that the Python output should wrap what the C++ emits to e.g. a std::ostream
. And I don't even know how this might or should be affected by locale or encoding.
I don't want the Python __str__
strings to drive a bunch of possibly invasive changes in the C++ lib when they could be their own thing. We can always revisit this in the future: it would be more net work, but in the meantime, the Python lib would be producing useful output.
@fmt
... I am using it basically for every project I start (along with spdlog
). It does not quite reach f"{}"
levels, but it's pretty amazing (eg no f"{x + 42:>10.2f}"
). Especially modcc would be profitting.
@halfflat Then there has been a misunderstanding: I thought we had agreed to precisely this in the meeting. A quick example:
https://github.com/arbor-sim/arbor/pull/1434
What are the arguments for having these separate?
You might note the question mark in those minutes — it was not agreed to!
And the reasons are exactly those mentioned here, and raised in the meeting: we don't have consistent representations C++ side, and different conversions are used for different purposes. Let's please sort that out first, before cojoining the C++ and Python formatting.
For Python there is also the (IMO very reasonable) option of using the Python print()
as exposed by pybind11.
The sentence with question mark was a summary, and I do not recall opposition. But let's move on to actual blockers.
Not all of the API is consistent between languages, certainly. But a lot is. The shims on the schedulers for instance don't change the functionality, so also not the feedback a user might want.
What exactly should be sorted out you think?
I'm claiming that text representations on the C++ side is, essentially, a mess, and shouldn't form a basis for the Python representations without some sort of reform at the least. Also: the C++ forms and Python forms do not align 1-to-1, regardless. Using fmt
or not for the C++ side is a separate issue.
I propose that in the interests of getting a better Python interface, we just write the __str__
content we want to see, and also propose that we look at using the Python print()
, as exposed by pybind11, to help implement that and avoid the duplication of strprintf
code.
To elaborate on the C++ side, there is a mix of representations: some are used just for debugging; some are used to provide a summary of an object state; some are roundtrippable. These representations are mostly (all?) given by overloads of std::ostream::operartor<<
, but for types that are aliases for containers of other objects, e.g. mcable_list
we are providing an overload for a type (std::vector
) that is not even our own.
To sort this out properly would require, in my view, that these different purposes are teased apart, and that we don't implement them all as operator<<
overloads. We could consider, for example, that we include debug/trace representations as something that is only enabled when we configure the library so. But we should also question whether we want all these string and iostream representations floating about in the core library at all.
In my view and experience, a piece of scientific software for simulating a model is essentially always used for one offs: until the final run every run is a test. As such, I expect I may need to query any object at any time, so I like to not have to switch to a debug build for these basics. Which does not mean a debug/trace build could provide much more feedback if the user really needs to dig in, but very often that is not required. Basics are verifying that parameters are set, and, as @Helveg points out, an indication the object is properly initialized and where, to help verify that I have not accidentally mis-initialized/assigned a (number of) objects. Also in C++ land, this kind of feedback is helpful.
A verbose flag (see other issue) would provide runtime feedback such as how many spikes this poisson generator actually generates, but that's left out of scope here.
So, in principle, I consider being able to print or stream an object for some basic feedback core functionality, and as such permissible in the core library. How we actually implement that (stream operator, a custom print function, a print/format/parse method on all classes) I don't have a strong preference about, but the purpose is, in my mind, clear: simple to use methods for scientific users to perform quick checks that provide feedback while building larger simulations.
The libfmt
discussion is related tangentially: you're right that going over all objects is a bit of work, which is why I'd like to agree on using something that I'm sure is what we'll eventually be using anyway to avoid have to redo it later.
I see enthusiasm for the idea, and I want to start on it. The main counter by @halfflat is that operator<<
may be used for other purposes entirely than providing a summary. If I summarized that correctly, I propose that, like how Python separates these concerns by having both __repr__
and __str__
, we come up with a similar convention for our C++ API.
Proposal:
__repr__
: operator<<
__str__
: str()
I will leave any operator<<
as is, and use libfmt
when writing any str()
for objects in include/
, which will be used by defining __str__
wherever possible.
I am a bit late to the party, but here my 5c.
<<
is probably not a good idea for the reasons cited above.For me, that implies coming up with an API that separates the different purposes and applying it consistently over all of Arbor. I'll give an example here
template<typename T> sd::string print_debug(const T&) { ... } // Recursively print members
template<typename T> sd::string print_trace(const T&) { ... } // Print object id (name and 0xthis?)
template<typename T> sd::string print_sexp(const T&) { ... } // Recursively print sexp members
template<typename T> sd::string print_json(const T&) { ... } // ...
I looked into that, but I had to conclude you can only print out e.g. structs that way with the aid of a reflection library, which seems a too heavy a method. Maybe my research was insufficient? Of course, I can write template specializations, but then diverging the printer from the object might become too likely.
Yes, reflection would be nice and handy, but we will have to wait until (at least) '23 with that.
The danger of divergence is there, correct, but that would be true with operator<<
, too.
We can keep object and printing close by and overload on an ad-hoc basis, if needed, copy from
std::to_string
for an example.
So, based on this issue and a convo with Brent, have made #1451. With that covering the C++ side, I propose that we repurpose this issue to focus on the Python requirements, which are much less contentious, and which can inform C++ representations as or if we provide them.
In order to verify models, it's helpful to query object for a bit of feedback on their state/initialization. For that, we have printers in place, but they would benefit from some revision: some don't print anything but their name, C++ and Python feedback on the same object don't match. Therefore, I'd like to go over all of them:
__str__
implementations and insure they print something informative about their class.ostream& operator<<(ostream& os, const Type& type)
for all Arbor classes and structs.Implementing
__repr__
should be done once we're able to fully represent objects (arblang?) and specifically test that it works.Update: While
__repr__
is supposed to beeval
able, it does not have to recreate the object. According to the docs, the current<arbor.blah>
are OK, if a bit useless.