Open Dr15Jones opened 6 years ago
A new Issue was created by @Dr15Jones Chris Jones.
@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign core
New categories assigned: core
@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
Doesn't python have some newfangled type hinting in the 3 series? Is this something we could utilize?
@bbockelm We are using python 2.7 in CMSSW. We will (probably) attempt a move to python 3 during the long shutdown.
@Dr15Jones
Perhaps it is not very objective reason, but these underscore shorthands look incredibly ugly. I strongly prefer the first code block than 'rewritten' one.
from FWCore.ParameterSet.Shorthand import *
is no more prettier than from FWCore.ParameterSet.Config import *
Except with the latter, at least the types are still readable... int32(something)
, untracked.double
.
If anything, the second line import this
tells us that Explicit is better than implicit.
Operator overloading is, in no way, 'explicit'. It was done before this proposal - *
, +
- and I have found it somewhat annoying (and non trivial).
For the last part, I don't really see how d_*[1,2]
is better than vdouble(1,2)
.
Overall, this proposal feels like you are reimplementing perl in python. "Explicitness" is what makes python great, I suggest we keep it that way.
I am wondering who is the target here. Experience from CRAB land suggests that many users would be fully lost. OTOH there's ample evidence that most users work by using precooked configs and at most changing one or two values "under dictation" making no attempt to understand what they are doing. So if this is expert stuff.. anything will do, But if general usage is desired, more care for the end user is needed. Looking from afar as someone who had never used this, there's something disturbing in the fact that types need to be specified. Python should work w/o it and if the internals could cast to proper C types if/when needed in order to i/f to EDM and throw if input does not look sensible. No ?
Still from afar it is good that code can be read, all suggested shorthands look set to make things hard there, if aything, process.o is oddly cryptic already! But indeed there is an excessive verbosity :-( Ideally I'd like to type something like this :
process = 'Test'
PoolSource (fileNames = ["foo","bar"])
maxEvents = 10
numTrheads = 4
work = "IntProducer"
PoolOutput (fileName="foobar")
I second Stefano's suggestion. I ended up checking this out since I agree that the config files are too verbose and hoped to see an improvement. But ~s_ seems incredibly dangerous since it's easy to drop either line, and it's basically yet another config language the poor user has to learn.
I think I can authoritatively speak to the lack of user expertise in this area, since I barely know enough Python to make changes in config files, and I guess correctly only 50% of the time. So for a newbie like myself, verbosity is actually good, since it helps me understand what is what.
I get that this is tedious to experts... but how often do the experts need to write lots of config files? If that ever happens, can't we have scripts or GUI writing config files? (Didn't we have ConfigEditor at some point?) If we are spending time improving the interface, in this era of AI can't we go to natural language processing directly? Or at least have something like Stefano's example above -- not much typing, and perfectly clear to the user...
I agree that the only sensible change would be to obsolete cms specific types and use plain python literals
Looking from afar as someone who had never used this, there's something disturbing in the fact that types need to be specified. Python should work w/o it and if the internals could cast to proper C types if/when needed in order to i/f to EDM and throw if input does not look sensible.
While I do agree on the verbosity of our configuration language, and that in principle the explicit typing is unnecessary (boilerplate), I want to point out that the current system allows python-level typo checking, i.e.
prod = cms.EDProducer("FooProducer", foo = cms.int32(3))
prod.foo = 4 # works
prod.fop = 5 # throws exception when running through python
prod.fop = cms.int32(5) # works
I find this feature very handy e.g. when migrating/customizing configuration files for some code change. Of course there are other ways of achieving the same checks e.g. in case we drop the explicit types by default.
I wanted to thank everyone for their comments. I'm afraid fully redesigning the configuration system (and the underlying provenance capture implementation that drives much of the design) is beyond the scope of this particular RFC.
@Dr15Jones - dumb question, but is there an urgency here?
That is, if the move to Python3 is delayed until the long shutdown -- and Python 3 offers some native language-level syntax help for type inference -- why not simply wait to do this for the long shutdown?
I mean, I understand the urge to do something here: but it seems at this point we're really only waiting 12 months or so. No?
@bbockelm in my limited search, the only type control I could find which was added to Python 3 is https://docs.python.org/3/library/typing.html#classes-functions-and-decorators
That appears to only restrict which types can be used with with functions or member data which is not the same problem as this RFC was trying to address (which was shortening the declaration of a type).
Regardless, given the negative feedback on this RFC no changes will be implemented.
Hi @Dr15Jones, sorry for joining the party late - I think I agree with many of the comments here.
One good alternative I see would be
import FWCore.ParameterSet.Config as cms
process = cms.Process("Test")
process.source = cms.Source("PoolSource",
fileNames = [ "foo.root", "bar.root" ]
)
process.maxEvents = dict(
input = 10
)
process.options = dict(
numberOfThreads = 4,
numberOfStreams = 0
)
process.work = cms.EDProducer("IntProducer",
value = 1
)
process.out = cms.OutputModule("PoolOutputModule",
fileName = "test.root"
)
process.o = cms.EndPath( process.out, cms.Task(process.work))
, delaying all type and "trackiness" (tracked-ness ?) checking to the C++ part, for example to the ParameterSet validation implemented via fillDescriptions()
.
It should also be possible to support the old syntax by making the cms.type
types aliases for the underlying python types.
(I do not know how the python-to-C++ translation works in CMSSW, whether everything is a string before getParameter()
, or if it needs to assign the values to C++ variables; if the latter, we could use some variant-like type)
Here's an even more ambitious syntax (which I don't know if we could actually implement):
process = cms.Process("Test")
process.source = PoolSource(
fileNames = [ "foo.root", "bar.root" ]
)
process.maxEvents = dict(
input = 10
)
process.options = dict(
numberOfThreads = 4,
numberOfStreams = 0
)
process.work = IntProducer(
value = 1
)
process.out = PoolOutputModule(
fileName = "test.root"
)
process.o = cms.EndPath( process.out, cms.Task(process.work))
, deducing the python type from the C++ object.
The "known" PSets like process.options
and process.maxEvents
could also be simplified to
process.maxEvents(
input = 10
)
process.options(
numberOfThreads = 4,
numberOfStreams = 0
)
but I'm not sure we could do it for arbitrary PSets.
Anyway, just some ideas, probably for LS2...
OK, it's actually doable also for arbitrary PSets, with something like:
class PSet:
def __call__(self, **kwargs):
self.__dict__.update(kwargs)
class Process:
def __init__(self, name):
setattr(self, '@name', name)
def __getattr__(self, key):
setattr(self, key, PSet())
return getattr(self, key)
Recently I found myself having to hand write a number of configuration files for testing. The process was very tedious. Even a simple example requires lots of typing
One issue is having to type
cms.untracked
in many different places.cms.untracked
acts like a unary operator. python has overloads for unary operators, the best candidate appears to be the bitwise not operator~
. With a trival modification to_ParameterTypeBase
to define a__invert__
method one could use~
in addition tocms.untracked
. This would make the example becomeWhich is less to write, but would take getting used to when trying to read.
Further changes could be done by allowing shorthands for the standard
cms.
types. I could see two different ways of handling such shorthands (which would not replace the full names, but could be used as alternatives).Simple variable substitution
One could create the following file in FWCore/ParameterSet/python/Shorthand.py
Using these simple assignments would allow the following version of a configuration
Type as Units
Another way would be to use a helper class which works similar to assigning a unit (e.g. cm) to a value to create the appropriate type. This could be done using the multiplication operator. Python already has a history of using the multiplication operator in novel ways such as
l = [0]*3
which is equivalent tol=[0,0,0]
. For this case, one could writev = i_*1
which is equivalent tov=cms.uint32(1)
orv=i_*[1,2]
which is equivalent tov=cms.vint32(1,2)
. Note that in this version, we do not need a new variable for each container type, instead we can reuse the same symbol and just have it map to all items in a standard python container. Using such a helper function the example could becomeAny of these changes are trivial to do. The question is does the reduction in the verbosity make up for code that may be more difficult (at least initially) to read?