Closed sphuber closed 5 years ago
My suggestion:
spec.input('name.spaced.input')
becomes name__spaced__input
Something like (didn't test):
def is_valid_name(name):
if "__" in name:
return False
if name.starstwith("_"):
return False
if name.endswith("_"):
return False
if not(isidentifier(name)): # check e.g. here: https://stackoverflow.com/questions/12700893
return False
return True
Also: should we allow any python identifier (including Unicode like, I believe, in python 3)? Personally I would limit only to [a-zA-Z0-9_]
with the python additional rules (don't start with a digit, don't be a python keyword like global
, from
, as
, ...).
Number 3. is actually more restrictive than needed - one could either a) just disallow names starting with underscore, OR b) disallow names ending with underscore. In this case, the only thing that can happen is to find three underscores in a row, and knowing which rule between a) and b) we have chosen, reconstruct the links and namespaces.
However, I don't think people are using names starting or ending with underscore, and starting with an underscore has a special meaning in python (e.g. node.inputs.<TAB>
would not complete these - so I would for now disallow both of them, and in the future we can also partially lift this requirement.
I believe that with the three rules above, the mapping namespaces->link label
is an injective function (of course it's not surjective, e.g. you cannot get a label like a______b
) but being injective one can always, given a valid link label, invert it to know it if was a variable, a namespace, ...
I like it and would also vote to, for now, disallow port names starting or ending with an underscore.
The only problem I can see for now is how to deal with a data migration. It will be possible to write a migration that does the right thing 100%. Even though WorkChains
are the only processes in 0.12 that had namespacing and so will be the only ones that are affected, many of them will have used underscores in port names. The calculation jobs are not released yet, so as soon as we formalize this, we can send an email to the developers notifying them about the change.
Whether unmigrated data will be a problem depends on what code we will write that depends on a correct interpretation. Some examples I can think of now:
verdi process show
instead of flat labelsinputs
and outputs
support nested inputs, i.e. node.inputs.name.spaced.a
instead of node.inputs.name__spaced__a
node.get_restart_builder
.
In the first case, it will be merely aesthetic so not the worst. The second should still work I think, albeit inconsistent with respect to old data. The third one however, will be broken I think. Of course, since it will have to load the process class, which then gives access to the spec, we could do some heuristics to try and figure out which parts of the label are actually namespaces, but it will dirty and fragile as hell.The new verdi process show
command may instead of:
nbands_factor 1355 Float
relax
final_scf 1357 Bool
max_meta_convergence_iterations 1352 Int
base
code 3 Code
parameters 1347 Dict
kpoints_distance 1354 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1351 Int
relaxation_scheme 1344 Str
meta_convergence 1342 Bool
volume_convergence 1353 Float
bands
code 3 Code
parameters 1347 Dict
kpoints_distance 1349 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1350 Int
scf
code 3 Code
parameters 1347 Dict
kpoints_distance 1356 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1348 Int
clean_workdir 1343 Bool
structure 106 StructureData
produce something like:
nbands_factor 1355 Float
relax
final_scf 1357 Bool
max_meta_convergence_iterations 1352 Int
base
code 3 Code
parameters 1347 Dict
kpoints_distance 1354 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1351 Int
relaxation_scheme 1344 Str
meta_convergence 1342 Bool
volume_convergence 1353 Float
bands
code 3 Code
parameters 1347 Dict
kpoints_distance 1349 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1350 Int
scf
code 3 Code
parameters 1347 Dict
kpoints_distance 1356 Float
pseudo_family 1346 Str
options 1345 Dict
max_iterations 1348 Int
clean_workdir 1343 Bool
structure 106 StructureData
The
Process
andProcessNode
duality forces us to define a mapping between the nested port namespaces of the process and the flat link label hierarchy of the corresponding node in the provenance graph. A process can define a nested port namespace through its spec:If we print the inputs port namespace, we get the following:
When constructing this process, or any process, the inputs are expected to map exactly onto this namespace. So for example,
inputs = {'name': {'spaced': {'input': 1}}}
would be valid. The same principles go for outputs and they should be registered in such a way that it maps one-to-one on the port namespace. For the duration of the process, the inputs and outputs stay organized in this nested hierarchy. However, finally the process needs to be represented in the provenance graph by a node, and the inputs and outputs should be linked up. The link label, which is the only thing that can be used to represent the namespacing, is a string and therefore necessarily one-dimensional. The nested namespace therefore needs to be mapped onto this flat space.Currently, the namespaces are just concatenated with underscores. So the input node in the example would get the link label
name_spaced_input
. However, now it becomes indistinguishable from an input that was passed to a process with the namename_spaced_input
in the top level namespace.We should clearly define the rules for the mapping between nested namespaces and flat link labels. This should also keep in mind that if there is any risk of overlapping labels when expanding namespaces. For example, in the current situation, what would happen for the following input port namespace:
This is currently legal as far as the process spec is concerned. The first is a doubly nested namespace and the latter is a single port in the top level namespace. However, the link labels that will be generated will overlap, making it impossible to link both inputs to the node.
The rules for links should also obey the rules of valid python variable names, as they are used in certain methods for provenance graph traversal, see #2568
Mentioning @giovannipizzi for discussion