P1sec / pycrate

A Python library to ease the development of encoders and decoders for various protocols and file formats; contains ASN.1 and CSN.1 compilers.
GNU Lesser General Public License v2.1
381 stars 132 forks source link

update import FROM regex #142

Closed benmaddison closed 3 years ago

benmaddison commented 3 years ago

I think this fixes #141

Haven't had time to test as much as I would like.

p1-bmu commented 3 years ago

Could you try recompile all ASN.1 modules after your modification, and check if everything is still OK ?

$ python2 -m pycrate_asn1c.asnproc
benmaddison commented 3 years ago

It's not happy - traceback below. Any suggestions?

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 1272, in <module>
    generate_all()
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 1254, in generate_all
    compile_spec(**kwargs)
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 88, in compile_spec
    compile_text(spec_texts, **kwargs)
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 257, in compile_text
    compile_modules(remain)
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 940, in compile_modules
    ObjNew = asnobj_compile(Obj)
  File "/home/benm/documents/repos/pycrate/pycrate_asn1c/asnproc.py", line 885, in asnobj_compile
    rest = Obj.parse_value(Obj._text_def)
  File "pycrate_asn1c/asnobj.py", line 4566, in parse_value
    return getattr(self, self._PARSE_VALUE_DISPATCH[self._type])(text)
  File "pycrate_asn1c/asnobj.py", line 5570, in _parse_value_class
    self._parse_value_class_syntax(vals, values)
  File "pycrate_asn1c/asnobj.py", line 5695, in _parse_value_class_syntax
    self.__parse_value_class_syntax_grp(vals)
  File "pycrate_asn1c/asnobj.py", line 5744, in __parse_value_class_syntax_grp
    self.__parse_value_class_syntax_grp(vals)
  File "pycrate_asn1c/asnobj.py", line 5717, in __parse_value_class_syntax_grp
    self._text = self.__parse_value_class_field(Field, self._text, vals)
  File "pycrate_asn1c/asnobj.py", line 5641, in __parse_value_class_field
    text = ObjProxy.parse_value(text)
  File "pycrate_asn1c/asnobj.py", line 4566, in parse_value
    return getattr(self, self._PARSE_VALUE_DISPATCH[self._type])(text)
  File "pycrate_asn1c/asnobj.py", line 5234, in _parse_value_choice
    return self._parse_value_ref(text)
  File "pycrate_asn1c/asnobj.py", line 4654, in _parse_value_ref
    .format(self.fullname(), ident)))
pycrate_asn1c.err.ASN1ProcTextErr: errorCode: value errcode-canceled, undefined
p1-bmu commented 3 years ago

Not any special suggestion, just to identify which ASN.1 source file triggers the issue, and then to fix the code and / or the regex... Welcome to the great world of ASN.1, and all its diversity ;)

On the other side, in case you have just a few source files that have such IMPORTS statements, maybe it would be easier to simply adapt them so they can compile. The compiler in pycrate has other limitations that I never solved, too (see https://github.com/p1sec/pycrate/wiki/Compiling-asn1-specifications#limitations)...

benmaddison commented 3 years ago

Not any special suggestion, just to identify which ASN.1 source file triggers the issue, and then to fix the code and / or the regex... Welcome to the great world of ASN.1, and all its diversity ;)

Ack. Stepping through it in pdb now.

So far, the results suggest that the change to the regex is correct, and that it has actually fixed a different bug[*], which has exposed a latent issue further down the pipeline.

I will have further results in the next hour or so.

[*]: the current regex matches a ; to detect an OID-Ref in the final import section. The trailing ; is never included in the text argument to module_get_import(...), so that case would never match. My PR changes the regex to match on $ in this case.

On the other side, in case you have just a few source files that have such IMPORTS statements, maybe it would be easier to simply adapt them so they can compile.

Yeah, if they were all mine then I would. The primary use case for rpkimancer is prototype generation during the standardisation phase of new RPKI object types. The usual convention in the IETF is:

foo FROM Bar
    { <id-mod-bar> }

Convincing people to change this will be harder than fixing the parsing!

The compiler in pycrate has other limitations that I never solved, too (see https://github.com/p1sec/pycrate/wiki/Compiling-asn1-specifications#limitations)...

Yeah, I saw those. The only limitation that I have in the mods that I am working with is processing multi-root WITH COMPONENTS. I think that implementing anything on that list would require a full tokeniser->lexer->ast pipeline. That's not something I have capacity to offer help with at the moment!

benmaddison commented 3 years ago

OK, that was a hassle, but it now covers all the corner cases in asn1dir

One question: Do you know if it is legal to import a symbol with non-empty params, like:

IMPORT
  foo{bar} FROM Baz;

There are no examples of this in asn1dir, and I'm not even sure what would mean, but if it is legal then the regex will break!

Also, should I add python -m pycrate_asn1c.asnproc to the CI workflow, to prevent regressions?

benmaddison commented 3 years ago

Oh, sorry, one more thing:

There is lots of scope to make the regex more compact and readable by making full use of qualifier and set shorthands. E.g. + instead of {1,}, and \w instead of [a-zA-Z0-9_] I have avoided using them in this PR to maintain consistency, but I can shorten the new version substantially.

Let me know what you prefer?

p1-bmu commented 3 years ago

OK, that was a hassle, but it now covers all the corner cases in asn1dir

One question: Do you know if it is legal to import a symbol with non-empty params, like:

IMPORT
  foo{bar} FROM Baz;

There are no examples of this in asn1dir, and I'm not even sure what would mean, but if it is legal then the regex will break!

I don't think this is permitted by the ASN.1 specification, and should not happen.

Also, should I add python -m pycrate_asn1c.asnproc to the CI workflow, to prevent regressions?

No, this is useless. I am taking care of this when a change is made in the compiler (what is very rare).

p1-bmu commented 3 years ago

Oh, sorry, one more thing:

There is lots of scope to make the regex more compact and readable by making full use of qualifier and set shorthands. E.g. + instead of {1,}, and \w instead of [a-zA-Z0-9_] I have avoided using them in this PR to maintain consistency, but I can shorten the new version substantially.

Let me know what you prefer?

I was not used to Python re library when I wrote this part of the code, therefore I used std regex patterns. If you are sure + and \w are equal replacement for more std notation, go ahead. Be careful also with the unicode / ascii mess between Python 2 and 3 (at least for \w).

p1-bmu commented 3 years ago

As such, your change does not impact the code generation for integrated ASN.1 modules, therefore I am happy to merge it if this is fine for you.

benmaddison commented 3 years ago

As such, your change does not impact the code generation for integrated ASN.1 modules, therefore I am happy to merge it if this is fine for you.

Yeah, please go ahead. I might attempt the regex cleanup when I have a bit more time. That can be a new PR.

benmaddison commented 3 years ago

@p1-bmu thanks for merging. Is there anything further that needs to be done before doing a 0.5.0 release? Happy to help where I can.