P1sec / pycrate

A Python library to ease the development of encoders and decoders for various protocols and file formats; contains ASN.1 and CSN.1 compilers.
GNU Lesser General Public License v2.1
381 stars 132 forks source link

Strange IMPORT parsing behaviour #141

Closed benmaddison closed 3 years ago

benmaddison commented 3 years ago

I have several ASN.1 modules that import various things from other modules.

In many (although, strangely, not all) cases, the parser seems to skip the first imported name.

For example this works fine (notice the import of Dummy):

RPKIManifest { iso(1) member-body(2) us(840) rsadsi(113549)
   pkcs(1) pkcs9(9) smime(16) mod(0) 60 }

DEFINITIONS EXPLICIT TAGS ::=

BEGIN

-- EXPORTS ALL --

IMPORTS
    Dummy, CONTENT-TYPE, ContentSet
    FROM CryptographicMessageSyntax-2009
        { iso(1) member-body(2) us(840) rsadsi(113549)
          pkcs(1) pkcs-9(9) smime(16) modules(0) id-mod-cms-2004-02(41) } ;

-- Manifest Content Type

ct-rpkiManifest CONTENT-TYPE ::=
    { TYPE Manifest IDENTIFIED BY id-ct-rpkiManifest }

-- Manifest Content Type: OID

id-smime OBJECT IDENTIFIER ::= { iso(1) member-body(2)
us(840) rsadsi(113549) pkcs(1) pkcs9(9) 16 }

id-ct OBJECT IDENTIFIER ::= { id-smime 1 }

id-ct-rpkiManifest OBJECT IDENTIFIER ::= { id-ct 26 }

-- Manifest Content Type: eContent

Manifest ::= SEQUENCE {
version        [0] INTEGER DEFAULT 0,
manifestNumber     INTEGER (0..MAX),
thisUpdate         GeneralizedTime,
nextUpdate         GeneralizedTime,
fileHashAlg        OBJECT IDENTIFIER,
fileList           SEQUENCE SIZE (0..MAX) OF FileAndHash
}

FileAndHash ::= SEQUENCE {
file  IA5String,
hash  BIT STRING
}

END

If I remove the Dummy import, then the compiler complains that it can't find CONTENT-TYPE. I've tried tracking this down in the source, with no luck. Before I start stepping through it in pdb, I'm hoping you can give me a pointer?!

benmaddison commented 3 years ago

@p1-bmu this seems to have magically fixed itself!

Please don't waste time on this yet - I am trying to figure out what changed...

benmaddison commented 3 years ago

OK, I've managed to recreate this after a bit of digging.

The failure occurs if all the following are true:

  1. The module imports symbols from N (N >= 2) other modules;
  2. Any module, other than the last module (M for M < N) has a non-empty AssignedIdentifier (i.e. is referenced by OID);
  3. The AssignedIdentifier of module M is separated from its name by a new-line.

When this occurs, the tokeniser appears to treat the OID of module M as part of the first symbol imported from module M+1.

The following is enough to recreate the issue:

-- ////////////////////////////////////////////////////////////// --

FooModule { iso(1) identified-organization(3) dod(6) internet(1)
    private(4) enterprise(1) workonline(35743)
    test(999) mod(0) foo(1) }

DEFINITIONS IMPLICIT TAGS ::=
BEGIN

id-foo OBJECT IDENTIFIER ::= { iso(1) identified-organization(3) dod(6)
    internet(1) private(4) enterprise(1) workonline(35743)
    test(999) id(1) foo(1) }

END

-- ////////////////////////////////////////////////////////////// --

BarModule { iso(1) identified-organization(3) dod(6) internet(1)
    private(4) enterprise(1) workonline(35743)
    test(999) mod(0) bar(2) }

DEFINITIONS IMPLICIT TAGS ::=
BEGIN

id-bar OBJECT IDENTIFIER ::= { iso(1) identified-organization(3) dod(6)
    internet(1) private(4) enterprise(1) workonline(35743)
    test(999) id(1) bar(2) }

END

-- ////////////////////////////////////////////////////////////// --

BazModule { iso(1) identified-organization(3) dod(6) internet(1)
    private(4) enterprise(1) workonline(35743)
    test(999) mod(0) baz(3) }

DEFINITIONS IMPLICIT TAGS ::=
BEGIN

IMPORTS

    id-foo FROM FooModule
        { iso(1) identified-organization(3) dod(6) internet(1)
          private(4) enterprise(1) workonline(35743)
          test(999) mod(0) foo(1) }

    id-bar FROM BarModule
        { iso(1) identified-organization(3) dod(6) internet(1)
          private(4) enterprise(1) workonline(35743)
          test(999) mod(0) bar(2) };

id-foo-baz OBJECT IDENTIFIER ::= { id-foo 1 }

id-bar-baz OBJECT IDENTIFIER ::= { id-bar 1 }

END

-- ////////////////////////////////////////////////////////////// --

If the IMPORTS section of BazModule is updated so that the AssignedIdentifier parts begin on the same line as the exporting module, the issue goes away:

-- ...
IMPORTS

    id-foo FROM FooModule {
          iso(1) identified-organization(3) dod(6) internet(1)
          private(4) enterprise(1) workonline(35743)
          test(999) mod(0) foo(1) }

    id-bar FROM BarModule {
          iso(1) identified-organization(3) dod(6) internet(1)
          private(4) enterprise(1) workonline(35743)
          test(999) mod(0) bar(2) };
-- ...

I have checked X.680, and both syntaxes are legal (see section 11.1.6).

I will try to find the offending code, but any pointers would be welcome!

p1-bmu commented 3 years ago

The code for parsing IMPORTS is here: https://github.com/P1sec/pycrate/blob/cc319e47a91158e1009053f0e7fc22aa7fd060b6/pycrate_asn1c/asnproc.py#L653

If I recollect correctly, the compiler only accepts an OID when it starts on the same line as the module name, in a FROM declaration. This is because some specifications do not use a full OID notation, but only references to OID values, in such a way:

IMPORTS
  first_obj,
  second_obj
    FROM FirstModule FirstModuleOID
  third_obj,
  fourth_obj
    FROM SecondModule SecondModuleOID

Where FirstModuleOID and SecondModuleOID are defined somewhere as OID values. The issue exists when the OID reference is on the next line, where it cannot be distinguished from an imported symbol from the next module.

This is however a long time I did not touch this part of the code... I may be wrong.

benmaddison commented 3 years ago

That explanation makes sense, thanks.

It may be possible to distinguish between a symbol and an OID ref by looking at whether it is followed by ,|FROM. That may be a difficult regex to write!

An alternative might be to limit an OID ref to the same line, but allow a literal OID on a new line?

Do you have a preference?

Either way, the fix will be in this guy, right?:

https://github.com/P1sec/pycrate/blob/013b3fccbb3025ea73a900c10e3c946006b4901d/pycrate_asn1c/utils.py#L336-L339