MikePopoloski / slang

SystemVerilog compiler and language services
MIT License
550 stars 117 forks source link

Scope of protected envelope support #612

Open udif opened 1 year ago

udif commented 1 year ago

I went over the latest tree and it seems slang now supports the standard `pragma protect set of keywords for protected IP. Unfortunately, I'm still encountering IP blocks using 2 legacy mechanisms :

`protect
...
`endprotect

This dates back to Verilog-XL days, but was also supported by other tools. After encrypting source code tagged as above it is turned into:

`protected
... <base64 blob>...
`endprotected

Another protection mechanism is a Cadence-specific variation on IEEE1735 that uses language specific pragma comments (// pragma for verilog and '-- preagma' for VHDL) :

// pragma protect
...
// pragma endprotect

This one is very similar to the standard IEEE1735 encryption, but it contains a few more keywords (end_data_block, end_key_block, end_digest_block). Given the similarity, this might have been an IP donation from which IEEE1735 evolved, but I am only guessing. I think it can be implemented by slightly extending the current encryption support.

My question is whether would you be willing to support these.

MikePopoloski commented 1 year ago

I could maybe see supporting the old-style Verilog `protect directive, maybe hidden behind a switch or something. The comment-based directives are beyond terrible and really should not exist. Compilers should not be parsing comments for directives, and doing so in slang would not be particularly easy or performant anyway.

udif commented 1 year ago

In my case, `protect and `endprotect are less of an issue because I simply treat them as macros and define them beforehand as: -D protect= -D endprotect= If I had to deal with `protected and `unprotected with encrypted content, this would have been a different story.

Unfortunately, the other blocks contain an already-encrypted // pragma protect section with base64 garbage. If I simply let slang ignore the pragmas as comments, I'm afraid it will run into the base64 data and fail there.

If I cannot incorporate this, my best shot would be to fork tools/driver.cpp and somehow get a hook into Driver::parseAllSources so I could insert my own filtering code after the readSource() loop , but before SyntaxTree::fromBuffer().

Doing that would require already having a hook in driver/Driver.cpp (lets say a function pointer that is called if it is non-null). The advantage of this approach is that I can use the original slang static lib itself without any changes, and just clone my own tools/driver.cpp . This way I wouldn't need to subclass Driver just to replace Driver::parseAllSources, a 150-line function, just to add a function call between readSource and SyntaxTree::fromBuffer()

MikePopoloski commented 1 year ago

I'm open to modifications to Driver to make it easier to extend for you. Could you also solve this with a pre-pass regex-based script that filters everything between the // pragma protect comments?

udif commented 1 year ago

I've given this some thought over the last week, after I decided implementing more trivial changes first (#620).

While my original direction of doing search/replace on SourceBuffer will probably work in my case, in the general case it is not robust:

module top;
wire [17*8-1:0]str = "\
// pragma protect\
";
endmodule

If I blindly search/replace the "// pragma protect" string, I might change unrelated strings. Ideally I would like to be able to install a filter on the token stream, probably Preprocessor::consume(), provided I can do it with zero cost in the regular case.

MikePopoloski commented 1 year ago

Yeah, I think it will be tough to do as that's a very hot piece of code. I think it's probably ok to add one well-predicted branch to the Preprocessor::next() function, which would check whether there's a filter installed and if so redirect the token out to the filter. Then you need some mechanism to not output the result but instead collect tokens until you see an end marker and mark the whole bunch of them as skipped trivia to attach to the following token. You also need a way to call back into the Preprocessor to invoke the pragma decoding logic since the text might otherwise not even be valid tokens.

udif commented 1 year ago

I want to look into adding a check at the end of Lexer::scanLineComment() and optionally call a user function if it is not null. I'm thinking about trying such a definition in Lexer.h:

struct SLANG_EXPORT LexerOptions {
...
    void (*postLineCommentProcessPtr)(Lexer *lexer);
}
...
class SLANG_EXPORT Lexer {
...
void friend postLineCommentProcess(Lexer *source);
...
}

Then in Lexer.cpp

void Lexer::scanLineComment() {
...
    addTrivia(TriviaKind::LineComment);
    if (options.postLineCommentProcessPtr)
        (*options.postLineCommentProcessPtr)(this);
}

And in my main code I should define postLineCommentProcess(Lexer *source) somewhere and then do somewhere postLineCommentProcessPtr = postLineCommentProcess; . The postLineCommentProcess() should check if the current Trivia contains // pragma data_block and if so, consume all Tokens, skip the Base64 section, until encountering // pragma end_data_block etc.

The code could maybe use std::optional or std::function but I need to read about those first.

udif commented 1 year ago

The original reason I wanted to declare postLineCommentProcess as a friend and not a class member was because I thought it had to be outside the class so the user could supply his own function.

However, on a 2nd though there is no reason a member function cannot be defined optionally outside the class. As long as you don't call or reference it directly, the class will compile cleanly.

On a 3rd thought I wasn't sure I can do the function pointer trick on a member function, But after giving it some further reading and saw you can easily define a member function pointer:

void (Lexer::*postLineCommentProcessPtr)();

and use C++17 std::invoke(postLineCommentProcessPtr, this) instead of the function pointer call inside Lexer::scanLineComment().

MikePopoloski commented 1 year ago

I don't think you need a member function pointer here, the custom method isn't going to live inside Lexer itself. You can make use of function_ref which is a lightweight wrapper around a callable defined in Function.h. You'll have to be careful about what you do inside the callback though because some of the Lexer methods can suddenly become reentrant if you start calling back into it while in the middle of an initial lexTrivia call.

udif commented 1 year ago

The reason I wanted a member function is because I need access to the private Lexer methods.

MikePopoloski commented 1 year ago

I think that's kind of abusing the encapsulation though. This is a downstream client method with logic outside the class, so if it wants to do things with the Lexer, that functionality should be made public and made robust for that usage.

udif commented 1 month ago

Raising this issue from the dead... Coming back to this after I began scanning a different, more ancient codebase, with those dreaded // pragma comments...

Would you be accepting a patch that would modify this (and it's header file as well):

void Lexer::scanLineComment() {
...

into:

SLANG_VIRTUAL void Lexer::scanLineComment() {
...

And have SLANG_VIRTUAL default to be empty? This would keep zero overhead for normal case, but would give me the power to override it if I HAVE to abuse the lexer and willing to compile with non defaults. I'm even investigating to go as far as this to make it work while keeping all the dirty work on my side.

MikePopoloski commented 1 month ago

I think your original idea here was better, to pass a comment scanner callback through the LexerOptions. I think you should code up what logic the callback will actually do to implement the functionality you want, and then we can massage the Lexer interface a bit to make that possible without needing vtables or other hackery of the private internals.