cassianomonteiro closed 6 years ago
Could you help me understand the use case for this a bit more? The test case given is a bit odd since it appears to be tokenizing a fragment of a block comment (which would normally be treated as a single "token"). There are also a few other cases in the tokenizer which would need updates to support ignoring errors (including strings and some number literals).
I'm using this to tokenize snippets of code and patches... So usually I get fragments which are not exactly complete pieces of code, but rather one-line changes. That's why this test case is just a fragment of a comment, and not the complete thing.
I had a use case for this a few weeks ago but the details have left me... +1 for this as I can definitely see the benefits of parsing incomplete snippets. @cassianomonteiro sounds like you're automating your code review with this?
@Deathnerd I'm researching vulnerability detection in Android libraries. In my current project, I'm trying to detect known vulnerabilities in other versions of a library using machine learning. To calculate features for my prediction model, I'm using Java tokens from fix patches. That's why I'm trying to parse snippets of code.
@cassianomonteiro That sounds seriously awesome. I wish you luck in your endeavors!
@c2nes Thanks for reviewing this! I'm a little busy these days, but I will try to work on it as soon as possible.
Looks good to me. Thanks again @cassianomonteiro!
Adds an option to ignore errors during tokenization. Useful when parsing snippets of code instead of complete blocks or files.
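To illustrate the idea, here is a minimal, self-contained sketch of a tokenizer with an opt-in flag for ignoring errors. It is not the code from this pull request: the `tokenize` function, the `ignore_errors` parameter name, the `TokenizeError` class, and the simplified token grammar are all assumptions made for the example. In strict mode the first unterminated string or block comment raises; with `ignore_errors=True` the error is recorded and tokenization continues with the rest of the fragment.

```python
import re


class TokenizeError(Exception):
    """Hypothetical error type raised when input cannot be tokenized."""


# Simplified, Java-like token grammar (an assumption for this sketch,
# far smaller than a real Java lexer).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<bad_comment>/\*.*)                 # block comment that never closes
  | (?P<string>"(?:[^"\\\n]|\\.)*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<op>[{}()\[\];,.=+\-*/<>!&|^%~?:@])
""", re.VERBOSE | re.DOTALL)


def tokenize(code, ignore_errors=False):
    """Return (tokens, errors) for a possibly incomplete code fragment.

    With ignore_errors=False the first problem raises TokenizeError.
    With ignore_errors=True problems are collected and skipped.
    """
    tokens, errors = [], []
    pos = 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None or m.lastgroup == "bad_comment":
            err = TokenizeError(f"cannot tokenize at offset {pos}: {code[pos]!r}")
            if not ignore_errors:
                raise err
            errors.append(err)
            # Skip the unterminated comment, or just the offending character.
            pos = m.end() if m else pos + 1
            continue
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors


if __name__ == "__main__":
    # A one-line patch fragment with an unterminated string literal.
    tokens, errors = tokenize('int x = "abc;', ignore_errors=True)
    print(tokens)
    print(len(errors))
```

Returning the collected errors next to the tokens is a design choice for this sketch only: it lets a caller that processes patch fragments see how much of each fragment was skipped, which may matter when the tokens feed a feature extractor.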