LauraWartschinski / VulnerabilityDetection

vulnerability detection in python source code with LSTM networks
114 stars 46 forks

About labeling #5

Closed lsqLoveCoding closed 4 years ago

lsqLoveCoding commented 4 years ago

Hey, your master's thesis mentions in section "5.2 Processing the data" that "A small focus window traverses through the whole source code in steps of length n". I wonder what 'steps of length n' means. Do n and m refer to the number of tokens, the number of lines of code, or some other measure?

LauraWartschinski commented 4 years ago

Short answer: roughly the length in bytes.

Long answer: You can take a look at figure 11 on the next page, where the small focus window is marked in blue and has the length n, or you can look at the "getBlocks()" function in the code.

What is happening there is that the focus is currently at a certain location in the source code, and the next location for the focus is n bytes/characters further along. Because the focus should always be at the start or end of a token, and not in the middle of one (e.g. at the l in while), the next location is not exactly n bytes later. Instead, the function nextsplit() is used to find a character such as "\n", ":" or "=" that indicates the end of a token.
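To make the stepping concrete, here is a minimal sketch of the idea, not the repository's actual code. The names `next_boundary` and `focus_positions` and the delimiter set `SPLIT_CHARS` are assumptions made up for this illustration; the real nextsplit() may use a different set of characters and different edge-case handling.

```python
from typing import List

# Assumed delimiter set for illustration only; the real code may differ.
SPLIT_CHARS = set("\n :=()[]{},.")


def next_boundary(source: str, pos: int) -> int:
    """Return the first index >= pos that holds a delimiter, or len(source)."""
    while pos < len(source) and source[pos] not in SPLIT_CHARS:
        pos += 1
    return min(pos, len(source))


def focus_positions(source: str, n: int) -> List[int]:
    """Walk through source in steps of roughly n characters.

    Each stop after the first is snapped forward to the next delimiter,
    so a step is at least n characters but usually not exactly n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    positions = []
    pos = 0
    while pos < len(source):
        positions.append(pos)
        pos = next_boundary(source, pos + n)
    return positions


code = "while x == 1:\n    y = 2\n"
print(focus_positions(code, 3))
```

With n = 3 the first step would land on the "l" of "while" (index 3), so it is pushed forward to the space after the keyword (index 5) instead.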

For a given focus, the method getcontextPos() determines the context of roughly length m (in characters) around the focus point, that is, roughly m bytes with the focus point in the middle. However, this is also done in such a way that the context starts and ends at token borders, which is why it is not exactly m bytes.
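The context selection can be sketched in the same spirit. Again, this is only an illustration under assumptions: `context_bounds` and `SPLIT_CHARS` are hypothetical names, and the real getcontextPos() may pick its borders differently.

```python
from typing import Tuple

# Assumed delimiter set for illustration only; the real code may differ.
SPLIT_CHARS = set("\n :=()[]{},.")


def context_bounds(source: str, focus: int, m: int) -> Tuple[int, int]:
    """Return (start, end) of a window of roughly m characters around focus.

    The window is first centred on the focus, then widened outward so that
    it does not begin or end in the middle of a token.
    """
    start = max(0, focus - m // 2)
    end = min(len(source), focus + m // 2)
    # Move start left until the character before it is a delimiter
    # (or the start of the source is reached).
    while start > 0 and source[start - 1] not in SPLIT_CHARS:
        start -= 1
    # Move end right until it sits on a delimiter (or the end of the source).
    while end < len(source) and source[end] not in SPLIT_CHARS:
        end += 1
    return start, end


code = "while x == 1:\n    y = 2\n"
start, end = context_bounds(code, focus=8, m=6)
print(repr(code[start:end]))
```

Here the naive window of 6 characters would be `code[5:11]`, which cuts off right before the "1". Widening it to token borders gives `code[0:12]`, i.e. "while x == 1", which is 12 characters rather than exactly m.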

lsqLoveCoding commented 4 years ago

I'm sorry I saw your reply so late; I have been busy looking for a job during this time. Thank you very much for such a detailed explanation, now I fully understand it :) ...... By the way, the 2019-nCoV virus is very serious now, please be careful!

LauraWartschinski commented 4 years ago

Great =) No worries, I'm taking it seriously. You also take care!