Gadiguibou opened this issue 1 year ago
I'm not sure if we handle this already, but we probably want to make sure the relative lexer handles the different aliases for registers (R0 vs A1, R4 vs V1, R14 vs LR, etc.). It may be necessary to start with a simple search and replace (e.g., "A1" --> "R0", "LR" --> "R14") so that students can't fool the analyzer by switching between R registers and A/V registers.
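Here is a minimal sketch of the search-and-replace idea in Python. It assumes the normalization runs on raw source text before lexing. `ALIASES` and `normalize_registers` are hypothetical names, not part of the project. The table follows the standard ARM procedure-call register names: a1-a4 = r0-r3, v1-v8 = r4-r11, plus sb, sl, fp, ip, sp, lr and pc.

```python
import re

# Hypothetical alias table: standard ARM procedure-call names -> canonical rN.
ALIASES = {f"a{i}": f"r{i - 1}" for i in range(1, 5)}        # a1-a4 -> r0-r3
ALIASES.update({f"v{i}": f"r{i + 3}" for i in range(1, 9)})  # v1-v8 -> r4-r11
ALIASES.update({"sb": "r9", "sl": "r10", "fp": "r11",
                "ip": "r12", "sp": "r13", "lr": "r14", "pc": "r15"})

# Match any alias as a whole word, regardless of capitalization.
_ALIAS_RE = re.compile(r"\b(" + "|".join(ALIASES) + r")\b", re.IGNORECASE)


def normalize_registers(source: str) -> str:
    """Replace each register alias with its canonical lowercase rN name."""
    return _ALIAS_RE.sub(lambda m: ALIASES[m.group(1).lower()], source)


print(normalize_registers("add r1, A1, #2"))  # add r1, r0, #2
```

This is a plain text substitution, so it would also rewrite a label that happens to be named `ip` or `fp`. Doing the same mapping on register tokens inside the lexer would avoid that.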
Do you mean the naive lexer? The relative one just treats all of those as "symbols" anyway.
No, I mean the relative lexer. The naive one already identifies registers by their number and therefore considers A1 and R0 to be the same, right?
Consider the following situation:
student1.s:
mov r0, #1
add r1, r0, #2
student2.s:
mov r0, #1
add r1, a1, #2
In the first case, the second occurrence of r0 will have a positive offset. In the second case, a1 does not occur earlier, so it will have an offset of 0. By mixing A and R registers, student 2 was able to copy student 1's code without being detected.
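To make the offset argument concrete, here is a toy model of relative lexing in Python. It is an assumption about how the relative lexer behaves, not its actual implementation. Mnemonics and immediates are kept. Every other operand becomes the distance in tokens back to its previous occurrence, or 0 if it is new. The optional `aliases` mapping is a hypothetical hook showing where alias normalization would plug in. Operands are also lowercased, which covers the mixed-capitalization case.

```python
from typing import Optional


def relative_tokens(source: str, aliases: Optional[dict[str, str]] = None) -> list[str]:
    """Toy (hypothetical) relative lexer for simple one-instruction ARM lines.

    Mnemonics are lowercased and immediates (#...) are kept as-is; every
    other operand is a symbol, replaced by the distance in tokens to its
    previous occurrence, or 0 if it has not been seen before.
    """
    aliases = aliases or {}
    tokens: list[str] = []
    last_seen: dict[str, int] = {}
    for line in source.strip().splitlines():
        mnemonic, _, operands = line.strip().partition(" ")
        tokens.append(mnemonic.lower())
        for op in filter(None, (o.strip() for o in operands.split(","))):
            if op.startswith("#"):
                tokens.append(op)  # immediate value, not a symbol
                continue
            sym = op.lower()             # fold case: A1 and a1 are the same
            sym = aliases.get(sym, sym)  # optional alias normalization
            pos = len(tokens)
            offset = pos - last_seen[sym] if sym in last_seen else 0
            tokens.append(f"sym@{offset}")
            last_seen[sym] = pos
    return tokens


STUDENT1 = "mov r0, #1\nadd r1, r0, #2"
STUDENT2 = "mov r0, #1\nadd r1, a1, #2"

# Without alias handling, a1 looks like a brand-new symbol (offset 0).
print(relative_tokens(STUDENT1) == relative_tokens(STUDENT2))  # False
# Mapping a1 -> r0 before computing offsets makes the two token streams identical.
print(relative_tokens(STUDENT1) == relative_tokens(STUDENT2, {"a1": "r0"}))  # True
```

Resolving aliases before offsets are computed is the important design point. Once a1 and r0 share one key in `last_seen`, the second operand of `add` gets the same positive offset in both files.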
While we're at it, we should also test that mixing capitalization (e.g., sometimes a1, sometimes A1) doesn't similarly fool the relative lexer.
Capitalization is already handled. Register aliases are a different matter, and I don't think they are a priority. Supporting them requires a special rule for every alias, and that rule won't work for FPU registers, for architectures with more or fewer registers, or for different register names such as ARMv8's or Cortex-A's.
Symbol case-insensitivity (included in lexing in #12)