Cyfrin / aderyn-vscode

MIT License
1 stars 1 forks source link

Aderyn VSCode needs Char Offset, not Byte Offset #25

Closed TilakMaddy closed 1 month ago

TilakMaddy commented 2 months ago

image

Although the line numbers are correctly identified in the tree view, the underlined lines are different

alexroan commented 2 months ago

This is to do with ASCII characters at the beginning of the contracts.

The problems is with cloc. It appears in the ETH deposit contract also. Can you pleas open an issue on the core repo @TilakMaddy ?

TilakMaddy commented 2 months ago

Hmm.. Not sure I understood you correctly.

  1. Aren't we skipping cloc in vscode extension (--skip-cloc)
  2. If not, how are the nsloc helping in highlighting line numbers of issues?
  3. I remember adding this line https://github.com/Cyfrin/aderyn/blob/96b423789ef36ae970d6ba975165a5daa2fcef5c/tests/contract-playground/src/cloc/AnotherHeavilyCommentedContract.sol#L39 with the sole purpose of making sure it doesn't break on non ascii characters.
TilakMaddy commented 2 months ago

@alexroan I want to make this 1 thing clear - In the tree view, it identifies the line correctly, but when I click on it, it highlights a different line. So I am unable to understand how this is a problem with the CLI tool. It must be with the VSCode right??

And yes, this inconsistency problem (between tree view line number and actual highlighted line number) goes away when you remove the non-ascii stuff. (a.k.a THUNDER LOAN written fancily at the top)

TilakMaddy commented 2 months ago

Let me see if I can fix it here : ) I'm sure somewhere in the javascript we are not treating source location offset as number of offset bytes, but offset number of characters instead . (Which in UTF-8 encoding, only for an ascii character, 1 byte is enough to represent 1 character)

Solution: I'm thinking something along these lines for getting the actual start offset


function highlightIssueInstance(issue: any, instance: any, severityString: string, diagnosticSeverity: vscode.DiagnosticSeverity, diagnosticCollection: vscode.DiagnosticCollection) {
    const issueUri = vscode.Uri.file(path.join(vscode.workspace.workspaceFolders![0].uri.fsPath, instance.contract_path));
    const srcParts = instance.src.split(':');
@>    const startByteOffset = parseInt(srcParts[0], 10);
    const length = parseInt(srcParts[1], 10);

    vscode.workspace.openTextDocument(issueUri).then(document => {

       // Calculate character offset from byte offset
        let startCharacterOffset = 0;
        let totalBytesConsumed = 0;
        console.log(instance.contract_path);
        let contents = fs.readFileSync(instance.contract_path, { encoding: "utf-8" });
        for(const c of contents) {
            const byteSize = new TextEncoder().encode(c).length;
            totalBytesConsumed += byteSize;
            startCharacterOffset += 1;
            if (totalBytesConsumed === startByteOffset) {
                break;
            }
        }

        // Supply character offset instead of byte offset
        const startPosition = document.positionAt(startCharacterOffset);
        const endPosition = document.positionAt(startByteOffset + length);
        const range = new vscode.Range(startPosition, endPosition);
        const hoverOverText = `${severityString}: ${issue.title}\n${issue.description}`
        const diagnostic = new vscode.Diagnostic(range, hoverOverText, diagnosticSeverity);
        diagnostic.source = 'Aderyn';

        const existingDiagnostics = diagnosticCollection.get(issueUri) || [];
        const newDiagnostics = [...existingDiagnostics, diagnostic];
        diagnosticCollection.set(issueUri, newDiagnostics);
    });
}

If that doesn't work, we may have to take care of this problem in the aderyn cli binary (probably have to use https://docs.rs/unicode-segmentation/latest/unicode_segmentation/trait.UnicodeSegmentation.html#tymethod.grapheme_indices) to find out the desired character offset (although I believe the vscode docs really mean graphemes :P ) from byte offset

TilakMaddy commented 2 months ago

Look the docs says "character position", not byte offset image

TilakMaddy commented 2 months ago

@alexroan With the PR https://github.com/Cyfrin/aderyn/pull/566 we have a new property src_char which gives offsets and length in terms of characters rather than bytes. This should fix the problem : )

alexroan commented 1 month ago

Fixed by #39