CyberShadow / tree-sitter-d

17 stars 6 forks source link

Ensure top community D projects parse successfully #3

Open CyberShadow opened 3 years ago

CyberShadow commented 3 years ago

With #2 closed, we now successfully parse all files in DMD's compilable test suite. However, there is still lots of valid code out there which tree-sitter-d fails to parse successfully.

The following is a list of errors which currently result from attempting to parse D source files from the list of D projects used by the community project tester.

Many of these are simply due to implementation bugs in our parser, but some of these may be due to underspecified grammar. Furthermore, since the problem was not detected testing against the DMD compilable test suite, each occurrence represents an opportunity to improve the coverage of the DMD suite by adding a reduced sample of the unparseable code to it.

For each of the problems below, we should:


ibuclaw commented 3 years ago

IASM GCC Grammar:

GccAsmStatement:
    asm FunctionAttributes(opt) { GccAsmInstructionList }

GccAsmInstructionList:
    GccAsmInstruction ;
    GccAsmInstruction ;  GccAsmInstructionList

GccAsmInstruction:
    GccBasicAsmInstruction
    GccExtAsmInstruction
    GccGotoAsmInstruction

GccBasicAsmInstruction:
    AssignExpression

GccExtAsmInstruction:
    AssignExpression : GccAsmOperands(opt)
    AssignExpression : GccAsmOperands(opt) : GccAsmOperands(opt)
    AssignExpression : GccAsmOperands(opt) : GccAsmOperands(opt) : GccAsmClobbers(opt)

GccGotoAsmInstruction:
    AssignExpression :  : GccAsmOperands(opt) : GccAsmClobbers(opt) : GccAsmGotoLabels(opt)

GccAsmOperands:
    GccSymbolicName(opt) StringLiteral ( AssignExpression )
    GccSymbolicName(opt) StringLiteral ( AssignExpression ), GccAsmOperands

GccSymbolicName:
    [ Identifier ]

GccAsmClobbers:
    StringLiteral
    StringLiteral , GccAsmClobbers

GccAsmGotoLabels:
    Identifier
    Identifier , GccAsmGotoLabels
CyberShadow commented 3 years ago

@ibuclaw Thanks!

So, that raises the question of where this should be kept / maintained:

Any thoughts?

ibuclaw commented 3 years ago

Any thoughts?

Probably yes on all three counts.

dkorpel commented 3 years ago

How is GCC-style assembly actually used in the wild? If I understand correctly, you can't just put it inside a version(GDC) block since code in there still has to parse, so I suppose people either:

Currently iasm.d looks like:

version (MARS)
{
    import dmd.iasmdmd;
}
else version (IN_GCC)
{
    import dmd.iasmgcc;
}

Can't both compilers at least support parsing both, and have GccAsmStatement in their grammar? Or is there ambiguity?

CyberShadow commented 3 years ago

@ibuclaw https://github.com/libmir/mir-cpuid/blob/4add5b639351f15044a8c991ce3a905d7973c3de/source/cpuid/x86_any.d#L606-L609 still doesn't parse with the grammar you posted, is that expected?

ibuclaw commented 3 years ago

How is GCC-style assembly actually used in the wild? If I understand correctly, you can't just put it inside a version(GDC) block since code in there still has to parse, so I suppose people either:

Nope, you put it in a version condition. The parser just collects all the tokens within braces, and then the semantic parses the contents.

ibuclaw commented 3 years ago

@ibuclaw https://github.com/libmir/mir-cpuid/blob/4add5b639351f15044a8c991ce3a905d7973c3de/source/cpuid/x86_any.d#L606-L609 still doesn't parse with the grammar you posted, is that expected?

That's deprecated syntax, and will be an error soon™

ibuclaw commented 3 years ago

Can't both compilers at least support parsing both, and have GccAsmStatement in their grammar? Or is there ambiguity?

Looks like LDC does a lazy check on the first token for the easy cases where it cannot be a dmd-style iasm statement. This would mean that there are some styles of asm that LDC does not support.

asm { ctfeFunctionReturningInstructionsAsString(); }

There are a number of ambiguities between an dmd-style AsmExp and the AssignExpression instruction string of the gdc-style, i.e:

enum cpuid = "cpuid";
asm { cpuid; } // accepted dmd-style asm

These could perhaps be overcome by tweaking the grammar to force instruction expressions (not string literals) to be enclosed in parentheses in the gdc style.

CyberShadow commented 3 years ago

A small change of plans: https://github.com/CyberShadow/tree-sitter-d/commit/ccfcf13144a4a6dfe83d19b17cb1100e24d8ad73

gdamore commented 2 years ago

I would encourage looking at https://github.com/gdamore/tree-sitter-d

I believe it parses pretty much everything correctly.

I've tested it with the DMD test suite, with DMD itself, with DUB, and a large body of non-public work.

CyberShadow commented 2 years ago

Hi @gdamore,

First, thanks for taking the initiative in putting together a complete tree-sitter package for D!

That said, please forgive me if I'm a little skeptical about your claims. Looking at the list in this issue's description, many instances of errors produced by the implementation in this repository is not due to defects in the implementation, but genuinely because the source files contain invalid grammar. If you're claiming that they parse successfully, that would indicate that your parser is overly relaxed, and will consume invalid grammatical constructs and present them as valid D programs.

I'm also puzzled by the route you've taken for your project. As I understand, your tree-sitter grammar is hand-written. Was there any reason why you didn't just take the output from this project and added the missing pieces, such as highlighting queries, to produce a complete package usable in editors?

The main distinction between this project and your endeavor is the scope. This project aims to:

The last point is an important one. Tools will go out of date as soon as someone stops actively looking at them. For evidence of how much this affects our community, here is an example: https://forum.dlang.org/post/vfkwhtigkvkmhtzalwaf@forum.dlang.org

As such, I really think your efforts would have much more impact by contributing to the effort started here. Unfortunately due to recent circumstances I've been unable to dedicate as much time to my open source projects as I would have liked, but I remain available for questions and support.