Drizin / CodegenCS

C# Toolkit for Code Generation (T4 alternative!)
MIT License
223 stars, 30 forks

Multiline string indent control does not work #16

Closed: loop8ack closed this issue 1 year ago

loop8ack commented 1 year ago

Hi there,

First: nice project! It reduced the code for my code generator from about 1200 very confusing lines to about 700 clearly readable lines. :)

However, I need to write some multiline strings, and they cause problems. A few examples:

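using System;
using System.IO;
using System.Linq;
using CodegenCS;
using static CodegenCS.Symbols; // TTW / TLW symbols

using var stringWriter = new StringWriter();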
var codeWriter = new CodegenTextWriter(stringWriter);

var s = @"a
empty:

c
empty:

e";

codeWriter.WriteLine($$"""
    namespace N
    {
        public class A
        {
            {{TestMethod(s, "Test1", StringToCode1)}}
            {{TestMethod(s, "Test2", StringToCode2)}}
            {{TestMethod(s, "Test3", StringToCode3)}}
            {{TestMethod(s, "Test4", StringToCode4)}}
            {{TestMethod(s, "Test5", StringToCode5)}}
            {{TestMethod(s, "Test_Workaround", StringToCode_Workaround)}}
            // All tests end
        }
    }
    """);

codeWriter.Flush();

Console.WriteLine(stringWriter.ToString());

static FormattableString TestMethod(string s, string name, Func<string, object> getStringCode)
{
    return $$"""
        public static string {{name}}()
        {
            return {{getStringCode(s)}};
        }
        // {{name}} end
        """;
}
static object StringToCode1(string s)
    => $"\"{s}\"";
static FormattableString StringToCode2(string s)
    => $$"""
        @"{{s}}"
        """;
static FormattableString StringToCode3(string s)
    => $$"""
        @"{{s.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)}}"
        """;
static FormattableString StringToCode4(string s)
{
    var lines = s
        .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)
        .Select<string, FormattableString>(l => $$"""{{l}}""");

    return $$"""
        @"{{lines}}"
        """;
}
static FormattableString StringToCode5(string s)
{
    var lines = s
        .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None)
        .Select<string, FormattableString>(l => $$"""{{TTW}}{{l}}""");

    return $$"""
        @"{{lines}}"
        """;
}
static Action<ICodegenTextWriter> StringToCode_Workaround(string s)
    => (ICodegenTextWriter writer) =>
    {
        var lines = s
            .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

        var indentLevel = writer.IndentLevel;

        for (int i = 0; i < indentLevel; i++)
            writer.DecreaseIndent();

        writer.Write($"@\"{lines[0]}");

        foreach (var line in lines.Skip(1))
        {
            writer.WriteLine();
            writer.Write(line);
        }

        writer.Write("\"");

        for (int i = 0; i < indentLevel; i++)
            writer.IncreaseIndent();
    };

The result:

namespace N
{
    public class A
    {
        public static string Test1()
        {
            return "a
            return empty:

            return c
            return empty:

            return e";
        }
        // Test1 end
        public static string Test2()
        {
            return @"a
            return @"empty:

            return @"c
            return @"empty:

            return @"e";
        }
        // Test2 end
        public static string Test3()
        {
            return @"a
            return @"empty:
            return @"c
            return @"empty:
            return @"e
            return ";
        }
        // Test3 end
        public static string Test4()
        {
            return @"a
            return @"empty:
            return @"c
            return @"empty:
            return @"e
            return ";
        }
        // Test4 end
        public static string Test5()
        {
            return @"a
return @"empty:
return @"c
return @"empty:
return @"e
            return ";
        }
        // Test5 end
        public static string Test_Workaround()
        {
            return @"a
empty:

c
empty:

e";
    }
    // Test_Workaround end
        // All tests end
    }
}

The last method is my workaround. It works for my purposes, but it messes up the indentation of the lines that follow, as you can see from the "end" comments. I also had another problem where the string was written without a duplicated "return", but the indentation of the new lines was wrong; however, I could not reproduce that in this small sample.

For me, this TODO would solve the problem: I could then split the string and write all following lines at the very start of the line. But then it would no longer be automatic indent control, would it?

Best greetings

Drizin commented 1 year ago

Can you share the body you want to generate in Test5()?

loop8ack commented 1 year ago

All tests should return the same string:

public static string Test()
{
        return @"a
empty:

c
empty:

e";
}

The returned string should be exactly the same as the input string, including line breaks and any spaces before the content. And it should not reduce the indentation of the code that follows, as happens with my workaround.

Drizin commented 1 year ago

I see. The mixed indentation is a little ugly, so I thought it wasn't your intention. So I assume that you can't render raw string literals (the generated code won't run C#11), correct?

Basically, in CodegenCS, whenever you write an interpolated object, the writer "saves the cursor position". That is arguably the main purpose of the library: it lets you reuse code blocks without forcing the "inner block" to be aware of the "outer block" indentation. When you use the public Write methods, the library automatically honors this "current indentation" (the current cursor position) by indenting after every line break.

If you want to write "raw" (without respecting this automatic-indent feature), I think you could use reflection to invoke CodegenTextWriter.InnerWriteRaw() or even get the StringWriter _innerWriter and write directly to it.

Another possibility would be creating an ICodegenTextWriter.AutoIndent=false mode, in which InnerIndentCurrentLine() would do nothing. Then I guess no further changes would be required in InnerWrite(): it would still call InnerWriteRaw() line by line, but it wouldn't add any auto-indent.
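To make the options above concrete, here is a simplified, hypothetical writer. MiniIndentWriter, its AutoIndent property and its WriteRaw method are illustrative names only, not CodegenCS's actual API or implementation. It indents every line that starts after a line break, allows that to be switched off, and offers a raw write that bypasses indentation entirely:

```csharp
using System.Text;

// Hypothetical, simplified writer for illustration only; this is NOT CodegenCS's
// CodegenTextWriter, and all member names below are assumptions.
public class MiniIndentWriter
{
    private readonly StringBuilder _sb = new StringBuilder();
    private bool _atLineStart = true;

    public int IndentLevel { get; set; }
    public string IndentString { get; set; } = "    ";

    // When false, Write() no longer inserts any indentation.
    public bool AutoIndent { get; set; } = true;

    // Writes text, prefixing each non-empty line that starts after a line break
    // with IndentLevel copies of IndentString (only if AutoIndent is on).
    public void Write(string text)
    {
        string[] lines = text.Replace("\r\n", "\n").Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
            {
                _sb.Append('\n');
                _atLineStart = true;
            }
            if (lines[i].Length == 0)
                continue; // blank lines get no indentation

            if (_atLineStart && AutoIndent)
            {
                for (int level = 0; level < IndentLevel; level++)
                    _sb.Append(IndentString);
            }
            _sb.Append(lines[i]);
            _atLineStart = false;
        }
    }

    public void WriteLine(string text = "") => Write(text + "\n");

    // Appends text verbatim: no indentation is added, even after line breaks,
    // so the content of a verbatim string literal is preserved exactly.
    public void WriteRaw(string text)
    {
        _sb.Append(text);
        if (text.Length > 0)
            _atLineStart = text.EndsWith("\n");
    }

    public override string ToString() => _sb.ToString();
}
```

For example, with IndentLevel = 1, Write("x = @\"a\nb\";") indents both lines, whereas Write("x = "), then WriteRaw("@\"a\nb\""), then Write(";") leaves the second line of the literal at column 0, which is what a verbatim string needs.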

Please give whichever option you prefer a try. Feel free to submit a PR if you go with the third option.

loop8ack commented 1 year ago

So I assume that you can't render raw string literals (the generated code won't run C#11), correct?

Yes, I would like to be able to support older versions as well.

If you want to write "raw" (without respecting this automatic-indent feature), I think you could use reflection to invoke CodegenTextWriter.InnerWriteRaw() or even get the StringWriter _innerWriter and write directly to it.

This works, but I solved it with a subclass instead:

class MyWriter : CodegenTextWriter
{
    public MyWriter(StringWriter textWriter)
        : base(textWriter)
    {
        DependencyContainer.RegisterSingleton(this);
    }

    public void WriteRaw(string value)
        => InnerWriteRaw(value);
}

And the StringToCode_Workaround method from my sample now looks like this:

static Action<MyWriter> StringToCode_Workaround(string s)
    => (MyWriter writer) =>
    {
        writer.WriteRaw($"@\"{s}\"");
    };

This is not optimal either, but it is better than my other workaround and does not mess up the indentation that follows.

Another possibility would be creating an ICodegenTextWriter.AutoIndent=false mode

But isn't AutoIndent a big advantage? It only becomes a problem with multiline strings.

I have two other suggestions:

Drizin commented 1 year ago

Oh, subclassing. I like it. Nice solution!

Nice suggestions as well. Feel free to open a PR (this project needs more collaborators, and you're clearly a good candidate; thanks in advance!)

loop8ack commented 1 year ago

There it is :)

However, I still think it's a bug that, for multiline strings, the writer repeats the text that precedes the placeholder on the same line.

Drizin commented 1 year ago

THAT was fast. Thanks!! I'll take a look at the weekend.

Drizin commented 1 year ago

Merged (thanks again!). Can you elaborate on the bug you've mentioned?

loop8ack commented 1 year ago

Merged (thanks again!).

Awesome :) Can you publish a new NuGet version? Then I can remove my workaround :D

Can you elaborate on the bug you've mentioned?

You can see the bug in my sample code. The code:

var s = @"a
empty:

c
empty:

e";

codeWriter.WriteLine($$"""
    public static string Test()
    {
        return "{{s}}";
    }
    """);

The result is:

public static string Test()
{
    return "a
    return "empty:

    return "c
    return "empty:

    return "e";
}

I would expect something like ...

public static string Test()
{
    return "a
    empty:

    c
    empty:

    e";
}

... or ...

public static string Test()
{
    return "a
            empty:

            c
            empty:

            e";
}

... or something else, but not a duplicated line.

If I needed to repeat a line, I would use a list and write the line in Select(). Or is this behavior intentional?

I also found another similar bug; maybe it's the same one, just reproduced differently?

The code:

FormattableString code = $"""
    var a = 1;
    var b = 2;
    """;

codeWriter.WriteLine($$"""
    public static string Test()
    {

        {{TLW}}{{code}}
    }
    """);

The result:

public static string Test()
{var a = 1;
ublic static string Test()
{var b = 2;
}

I would expect:

public static string Test()
{var a = 1;
var b = 2;
}

The indentation doesn't really matter here, but it duplicates code it shouldn't. This example is deliberately minimal (just to reproduce the problem); I had a more useful example, but it would be too much for an issue like this. It doesn't block me (I only noticed it while trying to remove blank lines), so for me it's not a critical bug, but it is a bug :)

Drizin commented 1 year ago

Oh, got it. I thought you were saying there was a bug still in your PR.

Let me explain: the line is duplicated because whenever there's an interpolated object (placeholder), the writer "captures" the current indent, and that captured indent may include non-whitespace characters. This allows templates to use any kind of line prefix as indentation (like // or #), and multiline values will still honor (preserve) that indent. Like this:

// {{MultilineComment}}
INSERT INTO [Table] etc..

Or this:

# {{ multilineComment }}
curl https://www.google.com
ping 8.8.8.8

So when {{ something }} expands into multiple lines, every line is indented with exactly the same indent that the first line (the one containing the placeholder) had.

You should be able to disable that behavior by setting PreserveNonWhitespaceIndent to false, which will probably give you what you expect (your first expectation). For the second expectation, we would have to manipulate this implicit indent by converting "return " into whitespace of the same width. Probably we could add something like ConvertNonWhitespaceIndentToWhitespace, or maybe even allow the use of lambda functions to transform the way this implicit indent is captured.
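The implicit-indent behavior discussed here can be modeled in a few lines of plain C#. This is a hypothetical sketch, not CodegenCS code: ImplicitIndentSketch, Render and the IndentMode values are illustrative names. PreserveNonWhitespace mirrors the default behavior that repeats "return " on every line, WhitespaceOnly corresponds roughly to the first expectation (what PreserveNonWhitespaceIndent = false is described as giving), and ConvertToSpaces corresponds to the second expectation (the suggested conversion of the prefix to whitespace):

```csharp
using System;
using System.Linq;

// Hypothetical sketch of "implicit indent" capture; NOT CodegenCS's implementation.
// linePrefix is whatever was already written on the current line before the
// placeholder (e.g. "    return \""), and value is the multiline interpolated value.
public static class ImplicitIndentSketch
{
    public enum IndentMode
    {
        PreserveNonWhitespace, // repeat the whole prefix, e.g. "return \"" on every line
        WhitespaceOnly,        // keep only the prefix's leading whitespace
        ConvertToSpaces        // replace the prefix with spaces of the same width (no tab handling)
    }

    public static string Render(string linePrefix, string value, IndentMode mode)
    {
        string indent = mode switch
        {
            IndentMode.PreserveNonWhitespace => linePrefix,
            IndentMode.WhitespaceOnly =>
                new string(linePrefix.TakeWhile(c => char.IsWhiteSpace(c)).ToArray()),
            IndentMode.ConvertToSpaces => new string(' ', linePrefix.Length),
            _ => throw new ArgumentOutOfRangeException(nameof(mode)),
        };

        string[] lines = value.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

        // The first line simply continues after the prefix; every later non-empty
        // line gets the captured indent. Blank lines stay empty.
        var output = lines.Select((line, i) =>
            i == 0 ? linePrefix + line
            : line.Length == 0 ? line
            : indent + line);

        return string.Join("\n", output);
    }
}
```

For example, rendering "a\nempty:\n\nc" after the prefix "    return \"" in PreserveNonWhitespace mode repeats "    return \"" before "empty:" and "c" (the duplicated lines from the report), while WhitespaceOnly indents them with four spaces and ConvertToSpaces aligns them under the "a".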

For the second case (where you use TLW), it's really a bug. According to my previous comment (about implicit indentation allowing non whitespace characters) the first curly-brace should be repeated in every line, but it looks like it's including the previous line (the method signature) when it shouldn't.
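As an illustration of the expected capture: if the text written so far ends with "public static string Test()\n{", the captured prefix should be just "{", so each line of the placeholder starts with "{" and never with part of the method signature. A minimal, hypothetical helper (LinePrefixSketch is an illustrative name, and this is not the fix that was actually committed) takes the prefix only from the text after the last line break:

```csharp
// Hypothetical helper, NOT the actual fix committed to CodegenCS.
// Given everything written so far, return only the text of the current line,
// which is where an implicit indent should be captured from.
public static class LinePrefixSketch
{
    public static string CurrentLinePrefix(string writtenSoFar)
    {
        // Treat "\n", "\r\n" and a lone "\r" as line breaks.
        int lastBreak = writtenSoFar.LastIndexOfAny(new[] { '\n', '\r' });

        // No line break yet: the whole buffer is the current line.
        return lastBreak < 0 ? writtenSoFar : writtenSoFar.Substring(lastBreak + 1);
    }
}
```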

loop8ack commented 1 year ago

I see, so it is an intentional feature after all. Then of course it's not a bug - at least the first case :)

Probably we could add something like ConvertNonWhitespaceIndentToWhitespace, or maybe even allow the use of lambda functions to transform the way this implicit indent is captured.

I wouldn't do that; it just makes things unnecessarily complex, and I can easily work around it by writing TLW and a new line, or by setting PreserveNonWhitespaceIndent to false.

According to my previous comment (about implicit indentation allowing non whitespace characters) the first curly-brace should be repeated in every line,

I would have expected something different, but I understand the goal behind it and it doesn't bother me: my project works, and the bug with the duplicated method definition isn't a problem for me either.

So I just need a new NuGet version so I can remove the workaround; then I'll close this issue :)

Drizin commented 1 year ago

I found and fixed the bug with TLW (your second example, where the wrong line gets repeated); please check my last commit. Package CodegenCS.Core 3.3.3 has been published, so please give it a try and let me know if it works. There are now new behaviors for controlling how implicit indent works, so it can behave exactly as you expected.

If it works fine, please consider buying me a coffee and giving the project a star.

Thanks again for your bug reports and for your contribution!

loop8ack commented 1 year ago

Everything works for me - thank you :)