firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

C# flavor #156

Closed shiralizadeh closed 2 years ago

shiralizadeh commented 10 years ago

Hi regex101, can you add C# to your Flavor section? If you want, I can help with this.

Thanks, Shiralizadeh

roryap commented 5 years ago

@firasdib Have you considered using .NET Core's AOT native image generator "CrossGen"? https://github.com/dotnet/coreclr/blob/master/Documentation/building/crossgen.md

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/readytorun-overview.md

firasdib commented 4 years ago

@roryap I haven't looked into it.

This project kind of dragged on and was never finished, even though it was 99% complete.

@chucker Are you still too busy to help?

roryap commented 4 years ago

Hi folks, I know everyone is very busy and COVID19-wary, but is it possible to work on this in small increments? There was a lot of activity about a year ago but it's gone pretty quiet. I would love to help and am slowly catching up on how this all works.

Doqnach commented 4 years ago

Last I'm aware, it got stuck at compiling the DLLs to something decently sized.

I'm sure any help would be appreciated here.

Benjathing commented 4 years ago

Big +1 to this feature; I use regex101's saved regex feature as a way of commenting my regexes in my code, along with the test cases. Usually, PCRE is sufficient; however, my most recent project had balancing groups in the regexes.

Interestingly, this project is a Blazor client-side WebAssembly project. All the regex work is done client-side in a .NET WebAssembly module, which can talk to and from JavaScript via JSInterop. Is this an approach that might be viable for this feature?

firasdib commented 4 years ago

What's holding this back is the bundle size. Feature-wise, it works. I still haven't had time to figure out how to shrink the lib.

Shane32 commented 4 years ago

I think the answer lies in CoreRT: ahead-of-time compilation (not JIT or interpretation of IL) of exactly the code necessary to run .NET’s Regex code.

See https://mattwarren.org/2018/06/07/CoreRT-.NET-Runtime-for-AOT/

chucker commented 4 years ago

> Last I'm aware, it got stuck at compiling the DLLs to something decently sized.

Correct. I used Mono-WASM directly in order to save on complexity compared to Blazor (which builds on top of that, and which mostly offers SPA things we don’t need here), but it’s a more manual process. Blazor integrated the Mono linker, which is a dead code elimination tool.

So one approach would be to use Blazor instead. I don’t think the end result will be smaller, but it might be easier to maintain.

> Interestingly, this project is a Blazor client-side WebAssembly project.

This is not Blazor, technically.

> I think the answer lies in CoreRT: ahead-of-time compilation (not JIT or interpretation of IL) of exactly the code necessary to run .NET’s Regex code.

CoreRT is basically dead (.NET 5 will ship Mono’s AOT instead).

It’s possible, although still a bit experimental, to use AOT here. However, the main goal of that is performance, not code size.

The next step in this project is to talk to the linker correctly, or to use Blazor (and automatically get the linker).

firasdib commented 3 years ago

Bumping this. Does anyone want to help me get this done 😄?

roryap commented 3 years ago

Sure, I'll help, but still not clear on what approach has been decided on.

firasdib commented 3 years ago

@roryap Nothing has been set in stone; whatever allows us to generate the smallest and most performant bundle.

Code-DJ commented 3 years ago

@firasdib How are the PHP, Python and Golang regexes done (are they purely JavaScript-driven)? Would it be too complicated to do it via web services that run the regexes on the server in .NET and return the results?

Is server-side Blazor out of the question due to the hosting model or the potential server load?

firasdib commented 3 years ago

@Code-DJ Sorry, that part is fixed. It has to run on the client, no server side work.

Code-DJ commented 3 years ago

Hi @firasdib, have you tried Bridge.NET (https://github.com/bridgedotnet/Bridge), as @AnderssonPeter and @TWiStErRob have suggested? If yes, what problems did you encounter?

You can paste your performance code on deck.net (https://deck.net) and evaluate it compared to mono/blazor.

Doqnach commented 3 years ago

I looked at Bridge.NET, but it is not entirely clear to me whether it would transpile the .NET regex engine to JS, or whether it just translates C# instructions to JS and uses the JS regex engine.

This issue is about that first case (running the actual .NET regex engine), not the second (using JS engine).

@Code-DJ any insight on that?

Code-DJ commented 3 years ago

@Doqnach I am not sure, but I tried various namespaces in .NET e.g. System.Diagnostics - Stopwatch and was pleasantly surprised that it worked. It may simply be converting C# to JS equivalent as you suggested but look at the following:

https://github.com/bridgedotnet/Bridge/blob/master/Bridge/Resources/Text/RegularExpressions/RegexParser.js
https://github.com/dotnet/runtime/blob/master/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs

Search for scandollar or scanoctal; it looks like they have written the JavaScript equivalent of those methods. That doesn't prove anything, though. We need examples that return different results in vanilla JS and in .NET, to see whether Bridge.NET behaves like .NET or like JS, and therefore whether it is a viable solution.
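As a rough starting point for such examples, here is a minimal sketch of the vanilla-JS side only. It probes three constructs that .NET's engine accepts or treats differently: balancing groups, atomic groups, and Unicode digits under `\d`. The helper name `tryCompile` and the chosen patterns are illustrative assumptions; the same patterns would still need to be run through Bridge.NET to see which behaviour it follows.

```javascript
// Hypothetical helper for this sketch: report whether the native JS engine
// accepts a pattern at all.
function tryCompile(pattern) {
  try {
    new RegExp(pattern);
    return "compiles";
  } catch (err) {
    return err instanceof SyntaxError ? "SyntaxError" : "other error";
  }
}

const probes = {
  // Balancing group (?<-open>...): valid in .NET, not part of JavaScript's grammar.
  balancingGroup: tryCompile("(?<open>\\()(?<-open>\\))"),
  // Atomic group (?>...): valid in .NET, rejected by JavaScript.
  atomicGroup: tryCompile("(?>a+)b"),
  // \d is ASCII-only in JavaScript, while .NET's default \d also matches other
  // Unicode decimal digits such as ARABIC-INDIC DIGIT THREE (U+0663).
  unicodeDigit: /^\d$/.test("\u0663"),
};

console.log(probes);
```

If Bridge.NET compiles the first two patterns and reports a match for the third, that would suggest it is running its own port of the .NET engine rather than delegating to the browser.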

Elsensee commented 3 years ago

It looks to me like it uses the browser's regex engine for regexes that don't use features specific to .NET. So a simple regex will use the JavaScript implementation as a shortcut, and everything else seems to be handled by Bridge.NET.

But that's just from a quick look into it, I might need to look into it further.

Shane32 commented 3 years ago

I agree. All I want is to be able to copy and paste from C# - with C# escape sequences and stuff - but the regex engine can be JavaScript’s engine.

Doqnach commented 3 years ago

@Shane32 the code generator should cover your need in that regard.

lfr commented 3 years ago

@Shane32 Not sure if I misunderstood your comment, but this issue is titled "C# flavor"; it's not just about being able to copy/paste C#-escaped sequences, it's about parsing C#-flavor expressions, which means seeing the results that the .NET CLR Regex engine yields. That said, if what @Elsensee says is right, then it should work.

Also C# flavor is a fine title for this issue, but probably the end feature should be called .NET flavor as it'll also be applicable to F#/VB.

firasdib commented 3 years ago

I've been poking at this intermittently but, alas, to no avail; I just don't know what I am doing. I stumbled upon this link: https://www.mono-project.com/news/2018/01/16/mono-static-webassembly-compilation/ -- does this help anyone?

@chucker your example repo is no longer working for me, as I can't download the files outlined in your readme. Have there been any advancements on the mono-wasm front that would allow for smaller binaries?

bebo-dot-dev commented 3 years ago

> @chucker your example repo is no longer working for me, as I can't download the files outlined in your readme.

I imagine that the dead links look like this: https://jenkins.mono-project.com/job/test-mono-mainline-wasm/label=ubuntu-1804-amd64/

If so, the cause is that the Mono project has been migrating from Jenkins to Azure CI for some time, and they have switched their old Jenkins setup to private as part of this move. https://github.com/mono/mono/issues/20841

Unfortunately, their CI migration seems to be taking a long time to complete, and meanwhile the dead Jenkins links remain everywhere, which is causing a lot of confusion.

The new Azure CI setup is here, but it seems that they're not yet publishing any built release artefacts: https://dev.azure.com/dnceng/public/_build/results?buildId=1240228&view=results

I imagine that the only way to get built Mono WASM binaries at the moment is to build them yourself.

> Have there been any advancements on the mono-wasm front that would allow for smaller binaries?

https://github.com/mono/mono/issues/9857 remains open. I believe that it's currently ~1.8 MB for Mono WASM vs. ~2 MB for Blazor.

This is very decent: https://krausest.github.io/js-framework-benchmark/current.html

chucker commented 3 years ago

@firasdib @bebo-dot-dev The reason is likely not the CI switch but the migration from mono/mono to dotnet/runtime. https://github.com/dotnet/runtime/tree/main/src/mono/wasm seems to be the more current URL for mono-wasm.

I haven't tried to build my PoC with a newer build, though.

firasdib commented 3 years ago

I could try building from source and see if I can recompile your PoC that way.

firasdib commented 3 years ago

I managed to build it, but I need some hand holding to get any further. Sorry.

TimberStalker commented 2 years ago

Has there been any progress on this?

firasdib commented 2 years ago

None, unfortunately. I ran into a roadblock, so I would need someone to help me set up a new PoC.

Code-DJ commented 2 years ago

@firasdib Is there a way for you to share the API? That way we would have an idea of what input/output you are expecting.

Just bouncing ideas. To keep the download size small, we can look into https://github.com/SteveSandersonMS/Blazor, which is the beginning of Blazor. See the Questions section: Steve mentions a 300 KB download vs. a 3 MB download for Blazor in .NET 6.

That repo is missing System.Text.RegularExpressions but has things like System.IO, System.Net.Http etc. that are not needed.

TimberStalker commented 2 years ago

What does PoC stand for?

AlbertoMonteiro commented 2 years ago

@TimberStalker

> What does PoC stand for?

Proof of concept

firasdib commented 2 years ago

@Code-DJ It has to allow me to run matches and substitutions, global and non-global. I.e., match(regex, flags, string) and substitute(regex, flags, string, replacement). It can of course be different depending on the language, I'm flexible.
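To make that shape concrete, here is a minimal client-side sketch of a wrapper exposing `match` and `substitute`. It assumes the .NET side is reached through Blazor's `DotNet.invokeMethodAsync(assemblyName, methodName, ...args)`; the assembly name `RegexFlavor` and the method names `Match` and `Substitute` are hypothetical placeholders, and the stub exists only so the sketch runs without a .NET runtime.

```javascript
// `invoke` stands in for DotNet.invokeMethodAsync. The assembly and method
// names below are assumptions for illustration, not an existing API.
function createDotNetFlavor(invoke, assemblyName = "RegexFlavor") {
  return {
    match: (regex, flags, string) =>
      invoke(assemblyName, "Match", regex, flags, string),
    substitute: (regex, flags, string, replacement) =>
      invoke(assemblyName, "Substitute", regex, flags, string, replacement),
  };
}

// In a browser with Blazor loaded, this would be:
//   createDotNetFlavor(DotNet.invokeMethodAsync.bind(DotNet))
// Here a stub records each call instead of talking to .NET.
const calls = [];
const stubInvoke = async (...args) => {
  calls.push(args);
  return { success: true };
};

const flavor = createDotNetFlavor(stubInvoke);
const pending = flavor.substitute("a+", "g", "caaat", "o");
```

Passing `invoke` in as a parameter keeps the wrapper independent of how the runtime is loaded, so the same two functions could sit in front of Mono-WASM or Blazor.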

w4po commented 2 years ago

+1 Balancing groups are a great feature of the .NET Regex engine; I use them to match HTML elements. It would be very nice to add a C# flavor.

firasdib commented 2 years ago

I stumbled upon https://github.com/dotnet/runtime/tree/main/src/mono/wasm -- is this something that can be used? Anyone who has time to experiment with it?

AlbertoMonteiro commented 2 years ago

@firasdib I've made a simple Blazor app that calls a C# function from JavaScript.

This is the app repo: https://github.com/AlbertoMonteiro/BlazorAppRegex
I am using dotnet 6.0.101. I've hosted the static site using GitHub Pages; you can check it here: https://albertomonteiro.github.io/BlazorAppRegex/

As you can see, it simply evaluates a regex against a given text and returns true if it matches, false if not.

Calling the C# function from JS is really simple: https://github.com/AlbertoMonteiro/BlazorAppRegex/blob/454de898ab31533734ceb04a37e5caedc68d852d/index.html#L30-L34

The appName is the assembly name, in this case BlazorApp1.

To make the C# function callable from JS, this is what I had to do: https://github.com/AlbertoMonteiro/BlazorAppRegex/blob/3f1f31d5a83264f01bd5fdb8af83b89fb6a33522/BlazorApp1/Program.cs#L11-L16

I hope this can help

AlbertoMonteiro commented 2 years ago

I just improved the repo that I mentioned in the previous comment; you can check the latest version on master.

That's the same regex and value being evaluated with C# (left) and JavaScript on regex101 (right).

firasdib commented 2 years ago

@AlbertoMonteiro Thank you! Can you add a readme for how I can compile it myself?

SunSerega commented 2 years ago

@AlbertoMonteiro I got this error twice (in the alert window, after clicking the "Regex match?" button):

Error: No .NET call dispatcher has been set.

The first time was when I first tried it, and the second was just now, after reconfiguring and restarting Windows a bunch of times. In both cases, the error disappeared after reloading the page.

AlbertoMonteiro commented 2 years ago

@firasdib I've added the README. Let me know if it is enough; I can provide more details if you need more help!

> @AlbertoMonteiro I was able to receive this error (in the alert window, after clicking Regex match? button) twice:
>
> Error: No .NET call dispatcher has been set.
>
> 1 When first trying it and 2 just now, after reconfiguring and restarting windows a bunch of times. In both cases, the error disappeared after reloading the page.

@SunSerega I covered the issue you faced in the repo's README; check the gh-pages section.
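For readers hitting the same error, here is one possible client-side workaround, sketched under an assumption: that the error appears when JavaScript calls into .NET before the runtime has finished starting. This is not necessarily what the README recommends. The helper name `invokeWhenReady` and the retry count and delay are arbitrary choices for illustration; `invoke` stands in for `DotNet.invokeMethodAsync`.

```javascript
// Retry an interop call while the runtime reports that no dispatcher is set.
// Any other error, or running out of retries, is rethrown to the caller.
async function invokeWhenReady(invoke, args, { retries = 20, delayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke(...args);
    } catch (err) {
      const notReady = String(err && err.message).includes("No .NET call dispatcher");
      if (!notReady || attempt >= retries) throw err;
      // Wait briefly before trying again.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage in the browser (assembly and method names taken from the code in this thread):
//   invokeWhenReady(DotNet.invokeMethodAsync.bind(DotNet),
//                   ["BlazorApp1", "SayHelloCS", "a+", "caaat"]);
```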

firasdib commented 2 years ago

@AlbertoMonteiro Thank you. I tried this on my Linux machine, which has:

dotnet --version
6.0.100

Running dotnet run, I get:

Building...
It was not possible to find any compatible framework version
The framework 'Microsoft.AspNetCore.App', version '6.0.1' (x64) was not found.
  - No frameworks were found.

You can resolve the problem by installing the specified framework and/or SDK.

The specified framework can be found at:
  - https://aka.ms/dotnet-core-applaunch?framework=Microsoft.AspNetCore.App&framework_version=6.0.1&arch=x64&rid=manjaro-x64

Adjusting the csproj-file to reflect the version I have results in the following error:

Building...
/home/firas/projects/BlazorAppRegex/BlazorApp1/BlazorApp1.csproj : error NU1102: Unable to find package Microsoft.AspNetCore.Components.WebAssembly with version (>= 6.0.100)
/home/firas/projects/BlazorAppRegex/BlazorApp1/BlazorApp1.csproj : error NU1102:   - Found 40 version(s) in nuget.org [ Nearest version: 6.0.2 ]
/home/firas/projects/BlazorAppRegex/BlazorApp1/BlazorApp1.csproj : error NU1102: Unable to find package Microsoft.AspNetCore.Components.WebAssembly.DevServer with version (>= 6.0.100)
/home/firas/projects/BlazorAppRegex/BlazorApp1/BlazorApp1.csproj : error NU1102:   - Found 40 version(s) in nuget.org [ Nearest version: 6.0.2 ]

The build failed. Fix the build errors and run again.

What am I doing wrong :-)?

AlbertoMonteiro commented 2 years ago

@firasdib I am going to set up an Ubuntu machine and try this out there.

AlbertoMonteiro commented 2 years ago

@firasdib I've just tried it now and it worked fine.

Since I am using Ubuntu, I used these instructions: https://docs.microsoft.com/pt-br/dotnet/core/install/linux-ubuntu#2104-

Installing with APT can be done with a few commands. Before you install .NET, run the following commands to add the Microsoft package signing key to your list of trusted keys and add the package repository.

Open a terminal and run the following commands:

wget https://packages.microsoft.com/config/ubuntu/21.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
rm packages-microsoft-prod.deb

Install the SDK

The .NET SDK allows you to develop apps with .NET. If you install the .NET SDK, you don't need to install the corresponding runtime. To install the .NET SDK, run the following commands:

sudo apt-get update; \
  sudo apt-get install -y apt-transport-https && \
  sudo apt-get update && \
  sudo apt-get install -y dotnet-sdk-6.0

Which distro are you using?

firasdib commented 2 years ago

I use Arch.

Installing via Snap worked, thank you! My Friday is gonna be fun!

firasdib commented 2 years ago

After building, I can see that it outputs ~40 files (3.3 MB gzipped), including a bunch of DLL files. Are all of these necessary, or is there a way to reduce the size or number of files? When I load your example website, I don't see that many files being fetched over the network, for example.

Sorry for all the questions!

Edit: It looks like these files are hard-cached; checking it out in incognito shows the files being downloaded. The question then is, can some of these be omitted?

AlbertoMonteiro commented 2 years ago

@firasdib Yeah, I am going to look into that. I know that it is possible to use some trimming strategies, but I have to research a little bit because I am no specialist in Blazor. Someone said that it may be possible to reduce the complete size to 1 MB.

firasdib commented 2 years ago

@AlbertoMonteiro I'll also investigate a bit on my end. 40 files, even if they are small, can cause some serious latencies. Luckily they are only fetched once, but still.

Keep me posted on what you find :-)

AlbertoMonteiro commented 2 years ago

If you want to research that too, this would be a starting point: https://docs.microsoft.com/en-us/aspnet/core/blazor/host-and-deploy/configure-trimmer?view=aspnetcore-6.0

firasdib commented 2 years ago

@AlbertoMonteiro It looks like you can enable AOT compilation, which should reduce the number of files and probably improve performance.

AlbertoMonteiro commented 2 years ago

@firasdib Yeah, Steve Sanderson talks about it in this video: New Blazor WebAssembly capabilities in .NET 6

This link goes directly to the point where he starts talking about AOT: https://youtu.be/kesUNeBZ1Os?t=1357

AlbertoMonteiro commented 2 years ago

@firasdib I've changed some stuff and, without AOT, I was able to reduce the total download size to 1.2 MB gzipped.

I've changed the csproj file to the following:

<Project Sdk="Microsoft.NET.Sdk.BlazorWebAssembly">

    <PropertyGroup>
        <TargetFramework>net6.0</TargetFramework>
        <Nullable>enable</Nullable>
        <ImplicitUsings>enable</ImplicitUsings>

        <!-- Remove some unused features. Shrinks the published app by ~700KB. -->
        <InvariantGlobalization>true</InvariantGlobalization>
        <BlazorEnableTimeZoneSupport>false</BlazorEnableTimeZoneSupport>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="Microsoft.AspNetCore.Components.WebAssembly" Version="6.0.1" />
    </ItemGroup>

</Project>

Program.cs

using Microsoft.AspNetCore.Components.WebAssembly.Hosting;
using Microsoft.JSInterop;
using System.Text.RegularExpressions;

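// Creates the default host builder and discards it; no host is built or run here.
_ = WebAssemblyHostBuilder.CreateDefault(args);

public static class Sample
{
    // [JSInvokable] exposes this static method to JavaScript through JS interop.
    [JSInvokable]
    public static object SayHelloCS(string regex, string value)
    {
        // Regex.Match takes the input text first, then the pattern.
        var result = Regex.Match(value, regex);
        // Return only the fields the client needs, as an anonymous object.
        return new
        {
            result.Success,
            Captures = result.Captures.Cast<Capture>().Select(x => new { x.Index, x.Length, x.Value }),
            Groups = result.Groups.Cast<Group>().Select(x => new { x.Index, x.Length, x.Success, x.Name, x.Value })
        };
    }
}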

AlbertoMonteiro commented 2 years ago

@firasdib Another improvement: I reduced the total size to 896 KB (gzipped), again without AOT.

I disabled implicit usings in the csproj:

<ImplicitUsings>disable</ImplicitUsings>

I had to add a new using directive in Program.cs:

using Microsoft.AspNetCore.Components.WebAssembly.Hosting;
using Microsoft.JSInterop;
using System.Linq; // I had to add this line
using System.Text.RegularExpressions;

_ = WebAssemblyHostBuilder.CreateDefault(args);

public static class Sample
{
    [JSInvokable]
    public static object SayHelloCS(string regex, string value)
    {
        var result = Regex.Match(value, regex);
        return new
        {
            result.Success,
            Captures = result.Captures.Cast<Capture>().Select(x => new { x.Index, x.Length, x.Value }),
            Groups = result.Groups.Cast<Group>().Select(x => new { x.Index, x.Length, x.Success, x.Name, x.Value })
        };
    }
}
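
On the JavaScript side, the object returned by SayHelloCS could be turned into highlight ranges roughly as follows. This is a minimal sketch that assumes the interop layer serializes property names in camelCase (success, groups, index, length, name); check the actual payload before relying on that. The helper name `toHighlightRanges` is hypothetical, and the sample object is hand-written for illustration, not real output.

```javascript
// Convert a match result into { name, start, end } ranges for highlighting.
// Groups that did not participate in the match are skipped.
function toHighlightRanges(result) {
  if (!result.success) return [];
  return result.groups
    .filter((g) => g.success)
    .map((g) => ({ name: g.name, start: g.index, end: g.index + g.length }));
}

// Hand-written sample shaped like the C# anonymous object above,
// as it might look for the pattern a(b)c against the input "abc".
const sample = {
  success: true,
  captures: [{ index: 0, length: 3, value: "abc" }],
  groups: [
    { index: 0, length: 3, success: true, name: "0", value: "abc" },
    { index: 1, length: 1, success: true, name: "1", value: "b" },
  ],
};

const ranges = toHighlightRanges(sample);
console.log(ranges);
```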