CommunityToolkit / ColorCode-Universal

This is a port of ColorCode to .NET Standard. The original HTML-only formatter has been separated from the logic, so it can now produce syntax-highlighted code for any output. The project can currently produce HTML and render to UWP RichTextBlocks.

JSON Parsing - deadlock/stuck on parsing Json to HTML #36

Open JochnGst opened 1 year ago

JochnGst commented 1 year ago

I'm trying to parse this JSON, which is sent to my Blazor page. Because of some RegEx parsing issue, the process gets stuck without throwing any exception. Can somebody tell me where the problem could be? My test project is here: ColorCodeTest

This is my test code:

string _jsonString3 = "{\r\n \"raw_causes\": [\r\n      \"Winterglatter Fahrbahn\",\r\n      \"Nicht angepasste Geschwindigkeit\",\r\n      \"test3\"\r\n    ]\r\n      }";
try
{
    var _formatter = new HtmlFormatter();
    var language = ColorCode.Languages.FindById("json");
    _jsonHtml = _formatter.GetHtmlString(_jsonString3, language);

}
catch (Exception)
{

    throw;
}

It gets stuck after the array element \"Winterglatter Fahrbahn\", when it calls regexMatch = regexMatch.NextMatch(); and I have no idea why this happens:

        private void Parse(string sourceCode,
                           CompiledLanguage compiledLanguage,
                           Action<string, IList<Scope>> parseHandler)
        {
            Match regexMatch = compiledLanguage.Regex.Match(sourceCode);

            if (!regexMatch.Success)
                parseHandler(sourceCode, new List<Scope>());
            else
            {
                int currentIndex = 0;

                try
                {
                    while (regexMatch.Success)
                    {
                        string sourceCodeBeforeMatch = sourceCode.Substring(currentIndex, regexMatch.Index - currentIndex);
                        if (!string.IsNullOrEmpty(sourceCodeBeforeMatch))
                            parseHandler(sourceCodeBeforeMatch, new List<Scope>());

                        string matchedSourceCode = sourceCode.Substring(regexMatch.Index, regexMatch.Length);
                        if (!string.IsNullOrEmpty(matchedSourceCode))
                        {
                            List<Scope> capturedStylesForMatchedFragment = GetCapturedStyles(regexMatch, regexMatch.Index, compiledLanguage);
                            List<Scope> capturedStyleTree = CreateCapturedStyleTree(capturedStylesForMatchedFragment);
                            parseHandler(matchedSourceCode, capturedStyleTree);
                        }

                        currentIndex = regexMatch.Index + regexMatch.Length;
                        regexMatch = regexMatch.NextMatch();
                    }
                }
                catch (Exception ex)
                {

                    throw;
                }

                string sourceCodeAfterAllMatches = sourceCode.Substring(currentIndex);
                if (!string.IsNullOrEmpty(sourceCodeAfterAllMatches))
                    parseHandler(sourceCodeAfterAllMatches, new List<Scope>());
            }
        }
JochnGst commented 1 year ago

I found out that there is a conflict with the key LanguageRule:

new LanguageRule(
     $@"[,\{{]\s*({Regex_String})\s*:",
     new Dictionary<int, string>
         {
             {1, ScopeName.JsonKey}
         }),

In my case it works when I use the RegEx $@"[,\{{]\s*(""\w*"")\s*:" instead, but I know that this will not catch all edge cases for a JSON key.

GuildOfCalamity commented 1 year ago

I made this to clean up the RegEx pattern:

       public static List<string> ExtractKeys(string jsonString)
       {
           var keys = new List<string>();
           var matches = Regex.Matches(jsonString, "[,\\{]\"(.*?)\"\\s*:");
           foreach (Match match in matches) { keys.Add(match.Groups[1].Value); }
           return keys;
       }
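
The pattern above expects the opening quote to follow `{` or `,` directly, so it finds no keys in pretty-printed JSON such as the sample in the first comment, where a line break follows the brace. A minimal sketch of a whitespace-tolerant variant is given below. `KeyExtractionSketch` and `ExtractKeysLoose` are hypothetical names used only for illustration, and the pattern still does not handle escaped quotes inside a key.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyExtractionSketch
{
    // Hypothetical variant of ExtractKeys: \s* allows whitespace or line breaks
    // between the '{' or ',' and the opening quote of the key, and [^"]* keeps
    // the key from running past its closing quote.
    public static List<string> ExtractKeysLoose(string jsonString)
    {
        var keys = new List<string>();
        var matches = Regex.Matches(jsonString, "[,\\{]\\s*\"([^\"]*)\"\\s*:");
        foreach (Match match in matches) { keys.Add(match.Groups[1].Value); }
        return keys;
    }
}
```

Called with the `_jsonString3` value from the first comment, this should return a single key, `raw_causes`, because the array elements are followed by `,` or `]` rather than `:`.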
niltor commented 8 months ago

@GuildOfCalamity @JochnGst I'm encountering the same problem. What is a reasonable solution?

Yomodo commented 4 months ago

This has become a serious issue for us: showing the colored JSON of certain Intune policies locks up our whole Blazor app, making it unusable and taking down the environment with it.

Kompiler commented 2 months ago

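I believe the issue relates to excessive regex backtracking when parsing JSON keys: when a quoted string is not followed by a colon, the engine retries the string part of the rule in many different ways before giving up.

An atomic group prevents this, because the engine will not backtrack into a string once it has matched. The original LanguageRule can be changed from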

new LanguageRule(
     $@"[,\{{]\s*({Regex_String})\s*:",
     new Dictionary<int, string>
         {
             {1, ScopeName.JsonKey}
         })

to

new LanguageRule(
     // The outer parentheses keep capture group 1; (?>...) on its own does not capture.
     $@"[,\{{]\s*((?>{Regex_String}))\s*:",
     new Dictionary<int, string>
         {
             {1, ScopeName.JsonKey}
         })
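
The effect of the atomic group can be tried outside the library with a small standalone sketch. Everything named here is an assumption for illustration: `AtomicKeyRuleSketch`, `FindKeys`, and `StringPattern` are hypothetical, and `StringPattern` is only a stand-in with a nested quantifier, since the library's actual `Regex_String` is not shown in this thread. Because an atomic group does not capture, the sketch keeps an outer capturing group so that group 1 still holds the key. It also passes a match timeout to the `Regex` constructor as a safety net, which is a separate technique from the atomic group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AtomicKeyRuleSketch
{
    // Assumption: a stand-in for the library's Regex_String, which this thread
    // does not show. The nested quantifier (...*)* is the kind of construct
    // that makes plain backtracking expensive.
    private const string StringPattern = @"""(?:\\.|[^""\\]*)*""";

    // Key rule with the string inside an atomic group. The outer parentheses
    // keep capture group 1 for the key.
    private static readonly Regex KeyRule = new Regex(
        $@"[,\{{]\s*((?>{StringPattern}))\s*:",
        RegexOptions.None,
        TimeSpan.FromSeconds(2)); // safety net: throw instead of hanging

    public static List<string> FindKeys(string json)
    {
        var keys = new List<string>();
        try
        {
            // Matches are produced lazily, so a timeout surfaces inside the loop.
            foreach (Match match in KeyRule.Matches(json))
                keys.Add(match.Groups[1].Value);
        }
        catch (RegexMatchTimeoutException)
        {
            // Return what was found so far rather than blocking the caller.
        }
        return keys;
    }
}
```

With the `_jsonString3` value from the first comment, `FindKeys` should return one entry, the key `"raw_causes"` including its quotes. The array elements are matched as strings once, fail on the missing colon, and are not retried.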