dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License

JsonConvert.DeserializeObject for non-English languages #32548

Open grzeskoo opened 1 year ago

grzeskoo commented 1 year ago

Problem downloading/parsing data. The problem does not occur when the data is downloaded with a browser, PowerShell, or external software such as Postman.

Description

When downloading data from the endpoint https://learn.microsoft.com/api/catalog with locale and type filters, it looks like HttpClient cannot download all of the content and often cuts off the response. If the client is not at fault, the problem could lie in data parsing, i.e. in JsonConvert.DeserializeObject.


According to the documentation at https://learn.microsoft.com/en-us/training/support/catalog-api-developer-reference, WebClient can be used (it is now obsolete); for testing I used it as well as HttpClient.

I also tested with a library that uses HttpClient (https://github.com/markjulmar/MSLearnCatalogAPI): it too throws an error for some locales, and the behavior is random (sometimes it fails for one locale, sometimes for another).

To Reproduce

Code version with WebClient
--------
using System;
using System.Collections.Generic;
using System.Net;
using Newtonsoft.Json;

var locales = new List<string>
{
    "ar-SA", "bg-BG", "bs-cyrl-BA", "bs-latn-BA", "ca-ES", "cs-CZ", "da-DK",
    "de-AT", "de-CH", "de-DE", "el-GR", "en-AU", "en-CA", "en-GB", "en-IE", "en-IN", "en-MY",
    "en-NZ", "en-SG", "en-US", "en-ZA", "es-ES", "es-MX", "et-EE", "eu-ES", "fi-FI", "fil-PH",
    "fr-BE", "fr-CA", "fr-CH", "fr-FR", "ga-IE", "gl-ES", "he-IL", "hi-IN", "hr-HR", "hu-HU",
    "id-ID", "is-IS", "it-CH", "it-IT", "ja-JP", "kk-KZ", "ko-KR", "lb-LU", "lt-LT", "lv-LV",
    "ms-MY", "mt-MT", "nb-NO", "nl-BE", "nl-NL", "pl-PL", "pt-BR", "pt-PT", "ro-RO", "ru-RU",
    "sk-SK", "sl-SI", "sr-cyrl-RS", "sr-latn-RS", "th-TH", "tr-TR", "uk-UA", "vi-VN", "zh-CN",
    "zh-HK", "zh-TW"
};

const string msLearnBaseAddress = "https://learn.microsoft.com/";
const string baseMsLearnApiAddress = "api/catalog?locale=";
// Type filter used in the failing requests
const string corruptedApiAddressScope = "&type=learningPaths,modules";

foreach (var locale in locales)
{
    Console.WriteLine(locale);

    try
    {
        var addressWithLocale = msLearnBaseAddress + baseMsLearnApiAddress + locale + corruptedApiAddressScope;

        // WebClient is obsolete (SYSLIB0014) but still usable for this repro
        using var client = new WebClient();
        client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko");
        var response = client.DownloadString(addressWithLocale);

        // Throws a JsonReaderException (e.g. "Unterminated string...") when the body is truncated
        dynamic parsedJson = JsonConvert.DeserializeObject(response);
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}

--------

Code version with HttpClient

--------

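using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class Program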
{
    private const string MsLearnBaseAddress = "https://learn.microsoft.com/";
    private const string BaseMsLearnApiAddress = "api/catalog?locale=";
    private const string CorruptedAPIAddressScope = "&type=learningPaths,modules";

    static async Task Main(string[] args)
    {
        await FetchVendorData();
    }

    static async Task FetchVendorData()
    {
        var locales = new List<MsLearnLocaleDto>
        {
            new() {LocaleCode = "ar-SA", IsActive = true},
            new() {LocaleCode =  "bg-BG", IsActive = true},
            new() {LocaleCode =  "bs-cyrl-BA", IsActive = true},
            new() {LocaleCode =  "bs-latn-BA", IsActive = true},
            new() {LocaleCode =  "ca-ES", IsActive = true},
            new() {LocaleCode =  "cs-CZ", IsActive = true},
            new() {LocaleCode =  "da-DK", IsActive = true},
            new() {LocaleCode = "de-AT", IsActive = true},
            new() {LocaleCode =  "de-CH", IsActive = true},
            new() {LocaleCode =  "de-DE", IsActive = true},
            new() {LocaleCode =  "el-GR", IsActive = true},
            new() {LocaleCode =  "en-AU", IsActive = true},
            new() {LocaleCode =  "en-CA", IsActive = true},
            new() {LocaleCode =  "en-GB", IsActive = true},
            new() {LocaleCode =  "en-IE", IsActive = true},
            new() {LocaleCode =  "en-IN", IsActive = true},
            new() {LocaleCode =  "en-MY", IsActive = true},
            new() {LocaleCode = "en-NZ", IsActive = true},
            new() {LocaleCode =  "en-SG", IsActive = true},
            new() {LocaleCode =  "en-US", IsActive = true},
            new() {LocaleCode =  "en-ZA", IsActive = true},
            new() {LocaleCode =  "es-ES", IsActive = true},
            new() {LocaleCode =  "es-MX", IsActive = true},
            new() {LocaleCode =  "et-EE", IsActive = true},
            new() {LocaleCode =  "eu-ES", IsActive = true},
            new() {LocaleCode =  "fi-FI", IsActive = true},
            new() {LocaleCode =  "fil-PH", IsActive = true},
            new() {LocaleCode = "fr-BE", IsActive = true},
            new() {LocaleCode =  "fr-CA", IsActive = true},
            new() {LocaleCode =  "fr-CH", IsActive = true},
            new() {LocaleCode =  "fr-FR", IsActive = true},
            new() {LocaleCode =  "ga-IE", IsActive = true},
            new() {LocaleCode =  "gl-ES", IsActive = true},
            new() {LocaleCode =  "he-IL", IsActive = true},
            new() {LocaleCode =  "hi-IN", IsActive = true},
            new() {LocaleCode =  "hr-HR", IsActive = true},
            new() {LocaleCode =  "hu-HU", IsActive = true},
            new() {LocaleCode = "id-ID", IsActive = true},
            new() {LocaleCode =  "is-IS", IsActive = true},
            new() {LocaleCode =  "it-CH", IsActive = true},
            new() {LocaleCode =  "it-IT", IsActive = true},
            new() {LocaleCode =  "ja-JP", IsActive = true},
            new() {LocaleCode =  "kk-KZ", IsActive = true},
            new() {LocaleCode =  "ko-KR", IsActive = true},
            new() {LocaleCode =  "lb-LU", IsActive = true},
            new() {LocaleCode =  "lt-LT", IsActive = true},
            new() {LocaleCode =  "lv-LV", IsActive = true},
            new() {LocaleCode = "ms-MY", IsActive = true},
            new() {LocaleCode =  "mt-MT", IsActive = true},
            new() {LocaleCode =  "nb-NO", IsActive = true},
            new() {LocaleCode =  "nl-BE", IsActive = true},
            new() {LocaleCode =  "nl-NL", IsActive = true},
            new() {LocaleCode =  "pl-PL", IsActive = true},
            new() {LocaleCode =  "pt-BR", IsActive = true},
            new() {LocaleCode =  "pt-PT", IsActive = true},
            new() {LocaleCode =  "ro-RO", IsActive = true},
            new() {LocaleCode =  "ru-RU", IsActive = true},
            new() {LocaleCode = "sk-SK", IsActive = true},
            new() {LocaleCode =  "sl-SI", IsActive = true},
            new() {LocaleCode =  "sr-cyrl-RS", IsActive = true},
            new() {LocaleCode =  "sr-latn-RS", IsActive = true},
            new() {LocaleCode =  "sv-SE", IsActive = true},
            new() {LocaleCode =  "th-TH", IsActive = true},
            new() {LocaleCode =  "tr-TR", IsActive = true},
            new() {LocaleCode =  "uk-UA", IsActive = true},
            new() {LocaleCode =  "vi-VN", IsActive = true},
            new() {LocaleCode = "zh-CN", IsActive = true},
            new() {LocaleCode = "zh-HK", IsActive = true},
            new() {LocaleCode =  "zh-TW", IsActive = true}
        };

        var learnDataResult = new List<MsLearnDataDto>();

        await FetchLocaleMsLearnData(locales, learnDataResult);
    }

    private static async Task<List<MsLearnDataDto>> FetchLocaleMsLearnData(IList<MsLearnLocaleDto> msLearnLocaleDtos,
        List<MsLearnDataDto> msLearnDataDtos, int nrOfAttempts = 0)
    {
        var errorList = new List<MsLearnLocaleDto>();

        await Parallel.ForEachAsync(msLearnLocaleDtos.Where(x => x.IsActive), new ParallelOptions { MaxDegreeOfParallelism = 3 }, async (locale, token) =>
            {
                try
                {
                    var addressWithLocale = BaseMsLearnApiAddress + locale.LocaleCode + CorruptedAPIAddressScope;
                    var response = await GetDataAsync(addressWithLocale, token);
                    // Throws a JsonReaderException when the response body is truncated
                    var result = JsonConvert.DeserializeObject<MsLearnDataDto>(response);

                    // The loop body runs concurrently and List<T> is not thread-safe
                    lock (msLearnDataDtos)
                    {
                        msLearnDataDtos.Add(result);
                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.Message + " " + locale.LocaleCode);
                    lock (errorList)
                    {
                        errorList.Add(locale);
                    }
                }
            });

        // Retry only the locales that failed in this pass
        if (errorList.Any() && nrOfAttempts <= 3)
        {
            nrOfAttempts++;
            await FetchLocaleMsLearnData(errorList, msLearnDataDtos, nrOfAttempts);
        }

        return msLearnDataDtos;
    }

    // Reuse a single HttpClient for all requests instead of creating one per call
    private static readonly HttpClient SharedHttpClient = new()
    {
        BaseAddress = new Uri(MsLearnBaseAddress)
    };

    private static async Task<string> GetDataAsync(string endpoint, CancellationToken token)
    {
        Console.WriteLine(endpoint);

        using var response = await SharedHttpClient.GetAsync(endpoint, token);

        // Throw on non-success status codes instead of returning null
        response.EnsureSuccessStatusCode();

        return await response.Content.ReadAsStringAsync(token);
    }
}
--------

Additional

Additional screenshots and a description of where the error occurs: https://learn.microsoft.com/en-us/answers/questions/1279951/https-learn-microsoft-com-api-catalog-type-filteri

Exceptions (if any)

The exception depends on which object cannot be parsed, for example:

Unterminated string. Expected delimiter: ". Path 'modules[1668].subjects', line 1, position 2826152.

Tests

Reading the response as a string and as a stream gives the same issue.

Downloading and comparing the data with PowerShell works fine:

Invoke-RestMethod -Uri "https://learn.microsoft.com/api/catalog?locale=en-US&type=roles,modules,products,levels,learningPaths" -OutFile test1.txt
Invoke-RestMethod -Uri "https://learn.microsoft.com/api/catalog?locale=en-US&type=modules,roles,products,levels,learningPaths" -OutFile test2.txt
compare-object $(Get-Content test1.txt) $(Get-content test2.txt)
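The same comparison can be made in C# when both bodies are held in memory. A minimal sketch, assuming a hypothetical helper named FirstDifference (not part of any library): it reports the offset of the first differing character, which shows where one response diverges from or stops short of the other; -1 means the two bodies are identical.

```csharp
using System;

// Returns the index of the first character at which the two strings differ,
// or -1 if they are identical. If one string is a prefix of the other
// (e.g. a truncated download), the result is the length of the shorter one.
static int FirstDifference(string a, string b)
{
    int shared = Math.Min(a.Length, b.Length);
    for (int i = 0; i < shared; i++)
    {
        if (a[i] != b[i])
        {
            return i;
        }
    }
    return a.Length == b.Length ? -1 : shared;
}

// A truncated body matches the full one up to the point where it stops
Console.WriteLine(FirstDifference("{\"modules\":[1,2]}", "{\"modules\":[1"));   // prints 13
```

With the files written by the commands above, the inputs would be File.ReadAllText("test1.txt") and File.ReadAllText("test2.txt").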

Conclusion

My suspicion was directed at the built-in parser, which, after retrieving the data, is unable to display it as JSON; however, the preview in text form also suggests that the response has been truncated.

Further technical details

.NET 7.0 and .NET 6.0
Microsoft Visual Studio Enterprise 2022 (2) (64-bit), Version 17.5.1

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

MSFT-AN commented 1 year ago

Thanks for posting here. The GitHub community may have further insights into this, whether related to the SDK code or to workarounds to get this working.

As an update, I created a PowerShell script (see below if interested) to check which non-English locales don't return the same results for different sort orders. It turns out that for some locales, es-ES for example (although the affected locale can differ between runs), the responses for the sort orders roles,modules,products,levels,learningPaths and modules,roles,products,levels,learningPaths are in fact different.

Sort order "roles,modules,products,levels,learningPaths":
"type":"module","title":"Implementación de un módulo precompilado en el dispositivo perimetral","duration_in_minutes":56,"rating":{"count":890,"average":4.58},"popularity":0.460448932003016,"icon_url":"https://learn.micro
Sort order "modules,roles,products,levels,learningPaths":
"type":"module","title":"Implementación de un módulo precompilado en el dispositivo perimetral","duration_in_minutes":56,"rating":{"count":890,"average":4.58},"popularity":0.460448932003016,"icon_url":https://learn.microsoft.com/es-es/training/achievements/student-evangelism/deploy-pre-built-module-iot-edge.svg...

So the recommendation, at this time, is to use the sort order "modules,roles,products,levels,learningPaths" and then use the fields in whatever order you need them.
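To apply a fixed type order consistently across all locales, the catalog URL can be built in one place. A minimal sketch: BuildCatalogUrl is a hypothetical helper; only the locale and type query parameters come from this thread. Each value is escaped with Uri.EscapeDataString, while the commas between types are kept literal, as in the URLs above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds a Learn catalog URL for one locale and an
// explicit, ordered list of types (e.g. modules before roles).
static string BuildCatalogUrl(string locale, IEnumerable<string> types)
{
    const string baseAddress = "https://learn.microsoft.com/api/catalog";
    // Escape each value, but keep the comma separators unescaped
    var typeList = string.Join(",", types.Select(t => Uri.EscapeDataString(t)));
    return $"{baseAddress}?locale={Uri.EscapeDataString(locale)}&type={typeList}";
}

var url = BuildCatalogUrl("es-ES", new[] { "modules", "roles", "products", "levels", "learningPaths" });
Console.WriteLine(url);
// prints https://learn.microsoft.com/api/catalog?locale=es-ES&type=modules,roles,products,levels,learningPaths
```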

If the moderators require it, we can create another Q&A post under the "Learn API" tag (if it exists). Alternatively, see whether Training Support can help get a list.

$locale_lang_list = @(
    "ar-SA", "bg-BG", "bs-cyrl-BA", "bs-latn-BA", "ca-ES", "cs-CZ", "da-DK",
    "de-AT", "de-CH", "de-DE", "el-GR", "en-AU", "en-CA", "en-GB", "en-IE", "en-IN", "en-MY",
    "en-NZ", "en-SG", "en-US", "en-ZA", "es-ES", "es-MX", "et-EE", "eu-ES", "fi-FI", "fil-PH",
    "fr-BE", "fr-CA", "fr-CH", "fr-FR", "ga-IE", "gl-ES", "he-IL", "hi-IN", "hr-HR", "hu-HU",
    "id-ID", "is-IS", "it-CH", "it-IT", "ja-JP", "kk-KZ", "ko-KR", "lb-LU", "lt-LT", "lv-LV",
    "ms-MY", "mt-MT", "nb-NO", "nl-BE", "nl-NL", "pl-PL", "pt-BR", "pt-PT", "ro-RO", "ru-RU",
    "sk-SK", "sl-SI", "sr-cyrl-RS", "sr-latn-RS", "th-TH", "tr-TR", "uk-UA", "vi-VN", "zh-CN",
    "zh-HK", "zh-TW")

$ProgressPreference = 'SilentlyContinue'
$StartDate = Get-Date
Write-Host "Start: $StartDate`n"

Write-Host "---"
$count_list = 1
foreach ($locale_lang in $locale_lang_list)
{
    Write-Host "$count_list of $($locale_lang_list.Count) - Testing $locale_lang : " -NoNewline

    # Quote the URIs: an unquoted & is not allowed in a command argument
    Invoke-RestMethod -Uri "https://learn.microsoft.com/api/catalog?locale=$locale_lang&type=roles,modules,products,levels,learningPaths" -OutFile test1.txt
    Invoke-RestMethod -Uri "https://learn.microsoft.com/api/catalog?locale=$locale_lang&type=modules,roles,products,levels,learningPaths" -OutFile test2.txt

    $count_list++

    $IsSortSame = Compare-Object (Get-Content test1.txt) (Get-Content test2.txt)
    if ($null -eq $IsSortSame)
    {
        Write-Host "No sort difference"
    }
    else
    {
        Write-Host "Sort difference"
        # Keep both responses for inspection
        Rename-Item test1.txt "${locale_lang}_roles-modules.txt"
        Rename-Item test2.txt "${locale_lang}_modules-roles.txt"
    }
}
Write-Host "---"

$EndDate = Get-Date
Write-Host "`nEnd: $EndDate"
New-TimeSpan -Start $StartDate -End $EndDate

# Cleanup/Reset Env settings
$ProgressPreference = 'Continue'
# The files may already have been renamed, so ignore missing ones
Remove-Item test1.txt, test2.txt -Force -ErrorAction SilentlyContinue

MSFT-AN commented 1 year ago

Just an update on this: it appears that the https://learn.microsoft.com/api/catalog endpoint doesn't always return the full data for filtered requests (the response is truncated). In some testing I can see that it also happens with other type orders. It appears to have more to do with the web response being malformed across multiple calls.

As a mitigation, I've added a wrapper class for retrying requests, along with error handling for JsonConvert.DeserializeObject.

See the C# code below; hope this helps. Perhaps others here have a version that uses HttpClient.

// retry.cs
using System;
using System.Collections.Generic;
using System.Threading;

namespace ATask
{
    public static class Retry
    {
        public static void Do(
            Action action,
            TimeSpan retryInterval,
            int maxAttemptCount = 3)
        {
            Do<object>(() =>
            {
                action();
                return null;
            }, retryInterval, maxAttemptCount);
        }

        public static T Do<T>(
            Func<T> action,
            TimeSpan retryInterval,
            int maxAttemptCount = 3)
        {
            var exceptions = new List<Exception>();

            for (int attempted = 0; attempted < maxAttemptCount; attempted++)
            {
                try
                {
                    if (attempted > 0)
                    {
                        Thread.Sleep(retryInterval);
                    }
                    return action();
                }
                catch (Exception ex)
                {
                    exceptions.Add(ex);
                }
            }
            throw new AggregateException(exceptions);
        }
    }
}

// learn_lang_cat.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using Newtonsoft.Json;
using ATask;

// Set locale array
var locales = new List<string>
{
    "ar-SA", "bg-BG", "bs-cyrl-BA", "bs-latn-BA", "ca-ES", "cs-CZ", "da-DK",
    "de-AT", "de-CH", "de-DE", "el-GR", "en-AU", "en-CA", "en-GB", "en-IE", "en-IN", "en-MY",
    "en-NZ", "en-SG", "en-US", "en-ZA", "es-ES", "es-MX", "et-EE", "eu-ES", "fi-FI", "fil-PH",
    "fr-BE", "fr-CA", "fr-CH", "fr-FR", "ga-IE", "gl-ES", "he-IL", "hi-IN", "hr-HR", "hu-HU",
    "id-ID", "is-IS", "it-CH", "it-IT", "ja-JP", "kk-KZ", "ko-KR", "lb-LU", "lt-LT", "lv-LV",
    "ms-MY", "mt-MT", "nb-NO", "nl-BE", "nl-NL", "pl-PL", "pt-BR", "pt-PT", "ro-RO", "ru-RU",
    "sk-SK", "sl-SI", "sr-cyrl-RS", "sr-latn-RS", "th-TH", "tr-TR", "uk-UA", "vi-VN", "zh-CN",
    "zh-HK", "zh-TW"
};

// Set variables
var msLearnBaseAddress = "https://learn.microsoft.com/";
var baseMsLearnApiAddress = "api/catalog?locale=";
var ApiAddressScope = "&type=learningPaths,modules";
var out_filename = "Learn_Catalog.txt";

// Create an empty file
File.Create(out_filename).Dispose();

// Loop through all locale languages
foreach (var locale in locales)
{
    Console.WriteLine(locale);

    // Without retry calling
    //GetCatalog(locale, msLearnBaseAddress, baseMsLearnApiAddress, ApiAddressScope, out_filename);

    // With retry calling
    Retry.Do(() => GetCatalog(locale, msLearnBaseAddress, baseMsLearnApiAddress, ApiAddressScope, out_filename), TimeSpan.FromSeconds(5), 5);
}

static void GetCatalog(string locale, string msLearnBaseAddress, string baseMsLearnApiAddress, string ApiAddressScope, string filename)
{
    var addressWithLocale = msLearnBaseAddress + baseMsLearnApiAddress + locale + ApiAddressScope;

    // Get data (WebClient is obsolete but kept to match the original repro)
    using var client = new WebClient();
    client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko");
    var response = client.DownloadString(addressWithLocale);

    // Record parse errors instead of silently accepting a truncated response
    var parseFailed = false;
    var settings = new JsonSerializerSettings
    {
        Error = (se, ev) =>
        {
            parseFailed = true;
            ev.ErrorContext.Handled = true;
        }
    };
    dynamic parsedJson = JsonConvert.DeserializeObject(response, settings);
    if (parseFailed)
    {
        // Throwing lets Retry.Do request this locale again
        throw new InvalidDataException($"Malformed catalog response for {locale}");
    }
    var reeee = Convert.ToString(response);

    // Write to output file
    using (StreamWriter w = File.AppendText(filename))
    {
        w.WriteLine(reeee);
    }
}
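For an HttpClient-based version, one option is an asynchronous retry that also treats an invalid body as a failed attempt, since a truncated response still arrives with status 200. A minimal sketch under stated assumptions: RetryUntilValidAsync is a hypothetical helper, not part of the code above or of any library; it takes the fetch operation and a validity check as delegates, so it can be exercised without network access.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: calls fetch up to maxAttempts times, waiting retryInterval
// between attempts, and returns the first result that passes isValid.
// Exceptions from fetch are retried, except on the last attempt, where they propagate.
static async Task<T> RetryUntilValidAsync<T>(
    Func<Task<T>> fetch,
    Func<T, bool> isValid,
    int maxAttempts,
    TimeSpan retryInterval)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            var result = await fetch();
            if (isValid(result))
            {
                return result;
            }
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Swallow the failure and try again after the delay
        }

        if (attempt < maxAttempts)
        {
            await Task.Delay(retryInterval);
        }
    }

    // Every attempt returned a result that failed validation
    throw new InvalidOperationException($"No valid result after {maxAttempts} attempts.");
}
```

In real use, fetch would be something like () => httpClient.GetStringAsync(url), and isValid a check that the whole body parses as JSON. Keeping the validity check separate from the retry loop means a 200 response with a malformed body is retried just like a network error.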

grzeskoo commented 1 year ago

Exactly, as you wrote: "It appears more to do with the web response back being malformed in multiple calls."

Thank you for the code and the temporary workaround for this problem. I also use a retry mechanism in my code; however, it is not always effective, and sometimes even after several retries the request still fails and the data is incomplete.

Hence my question: how can we report this further and investigate it, given that these endpoints behave unstably? Of course we can catch errors and retry, but we cannot be sure that this will completely solve the problem.

Is there an appropriate team we can forward this to, so that they can dig into the problem and solve it for good? The problem is clearly on the side of these endpoints.

grzeskoo commented 1 year ago

Unfortunately, your solution does not get around this problem either: I can save the result to a file, but further transformations are impossible because the string/JSON is malformed.

The response status from WebClient/HttpClient is always 200, so the error is not caught, even though the response itself is incorrect.
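Because the status code is 200, the body itself has to be checked before it is used. A minimal sketch, assuming System.Text.Json (built into .NET Core 3.0 and later) as an independent parser in place of JsonConvert: IsCompleteJson is a hypothetical helper that returns false when the body is empty or does not parse as a whole JSON document, which is what happens when a response is cut off mid-string. Throwing when it returns false turns a silently truncated 200 response into an error that a retry loop can catch.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: true only if the whole string parses as one JSON document.
// A body truncated mid-string or mid-object fails to parse and yields false.
static bool IsCompleteJson(string body)
{
    if (string.IsNullOrWhiteSpace(body))
    {
        return false;
    }

    try
    {
        // JsonDocument uses pooled buffers, so dispose it right away
        using var document = JsonDocument.Parse(body);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}

Console.WriteLine(IsCompleteJson("{\"modules\":[{\"subjects\":[\"ai\"]}]}"));   // True
Console.WriteLine(IsCompleteJson("{\"modules\":[{\"subjects\":[\"a"));            // False
```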

MSFT-AN commented 1 year ago

As mentioned, the web response comes back malformed across multiple calls. To report that, you'd have to see whether someone in Training Support can help get the right list.

Hopefully others in this thread can help you with code that mitigates this by validating the response data, since the returned response can be truncated. WebClient/HttpClient will report 200 because nothing was really "wrong" with the HTTPS response itself; only the data is incorrect because the response came back malformed.

As a mitigation, you'll have to implement some kind of validation, either on the variable reeee or in the error handling of JsonConvert.DeserializeObject.

Something along the lines of the following (although if others know a better way, feel free to mention it here):