apache / lucenenet

Apache Lucene.NET
https://lucenenet.apache.org/
Apache License 2.0

EncodingProvider won't load when target is .NET Standard 2.0 and runtime is .NET Framework #1025

Closed NightOwl888 closed 3 days ago

NightOwl888 commented 1 week ago

Describe the bug

In Morfologik.Stemming, we encountered a problem when testing the netstandard2.0 target on net471: it fails to load System.Text.Encoding.CodePages.dll and crashes. This no doubt also affects the following modules:

Note that Hunspell in Lucene.Net.Analysis.Common also requires System.Text.Encoding.CodePages when loading some dictionaries, but users are expected to add a reference to their project if, and only if, they require it.

Expected Behavior

.NET Framework should be able to use a netstandard2.0 assembly without receiving an error message.

Steps To Reproduce

This occurred when we upgraded Morfologik.Stemming to net9.0 and also added targets for net8.0 and net9.0, thus requiring us to test netstandard2.0 on something else. We chose net471 and encountered this problem. It isn't clear why we are not seeing this in Lucene.Net, but we should definitely be checking the runtime before registering an encoding provider, and currently we are not.

Exceptions (if any)

System.IO.FileNotFoundException : Could not load file or assembly 'System.Text.Encoding.CodePages, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

Lucene.NET Version

4.8.0-beta00017

.NET Version

.NET Framework (the version we test netstandard2.0 with)

Operating System

N/A

Anything else?

This happens because .NET Framework doesn't require this registration. However, our conditional compilation only checks whether the target framework supports FEATURE_ENCODINGPROVIDERS; it does not check the actual runtime in use. In Morfologik.Stemming, this was addressed with the following class, which is called from the static constructors of all of the types that require the encoding.

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;

namespace Morfologik.Stemming.Support
{
    /// <summary>
    /// Loads the <see cref="System.Text.EncodingProvider"/> for the current runtime for support of
    /// iso-8859-1 encoding.
    /// </summary>
    internal static class EncodingProviderInitializer
    {
        private static int initialized;

        private static bool IsNetFramework =>
#if NETSTANDARD2_0
            RuntimeInformation.FrameworkDescription.StartsWith(".NET Framework", StringComparison.OrdinalIgnoreCase);
#elif NET40_OR_GREATER
            true;
#else
            false;
#endif

        [Conditional("FEATURE_ENCODINGPROVIDERS")]
        public static void EnsureInitialized()
        {
            // Only the first caller proceeds; every later call returns immediately
            if (0 != Interlocked.CompareExchange(ref initialized, 1, 0)) return;

#if FEATURE_ENCODINGPROVIDERS
            if (!IsNetFramework)
            {
                Initialize();
            }
#endif
        }

#if FEATURE_ENCODINGPROVIDERS
        // NOTE: CodePagesEncodingProvider.Instance loads early, so we need this in a separate method to ensure
        // that it isn't executed until after we know which runtime we are on.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void Initialize()
        {
            // Support for iso-8859-1 encoding. See: https://docs.microsoft.com/en-us/dotnet/api/system.text.codepagesencodingprovider?view=netcore-2.0
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        }
#endif
    }
}
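
To illustrate the "called from static constructors" part, here is a minimal sketch of a consuming type. `Latin1Text` and its `Decode` method are hypothetical names invented for this example, not types from Morfologik.Stemming or Lucene.NET, and the sketch assumes it is compiled in the same assembly as the `EncodingProviderInitializer` class above.

```csharp
using System.Text;

namespace Morfologik.Stemming.Support
{
    // Hypothetical consumer type, used only to illustrate the call pattern.
    internal static class Latin1Text
    {
        // A type with an explicit static constructor is initialized before its
        // first member access, so the provider check runs before any decoding.
        static Latin1Text()
        {
            // When FEATURE_ENCODINGPROVIDERS is not defined at this call site,
            // the [Conditional] attribute makes the compiler omit this call.
            EncodingProviderInitializer.EnsureInitialized();
        }

        // Decodes bytes as iso-8859-1, the encoding the initializer is meant to support.
        public static string Decode(byte[] bytes) =>
            Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

Because `EnsureInitialized` returns immediately after the first call, repeating this static constructor in several types should add little overhead.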