CXuesong / WikiClientLibrary

/*🌻*/ Wiki Client Library is an asynchronous MediaWiki API client library targeting modern .NET platforms
https://github.com/CXuesong/WikiClientLibrary/wiki
Apache License 2.0

Problem getting list of "wanted" items when there are none #92

rwv37 closed this issue 2 years ago

rwv37 commented 2 years ago

I'm trying to get Wantedtemplates, Wantedpages, etc. It's working fine for me for some of them, but an exception is being thrown by others (from within WikiClientLibrary). I believe I have narrowed it down to "works fine" = "there are some such wanted things" and "exception thrown" = "there are no such wanted things".

I'm not sure if this is because I'm doing something wrong or if it's perhaps a bug in WCL or in something at a deeper level than that. If it's not simply because I'm doing something wrong, I suspect it might have something to do with the issue outlined here: Apparently "HasValues" should be called on a Newtonsoft object before trying to access a child object. Disclaimer: I know nothing about Newtonsoft beyond what it is and that it's popular.

Am I doing something wrong? If not, is there a workaround for this? Any help would be appreciated.

I am getting the error when I try to do an "await foreach" on the items, and also if I simply try to get the count of them.

Here's the part of my code that's failing:

    private static async Task AddToDictionaryAsync
        (Dictionary<string, WikiPage> dictionary, 
        Func<CancellationToken, Task<IAsyncEnumerable<WikiPage>>> itemsToAdd,
        CancellationToken cancellationToken)
    {
        var items = await itemsToAdd(cancellationToken).ConfigureAwait(false);

        // EXCEPTION IS THROWN HERE
        var count = await items.CountAsync().ConfigureAwait(false);

        Console.WriteLine($"{count} items");

        // EXCEPTION IS THROWN HERE (if I comment out the "CountAsync" call above)
        await foreach (var item in items.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            dictionary[item.Title] = item;
        }
    }

And here's the exception (in the "CountAsync" case):

fail: Microsoft.Extensions.Hosting.Internal.Host[9]
      BackgroundService failed
      System.InvalidOperationException: Cannot access child value on Newtonsoft.Json.Linq.JValue.
         at Newtonsoft.Json.Linq.JToken.get_Item(Object key)
         at WikiClientLibrary.Pages.WikiPage.<>c.<FromJsonQueryResult>b__0_0(JProperty page)
         at System.Linq.EnumerableSorter`2.ComputeKeys(TElement[] elements, Int32 count)
         at System.Linq.EnumerableSorter`1.ComputeMap(TElement[] elements, Int32 count)
         at System.Linq.EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
         at System.Linq.OrderedEnumerable`1.GetEnumerator()+MoveNext()
         at System.Linq.Enumerable.SelectIPartitionIterator`2.ToList()
         at WikiClientLibrary.Pages.WikiPage.FromJsonQueryResult(WikiSite site, JObject jpages, IWikiPageQueryProvider options)
         at WikiClientLibrary.Generators.Primitive.WikiPageGenerator`1.<>c__DisplayClass8_0.<EnumPagesAsync>b__1(JObject jquery)
         at System.Linq.AsyncEnumerable.SelectManyAsyncIterator`2.MoveNextCore()
         at System.Linq.AsyncIteratorBase`1.MoveNextAsync() in d:\a\1\s\Ix.NET\Source\System.Linq.Async\System\Linq\AsyncIterator.cs:line 70
         at System.Linq.AsyncIteratorBase`1.MoveNextAsync() in d:\a\1\s\Ix.NET\Source\System.Linq.Async\System\Linq\AsyncIterator.cs:line 75
         at System.Linq.AsyncEnumerablePartition`1.SkipAndCountAsync(UInt32 index, IAsyncEnumerator`1 en) in d:\a\1\s\Ix.NET\Source\System.Linq.Async\System\Linq\AsyncEnumerablePartition.cs:line 377
         at System.Linq.AsyncEnumerablePartition`1.<>c__DisplayClass11_0.<<GetCountAsync>g__Core|0>d.MoveNext() in d:\a\1\s\Ix.NET\Source\System.Linq.Async\System\Linq\AsyncEnumerablePartition.cs:line 95
      --- End of stack trace from previous location ---
         at System.Linq.AsyncEnumerablePartition`1.<>c__DisplayClass11_0.<<GetCountAsync>g__Core|0>d.MoveNext() in d:\a\1\s\Ix.NET\Source\System.Linq.Async\System\Linq\AsyncEnumerablePartition.cs:line 101
      --- End of stack trace from previous location ---
         at Rwv37.MediaWiki.Api.WikiSiteExtensions.AddToDictionaryAsync(Dictionary`2 dictionary, Func`2 itemsToAdd, CancellationToken cancellationToken) in C:\Users\bob\Bob\trunk\Dev\DotNet\Rwv37\MediaWiki\Rwv37.MediaWiki.Api\WikiSiteExtensions.cs:line 112
         at Rwv37.MediaWiki.Api.WikiSiteExtensions.GetAllWantedAsync(WikiSite site, CancellationToken cancellationToken) in C:\Users\bob\Bob\trunk\Dev\DotNet\Rwv37\MediaWiki\Rwv37.MediaWiki.Api\WikiSiteExtensions.cs:line 38
         at Rwv37.MediaWiki.SiteSetup.SiteSetupService.DoThatFunkyThingAsync(CancellationToken stoppingToken) in C:\Users\bob\Bob\trunk\Dev\DotNet\Rwv37\MediaWiki\Rwv37.MediaWiki.SiteSetup\SiteSetupService.cs:line 40
         at Rwv37.MediaWiki.SiteSetup.SiteSetupService.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\bob\Bob\trunk\Dev\DotNet\Rwv37\MediaWiki\Rwv37.MediaWiki.SiteSetup\SiteSetupService.cs:line 30
         at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)

rwv37 commented 2 years ago

"If not, is there a workaround for this?"

I can just catch the System.InvalidOperationException and return, but I'm hoping for something more natural than that.

CXuesong commented 2 years ago

I think this error occurs because page.Value is a JValue at L31 of the code linked here:

https://github.com/CXuesong/WikiClientLibrary/blob/243ae2c085e1a6b1a2c3680d4c7f7b9ef0a30f4f/WikiClientLibrary/Pages/PageFactory.cs#L18-L38

I'm not sure how that happens. Could you share the code you used to return the IAsyncEnumerable<WikiPage> in the itemsToAdd delegate? Which wiki site are you querying? Is it a public wiki?
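
To illustrate the failure mode described above, here is a minimal sketch of how indexing into a JValue produces this exception in Newtonsoft.Json, and how a type check avoids it. The class name, the method names and the "title" key are hypothetical and chosen only for illustration. This is not the code in PageFactory.cs, and the JSON shape the wiki actually returns is not known from this thread.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Hypothetical demo class; not part of WikiClientLibrary.
public static class JTokenGuardDemo
{
    // Returns the "title" child when the token is a JSON object, otherwise null.
    // A scalar token (JValue) has no children, so it is never indexed here.
    public static string TryGetTitle(JToken token) =>
        token is JObject obj ? (string)obj["title"] : null;

    // Indexes into the token without checking its type first.
    // For a JValue this throws InvalidOperationException:
    // "Cannot access child value on Newtonsoft.Json.Linq.JValue."
    public static string GetTitleUnguarded(JToken token) =>
        (string)token["title"];
}
```

For a document such as { "1": { "title": "Foo" }, "2": "" }, TryGetTitle gives "Foo" for property "1" and null for property "2", while GetTitleUnguarded throws on property "2". A type check is used here rather than HasValues because HasValues is also false for an empty JObject, which can be indexed safely.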

rwv37 commented 2 years ago

Sorry for the late reply. Unfortunately, the exact code that was giving this exact error is gone. However, I later hit a seemingly similar error and worked around it the same way I had worked around the original one, so I can show you the new case:

Differences between then and now

Difference in behavior

In the previous case an exception was thrown. In this case, it behaves as if the await foreach on the IAsyncEnumerable never finishes. To be clear, just like in the original report, the buggy behavior happens if and only if there are no pages to be returned.

Difference in code

I don't know exactly what the difference in my code is that led to this change in behavior, but I suspect that it's this: Previously (when the problem was that an exception was being thrown), I was doing something like...

// generator is a WikiClientLibrary.Generators.QueryPageGenerator
return generator.EnumPagesAsync().Take(100);

... then doing an await on that return value, and then an await foreach on the awaited return. Whereas the new problem is occurring while I am instead doing:

// generator is still a WikiClientLibrary.Generators.QueryPageGenerator
var pages = new KludgyWikiPageEnumerable(generator.EnumPagesAsync());
await foreach (var page in pages.ConfigureAwait(false).WithCancellation(cancellationToken))
{
   yield return page;
}

... and then doing an await foreach directly on the awaited return. So I'm guessing that maybe the change in behavior comes down to using Take() versus using yield return. I don't know, though.

Workaround

I worked around both this error and the previous one by creating the following two classes:

Class KludgyWikiPageEnumerable

using System.Collections.Generic;
using System.Threading;
using WikiClientLibrary.Pages;

namespace Rwv37.MediaWiki.Api
{
    /// <summary>
    /// <para>
    /// A kludgy class to deal with an apparent issue in WikiClientLibrary
    /// that happens when you attempt to retrieve a list that has no members
    /// (like, you try to get the items that are on Special:WantedFiles, but
    /// there are no such items).
    /// </para><para>
    /// See: <a href="https://github.com/CXuesong/WikiClientLibrary/issues/92">Problem getting list of "wanted" items when there are none</a>
    /// </para>
    /// </summary>
    internal class KludgyWikiPageEnumerable : IAsyncEnumerable<WikiPage>
    {
        private IAsyncEnumerable<WikiPage> WikiPages { get; init; }

        /// <summary>
        /// Initializes a new instance of the <see cref="T:Rwv37.MediaWiki.Api.KludgyWikiPageEnumerable" /> class.
        /// </summary>
        /// <param name="wikiPages">
        /// The wiki pages.
        /// </param>
        internal KludgyWikiPageEnumerable(IAsyncEnumerable<WikiPage> wikiPages)
        {
            this.WikiPages = wikiPages;
        }

        /// <summary>
        /// Returns an enumerator that iterates asynchronously through the collection.
        /// </summary>
        /// <param name="cancellationToken">
        /// A <see cref="T:System.Threading.CancellationToken">CancellationToken</see>
        /// that may be used to cancel the asynchronous iteration.
        /// </param>
        /// <returns>
        /// An enumerator that can be used to iterate asynchronously through the collection.
        /// </returns>
        public IAsyncEnumerator<WikiPage> GetAsyncEnumerator(
            CancellationToken cancellationToken = default)
        {
            return new KludgyWikiPageEnumerator(
                this.WikiPages.GetAsyncEnumerator(cancellationToken),
                this.WikiPages.GetAsyncEnumerator(cancellationToken));
        }
    }
}

Class KludgyWikiPageEnumerator

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using WikiClientLibrary.Pages;

namespace Rwv37.MediaWiki.Api
{
    /// <summary>
    /// <para>
    /// A kludgy class to deal with an apparent issue in WikiClientLibrary
    /// that happens when you attempt to retrieve a list that has no members
    /// (like, you try to get the items that are on Special:WantedFiles, but
    /// there are no such items).
    /// </para><para>
    /// See: <a href="https://github.com/CXuesong/WikiClientLibrary/issues/92">Problem getting list of "wanted" items when there are none</a>
    /// </para>
    /// </summary>
    internal class KludgyWikiPageEnumerator : IAsyncEnumerator<WikiPage>
    {
        private IAsyncEnumerator<WikiPage> UnderlyingReal { get; init; }
        private IAsyncEnumerator<WikiPage> UnderlyingTester { get; init; }
        private bool WackinessTested { get; set; } = false;
        private bool IsWacky { get; set; } = false;

        /// <summary>
        /// Initializes a new instance of the <see cref="T:Rwv37.MediaWiki.Api.KludgyWikiPageEnumerator" /> class.
        /// </summary>
        /// <param name="underlyingReal">
        /// The underlying enumerator to "really" use.
        /// </param>
        /// <param name="underlyingTester">
        /// The underlying enumerator to use to test if we're gonna have a problem.
        /// </param>
        internal KludgyWikiPageEnumerator(IAsyncEnumerator<WikiPage> underlyingReal,
            IAsyncEnumerator<WikiPage> underlyingTester)
        {
            this.UnderlyingReal = underlyingReal;
            this.UnderlyingTester = underlyingTester;
        }

        private async ValueTask<bool> TestWackinessAsync(CancellationToken cancellationToken = default)
        {
            try
            {
                if (!this.WackinessTested)
                {
                    _ = await this.UnderlyingTester.MoveNextAsync(cancellationToken).ConfigureAwait(false);
                    this.IsWacky = false;
                }
            }
            catch (InvalidOperationException)
            {
                this.IsWacky = true;
            }
            finally
            {
                this.WackinessTested = true;
            }

            return this.IsWacky;
        }

        /// <summary>
        /// Gets the element in the collection at the current position of the enumerator.
        /// </summary>
        public WikiPage Current
        {
            get
            {
                return this.UnderlyingReal.Current;
            }
        }

        /// <summary>
        /// Dispose as an asynchronous operation.
        /// </summary>
        /// <returns>
        /// A Task&lt;ValueTask&gt; representing the asynchronous operation.
        /// </returns>
        [System.Diagnostics.CodeAnalysis.SuppressMessage(
            "IDisposableAnalyzers.Correctness",
            "IDISP007:Don't dispose injected",
            Justification = "Uh... I think this is cool? Errr... or \"hope\", at least?")]
        public async ValueTask DisposeAsync()
        {
            await this.UnderlyingReal.DisposeAsync().ConfigureAwait(false);
            await this.UnderlyingTester.DisposeAsync().ConfigureAwait(false);
        }

        /// <summary>
        /// Move next as an asynchronous operation.
        /// </summary>
        /// <returns>
        /// A Task&lt;System.Boolean&gt; representing the asynchronous operation.
        /// </returns>
        public async ValueTask<bool> MoveNextAsync()
        {
            if (!this.WackinessTested)
            {
                _ = await this.TestWackinessAsync().ConfigureAwait(false);
            }

            return !this.IsWacky && await this.UnderlyingReal.MoveNextAsync().ConfigureAwait(false);
        }
    }
}

Usage

Usage is just wrapping the IAsyncEnumerable returned by WikiClientLibrary in a KludgyWikiPageEnumerable. For example, the following code works fine, but before I wrapped the generator.EnumPagesAsync() within a new KludgyWikiPageEnumerable(), my program was acting as if the await foreach on the IAsyncEnumerable never completed:

public async IAsyncEnumerable<WikiPage> QueryAsync(
    int paginationSize = 100,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // TODO: Think about pagination size

    var generator = new QueryPageGenerator(this.Site, this.QueryPageName)
    {
        PaginationSize = paginationSize,
    };

    var pages = new KludgyWikiPageEnumerable(generator.EnumPagesAsync());
    await foreach (var page in pages.ConfigureAwait(false).WithCancellation(cancellationToken))
    {
        yield return page;
    }
}
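
For comparison, the same idea as the two classes above can be sketched as a single async iterator. This is only a sketch under assumptions: the class and method names are hypothetical, it is not part of WikiClientLibrary, and it assumes (as the kludge does) that an InvalidOperationException from the first MoveNextAsync call means "there are no items". It has not been tested against the hang described above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of WikiClientLibrary.
public static class EmptyOnFirstFailureExtensions
{
    // Treats an InvalidOperationException thrown by the FIRST MoveNextAsync
    // call as an empty sequence. Exceptions from later calls propagate.
    public static async IAsyncEnumerable<T> EmptyIfFirstMoveThrows<T>(
        this IAsyncEnumerable<T> source,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var enumerator = source.GetAsyncEnumerator(cancellationToken);
        try
        {
            bool hasItem;
            try
            {
                // No yield inside this try/catch: C# does not allow
                // "yield return" in a try block that has a catch clause.
                hasItem = await enumerator.MoveNextAsync().ConfigureAwait(false);
            }
            catch (InvalidOperationException)
            {
                hasItem = false;
            }

            while (hasItem)
            {
                yield return enumerator.Current;
                hasItem = await enumerator.MoveNextAsync().ConfigureAwait(false);
            }
        }
        finally
        {
            await enumerator.DisposeAsync().ConfigureAwait(false);
        }
    }
}
```

With a helper like this, the wrapping step would read generator.EnumPagesAsync().EmptyIfFirstMoveThrows(cancellationToken). Unlike the two-class version, it uses a single enumerator, so the underlying query is presumably not started twice. Note that catching InvalidOperationException this broadly can also hide unrelated failures on the first call.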

CXuesong commented 2 years ago

Actually, QueryPageGenerator.EnumItemsAsync has never worked properly... Please try the latest release instead.

Released v0.8.0-int.6.