CXuesong / WikiClientLibrary

/*🌻*/ Wiki Client Library is an asynchronous MediaWiki API client library targeting modern .NET platforms
https://github.com/CXuesong/WikiClientLibrary/wiki
Apache License 2.0

RecentChangesEnumerator not properly populating all results #90

Open tigerpaw28 opened 2 years ago

tigerpaw28 commented 2 years ago

I used the following code to get log entries via the RecentChangesEnumerator:

 var generator = new RecentChangesGenerator(wiki)
 {
           PaginationSize = 50,
           EndTime = DateTime.Parse("13:36, 30 October 2021"),
           TypeFilters = RecentChangesFilterTypes.Log
 };

 var items = await generator.EnumItemsAsync().ToListAsync();

This results in two items being returned, but neither has any of its properties populated. Broader searches seem to return a mix of populated and unpopulated results. As usual, I'm working with TFWiki.net, which is on MW 1.19.

CXuesong commented 2 years ago

Your code eventually sends the following request to TFWiki:

POST https://tfwiki.net/mediawiki/api.php

format=json&action=query&maxlag=5&list=recentchanges&rcdir=older&rcend=2021-10-30T05%3a36%3a00Z&rctype=log&rclimit=50&rcprop=user%7cuserid%7ccomment%7cparsedcomment%7cflags%7ctimestamp%7ctitle%7cids%7csizes%7credirect%7cloginfo%7ctags%7csha1

You can see the response by opening the following link: https://tfwiki.net/mediawiki/api.php?format=json&action=query&maxlag=5&list=recentchanges&rcdir=older&rcend=2021-10-30T05%3a36%3a00Z&rctype=log&rclimit=50&rcprop=user%7cuserid%7ccomment%7cparsedcomment%7cflags%7ctimestamp%7ctitle%7cids%7csizes%7credirect%7cloginfo%7ctags%7csha1

TFWiki responds to the request with:

{
    "warnings": {
        "recentchanges": {
            "*": "Unrecognized value for parameter 'rcprop': sha1"
        }
    },
    "query": {
        "recentchanges": [
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            },
            {
                "tags": []
            }
        ]
    }
}

This response is abnormal. In particular, the entries contain no fields other than tags. I think there must be something wrong with the MediaWiki server code for it to send such a response.
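As a rough illustration of how such a response could be detected programmatically, the sketch below parses a shortened copy of the JSON quoted above (two entries instead of fourteen) with System.Text.Json, prints any API warnings, and counts the entries that carry nothing but `tags`. It is independent of WikiClientLibrary, and the variable names are chosen only for this example.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Shortened copy of the response quoted in this thread (2 entries instead of 14).
var json = @"{
  ""warnings"": { ""recentchanges"": { ""*"": ""Unrecognized value for parameter 'rcprop': sha1"" } },
  ""query"": { ""recentchanges"": [ { ""tags"": [] }, { ""tags"": [] } ] }
}";

using var doc = JsonDocument.Parse(json);
var root = doc.RootElement;

// Print API warnings, if any (MediaWiki puts the message under the "*" key here).
if (root.TryGetProperty("warnings", out var warnings))
{
    foreach (var w in warnings.EnumerateObject())
    {
        var message = w.Value.GetProperty("*").GetString();
        Console.WriteLine("warning [" + w.Name + "]: " + message);
    }
}

// Count entries whose only field is "tags", i.e. the abnormal ones.
var entries = root.GetProperty("query").GetProperty("recentchanges").EnumerateArray().ToList();
var tagsOnly = entries.Count(e => e.EnumerateObject().All(p => p.Name == "tags"));
Console.WriteLine(tagsOnly + " of " + entries.Count + " entries contain only 'tags'");
```

A check like this only flags the symptom; it does not explain why the server omits the other fields.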

Actually, if you remove the |tags part from the rcprop parameter, you will see an empty response, as expected: https://tfwiki.net/mediawiki/api.php?format=json&action=query&maxlag=5&list=recentchanges&rcdir=older&rcend=2021-10-30T05%3a36%3a00Z&rctype=log&rclimit=50&rcprop=user%7cuserid%7ccomment%7cparsedcomment%7cflags%7ctimestamp%7ctitle%7cids%7csizes%7credirect%7cloginfo%7csha1

{"warnings":{"recentchanges":{"*":"Unrecognized value for parameter 'rcprop': sha1"}},"query":{"recentchanges":[]}}

CXuesong commented 2 years ago

So what you can do here is

  1. Find out why MediaWiki is sending such a response (there could be a bug in the MW 1.19 software).
  2. Regardless of whether you plan to do 1., you can derive your own class from RecentChangesGenerator and override the EnumParams method, so that you can intercept the rcprop parameter and remove the |tags part.
    // Strip "|tags" from the rcprop value; all other parameters pass through unchanged.
    private IEnumerable<KeyValuePair<string, object?>> EnumParams(bool isList)
    => base.EnumParams(isList).Select(p => p.Key == "rcprop" ? new KeyValuePair<string, object?>(p.Key, ((string)p.Value).Replace("|tags", "")) : p);
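Note that replacing the literal `|tags` only works when `tags` is not the first value in the list. A slightly more defensive sketch splits the pipe-delimited value and drops the unwanted entry wherever it appears. `RemovePropValue` is a hypothetical helper written for this example, not part of WikiClientLibrary.

```csharp
using System;
using System.Linq;

// Hypothetical helper: removes one value from a pipe-delimited MediaWiki parameter.
static string RemovePropValue(string props, string value) =>
    string.Join("|", props.Split('|').Where(v => v != value));

var rcprop = "user|comment|loginfo|tags|sha1";
Console.WriteLine(RemovePropValue(rcprop, "tags")); // prints "user|comment|loginfo|sha1"
```

The same helper could be applied to the parameter value inside whatever hook ends up exposing the request parameters.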
CXuesong commented 2 years ago

To demonstrate this further, try the code below (.NET Fiddle):

using System;
using System.Linq;
using WikiClientLibrary.Client;
using WikiClientLibrary.Sites;
using WikiClientLibrary.Generators;

using var client = new WikiClient();
var site = new WikiSite(client, "https://tfwiki.net/mediawiki/api.php");
await site.Initialization;

Console.WriteLine(site.SiteInfo + " " + site.SiteInfo.Version);

var generator = new RecentChangesGenerator(site)
{
    PaginationSize = 50,
    EndTime = DateTime.Parse("13:36, 30 October 2021"),
    TypeFilters = RecentChangesFilterTypes.Log
};

Console.WriteLine("Server side log filtering");
var items = await generator.EnumItemsAsync().ToListAsync();
Console.WriteLine("{0} items:", items.Count);
foreach (var i in items) Console.WriteLine(i);

Console.WriteLine("Client side log filtering");
generator.TypeFilters = RecentChangesFilterTypes.All;
items = await generator.EnumItemsAsync().Where(i => i.Type == RecentChangesType.Log).ToListAsync();
Console.WriteLine("{0} items:", items.Count);
foreach (var i in items) Console.WriteLine(i);

The output is

WikiClientLibrary.Sites.SiteInfo 1.19.20
Server side log filtering
13 items:
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
0,01/01/0001 00:00:00,Edit,[None],,,
Client side log filtering
0 items:

It seems that the issue doesn't manifest if you list everything instead of filtering for logs on the server side.

tigerpaw28 commented 2 years ago

It seems to be specifically the sha tag (or maybe tags in general) being incompatible with the API. This query returns results with log filtering:

tigerpaw28 commented 2 years ago

And now I see this isn't even what I want to query since recent changes doesn't appear to include the user creation log entries, despite those log events appearing on the recent changes page.

I don't see a generator for log events so I'm guessing I need to write my own generator and/or use InvokeMediaWikiApiAsync to query that list. Would the same apply to retrieving the allusers list as well?

tigerpaw28 commented 2 years ago

Further API testing shows that I can get user creation logs from the RecentChanges API as long as I don't ask it to populate loginfo. Requesting loginfo causes it to filter out some types of logs, presumably because they don't have those fields.
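For reference, a request of this kind can be assembled from the parameters quoted earlier in the thread, with `loginfo` left out of `rcprop`. The sketch below only builds the URL so it can be opened in a browser; it does not send anything. Leaving out `tags` and `sha1` as well is an assumption based on the earlier responses, and `BuildQuery` is a hypothetical helper, not a WikiClientLibrary API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: encodes key/value pairs as a URL query string.
static string BuildQuery(IEnumerable<KeyValuePair<string, string>> pairs) =>
    string.Join("&", pairs.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

// rcprop values from the request quoted earlier, minus loginfo, tags and sha1 (assumption).
var rcpropValues = new[]
{
    "user", "userid", "comment", "parsedcomment", "flags",
    "timestamp", "title", "ids", "sizes", "redirect"
};

var query = new Dictionary<string, string>
{
    ["format"] = "json",
    ["action"] = "query",
    ["list"] = "recentchanges",
    ["rcdir"] = "older",
    ["rcend"] = "2021-10-30T05:36:00Z",
    ["rctype"] = "log",
    ["rclimit"] = "50",
    ["rcprop"] = string.Join("|", rcpropValues),
};

var url = "https://tfwiki.net/mediawiki/api.php?" + BuildQuery(query);
Console.WriteLine(url);
```

Whether the result actually includes the user creation entries depends on the wiki and its MediaWiki version.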

With this in mind, I was going to adopt your suggestion of deriving a class from RecentChangesGenerator and overriding EnumParams, except that EnumParams is a private method and can't be overridden.

Would you be open to changing that or do you have another suggestion? The question about retrieving users still stands as well.