CXuesong / WikiClientLibrary

/*🌻*/ Wiki Client Library is an asynchronous MediaWiki API client library targeting modern .NET platforms
https://github.com/CXuesong/WikiClientLibrary/wiki
Apache License 2.0
82 stars, 16 forks

Basic Cargo extension API support #78

CXuesong closed this 3 years ago

CXuesong commented 3 years ago

Resolves #77. See the unit tests in CargoTests.cs for usage examples. Note that the public API in WikiClientLibrary.Cargo.Linq may change dramatically in the future.

CXuesong commented 3 years ago

I think this PR can conclude here, and we can start another one in 2021 to address the limitations. I've published the package to the NuGet registry as CXuesong.MW.WikiClientLibrary.Cargo v0.7.4-int.0, so you can try it out.

BrunoBlanes commented 3 years ago

So, after testing it for a while, I have a couple of questions:

  1. I couldn't get Where() or OrderBy() to work in my code. Am I doing something wrong?
public async Task GetAsync()
{
    await wikiSite.Initialization;
    var context = new QueryContext(wikiSite);
    var query = context.RosterChanges
        .Where(x => x.Date_Sort > new DateTime(2020, 12, 11))
        .OrderBy(x => x.Date_Sort)
        .Select(x => new
        {
            x.Player,
            x.Date_Sort,
        }).Take(100);

    var response = await query.AsAsyncEnumerable().ToListAsync();
}

I noticed that the results seem to be ordered by the first property I specified (the player's name, in this table), and that results also seem to repeat in order to fill up the specified limit. This task is also very memory-intensive; there's a lot of GC going on.


  2. Wouldn't it be wise to enforce a limit when none is specified? I hadn't set a limit at first, so this method ran beyond my patience.
CXuesong commented 3 years ago
  1. I couldn't get Where() or OrderBy() to work in my code, am I doing something wrong?

It's a bug. I've fixed it and published ~v0.7.4-int1 (with bug)~ v0.7.4-int2.

Also, this is a very memory intensive task, there's a lot of GC going on.

Yeah. I haven't done any profiling, so it's quite possible that it has a significant memory footprint. I'll leave it as is until '21 😂

I hadn't set a limit at first, so this method ran beyond my patience.

Actually, if you do not limit the size of the result set in Entity Framework, you may run into the same problem. The point is, the public API lets you treat the result set as an IEnumerable sequence of unspecified length. When handling such a sequence, it's your responsibility to make sure you are not iterating forever (or almost forever).
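Besides a plain `Take(n)`, one way to take on that responsibility is a caller-side guard that fails loudly instead of silently truncating. The following is a minimal sketch and is not part of WCL: `SequenceGuardSketch.WithHardLimit` is a hypothetical name, and the sketch assumes only that the query has already been turned into an `IAsyncEnumerable<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class SequenceGuardSketch
{
    // Hypothetical caller-side guard (not a WCL API): passes items through
    // unchanged, but throws as soon as the source yields more than maxCount items.
    public static async IAsyncEnumerable<T> WithHardLimit<T>(
        IAsyncEnumerable<T> source, int maxCount,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var count = 0;
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            if (++count > maxCount)
                throw new InvalidOperationException(
                    $"The sequence produced more than {maxCount} items; add Take() or narrow the query.");
            yield return item;
        }
    }
}
```

The guard has to pull item `maxCount + 1` to detect the overflow, so with paged results it may trigger one extra request; `Take(n)` remains the better choice when silent truncation is acceptable.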

Under the hood, I'm currently paginating the result set 10 records at a time, so after every 10 records, WCL asks the server for the next batch. I'm planning to make the page size adjustable in the next PR.
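The paging behaviour described above can be sketched as a lazy async iterator that issues one offset/fetch request per batch, mirroring the `OFFSET n FETCH 10 ROWS ONLY` clauses in the pseudo-queries of the sample output. This is an illustration, not WCL's actual implementation: `PagingSketch.PaginateAsync` and its `fetchPage(offset, count)` delegate, which stands in for a single Cargo API request, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class PagingSketch
{
    // Hypothetical sketch: lazily yields records, requesting one page at a time.
    // fetchPage(offset, count) stands in for one Cargo API request
    // ("... OFFSET {offset} FETCH {count} ROWS ONLY").
    public static async IAsyncEnumerable<T> PaginateAsync<T>(
        Func<int, int, Task<IReadOnlyList<T>>> fetchPage,
        int pageSize = 10,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        var offset = 0;
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();
            // The next request is only sent when the consumer asks for more items.
            var page = await fetchPage(offset, pageSize);
            foreach (var item in page) yield return item;
            // A short (or empty) page is treated as the end of the result set.
            if (page.Count < pageSize) yield break;
            offset += page.Count;
        }
    }
}
```

Because requests are driven by enumeration, a consumer that stops early (for example after `Take(100)`) also stops the requests; a consumer that never stops keeps paging until the server runs out of rows.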

In addition, I've just added a sample project under the \Samples\LinqToCargo folder, so you can check it out. It also routes the logs to the console, so you can see what is going on while you enumerate the sequence.

Here is some sample output:

```log
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Invoke Cargo query.
      Pseudo-query: SELECT T0.Player = Player,T0.Date_Sort = DateSort FROM RosterChanges = T0 WHERE (T0.Date_Sort > {dt'2020-12-11T00:00:00.0000000'}) ORDER BY T0.Date_Sort ASC OFFSET 0 FETCH 10 ROWS ONLY
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Sending request 9B0814C900000003, SuppressAccountAssertion=False
1: { Player = Paz, DateSort = 2020/12/12 12:00:00 AM }
2: { Player = Crash, DateSort = 2020/12/12 12:00:00 AM }
3: { Player = Ramune, DateSort = 2020/12/12 12:00:00 AM }
4: { Player = Gango, DateSort = 2020/12/12 12:00:00 AM }
5: { Player = Helper (Kwon Yeong-jae), DateSort = 2020/12/12 12:00:00 AM }
6: { Player = Winged, DateSort = 2020/12/12 12:00:00 AM }
7: { Player = ScrappyDoo, DateSort = 2020/12/12 12:00:00 AM }
8: { Player = Navy (Michał Leszczyński), DateSort = 2020/12/12 12:00:00 AM }
9: { Player = iBo, DateSort = 2020/12/12 12:00:00 AM }
10: { Player = Mocha (Kim Tae-gyeom), DateSort = 2020/12/12 12:00:00 AM }
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Invoke Cargo query.
      Pseudo-query: SELECT T0.Player = Player,T0.Date_Sort = DateSort FROM RosterChanges = T0 WHERE (T0.Date_Sort > {dt'2020-12-11T00:00:00.0000000'}) ORDER BY T0.Date_Sort ASC OFFSET 10 FETCH 10 ROWS ONLY
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Sending request 9B0814C900000004, SuppressAccountAssertion=False
11: { Player = Mocha (Kim Tae-gyeom), DateSort = 2020/12/12 12:00:00 AM }
12: { Player = Leesang, DateSort = 2020/12/12 12:00:00 AM }
13: { Player = Typiczny, DateSort = 2020/12/12 12:00:00 AM }
14: { Player = Stachu, DateSort = 2020/12/12 12:00:00 AM }
15: { Player = Leesang, DateSort = 2020/12/12 12:00:00 AM }
16: { Player = Fikkle, DateSort = 2020/12/12 12:00:00 AM }
17: { Player = Gienek, DateSort = 2020/12/12 12:00:00 AM }
18: { Player = Kaspersky, DateSort = 2020/12/12 12:00:00 AM }
19: { Player = Truklax, DateSort = 2020/12/12 12:00:00 AM }
20: { Player = Goku, DateSort = 2020/12/12 12:00:00 AM }
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Invoke Cargo query.
      Pseudo-query: SELECT T0.Player = Player,T0.Date_Sort = DateSort FROM RosterChanges = T0 WHERE (T0.Date_Sort > {dt'2020-12-11T00:00:00.0000000'}) ORDER BY T0.Date_Sort ASC OFFSET 20 FETCH 10 ROWS ONLY
dbug: WikiClientLibrary.Sites.WikiSite[0]
      => Leaguepedia | League of Legends Esports Wiki => WikiSite{Leaguepedia | League of Legends Esports Wiki}::ExecuteCargoQueryAsync()
      Sending request 9B0814C900000005, SuppressAccountAssertion=False
21: { Player = Goku, DateSort = 2020/12/12 12:00:00 AM }
22: { Player = KenRuto, DateSort = 2020/12/13 12:00:00 AM }
23: { Player = Levi (Besmir Jakupi), DateSort = 2020/12/13 12:00:00 AM }
24: { Player = KenRuto, DateSort = 2020/12/13 12:00:00 AM }
25: { Player = Fallen (Viktor Kordanovski), DateSort = 2020/12/13 12:00:00 AM }
26: { Player = 2Cups, DateSort = 2020/12/13 12:00:00 AM }
27: { Player = goldento4st, DateSort = 2020/12/13 12:00:00 AM }
28: { Player = Levi (Besmir Jakupi), DateSort = 2020/12/13 12:00:00 AM }
29: { Player = Spale, DateSort = 2020/12/13 12:00:00 AM }
30: { Player = Andariel, DateSort = 2020/12/13 12:00:00 AM }
…
```
BrunoBlanes commented 3 years ago

Yep, working great so far. I'll make sure to test all the possible combinations of queries soon.

One last question for now, though: is it possible for an overload of await site.ExecuteCargoQueryAsync() to take an IQueryable<T> as a parameter and return an IAsyncEnumerable<T>?

It is more of a code cleanup for the end user and would resemble more of EF Core's usage.

I guess it would only make sense if it were returning an IEnumerable<T>, because of the Async in the method call...

RheingoldRiver commented 3 years ago

Hey, for the LINQ support, Cargo has one special operator you can use in where, called HOLDS (documentation here). It's just syntactic sugar (and pretty buggy, tbh; I try to avoid using it). I didn't look too closely at the PR, so you might have added it already, but I wanted to link this just in case!

CXuesong commented 3 years ago

It is more of a code cleanup for the end user and would resemble more of EF Core's usage.

@BrunoBlanes Actually, ExecuteCargoQueryAsync is the counterpart of SqlCommand in ADO.NET: it executes a query (given as a string) and returns the result set. To start a chain of IQueryable calls, you need something like EF's DbSet<T>, i.e., ICargoRecordSet in WCL.

And frankly speaking, the async extension method support for IQueryable is not so straightforward in .NET.

  1. There is no IAsyncQueryable (dotnet/runtime#77698) in the BCL. IQueryable implements IEnumerable but (perhaps for compatibility reasons) not IAsyncEnumerable, so you need to refrain from using any of the synchronous extension methods exposed via IEnumerable, like these:

    context.Skins.First()    // Enumerable.First(IEnumerable<T>)
    context.Skins.Take(10).ToArray()    // Enumerable.ToArray(IEnumerable<T>)

    because we don't want synchronous blocking calls in async functions.

  2. Because the BCL has no extension methods for IAsyncEnumerable (there is no IAsyncEnumerable<T>.ToArrayAsync in .NET 5!), WikiClientLibrary references the System.Linq.Async (aka Ix.Async) NuGet package. It's not part of the BCL, and if you are consuming this package (via WikiClientLibrary) and EF at the same time, you need to be careful not to fall into dotnet/efcore#18124.
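For reference, without Ix.Async such a collection method boils down to an `await foreach` loop (C# 8). The sketch below is a hypothetical helper, not part of WCL or the BCL; it is deliberately a plain static method named `CollectAsync` rather than a `ToListAsync` extension method, so it cannot take part in the name conflicts mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncCollectSketch
{
    // Hypothetical helper: drains an async sequence into a List<T>.
    // A plain static method named CollectAsync (not a ToListAsync extension
    // method), so it cannot clash with Ix.Async or EF Core.
    public static async Task<List<T>> CollectAsync<T>(
        IAsyncEnumerable<T> source, CancellationToken cancellationToken = default)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        var list = new List<T>();
        // Each MoveNextAsync call is awaited rather than blocked on,
        // so the calling thread is free while the next batch is fetched.
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            list.Add(item);
        }
        return list;
    }
}
```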

Though it may sound a bit frustrating, I'm not implementing async extension methods like IQueryable.ToArrayAsync or IAsyncEnumerable.ToArrayAsync, because

  1. These methods are already implemented in EF Core, and for now I don't want to duplicate them in WCL.
  2. Implementing these extension methods in WCL may open up more name-conflict pitfalls like https://github.com/dotnet/efcore/issues/18124#issuecomment-734893944.

This means that if you want to call LINQ collection (i.e., "collecting") methods, you will have to use something like this:

var skins = await cargoContext.Skins
    .Take(10)    // IQueryable, no access to ToListAsync
    .AsAsyncEnumerable()    // This function is provided by WikiClientLibrary. It casts the IQueryable to IAsyncEnumerable
                            // Do not confuse it with `ToAsyncEnumerable`!
    .ToListAsync();    // Calls the collection method from Ix.Async

If you are also referencing the EF Core library, then I suppose the following usage should work as well, though I have never tried it.

var skins = await cargoContext.Skins
    .Take(10)    // IQueryable
    .ToListAsync();    // Calls the collection method provided by EF Core
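Both snippets presumably work because the query object returned by WCL also implements `IAsyncEnumerable<T>`, so "casting" it is a type check rather than a real conversion; EF Core's own `AsAsyncEnumerable` (which its `ToListAsync` builds on) follows the same pattern and throws `InvalidOperationException` for queries that don't implement the interface. The sketch below illustrates that pattern under this assumption; `AsAsyncEnumerableSketch` and the `AsyncCapableQuery<T>` stand-in are hypothetical and are not WCL's actual types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncQuerySketch
{
    // Hypothetical helper: converts nothing, it only succeeds when the
    // query object already implements IAsyncEnumerable<T>.
    public static IAsyncEnumerable<T> AsAsyncEnumerableSketch<T>(IQueryable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return source as IAsyncEnumerable<T>
            ?? throw new InvalidOperationException(
                $"Query of type {source.GetType().Name} does not support asynchronous enumeration.");
    }
}

// Hypothetical stand-in for a provider-backed query (such as a Cargo record set)
// that implements both IQueryable<T> and IAsyncEnumerable<T>. For brevity it
// delegates to an in-memory query, so operators like Take() applied to it return
// a plain, non-async query; a real provider would return its own async-capable type.
public sealed class AsyncCapableQuery<T> : IQueryable<T>, IAsyncEnumerable<T>
{
    private readonly IQueryable<T> inner;

    public AsyncCapableQuery(IEnumerable<T> items) => inner = items.AsQueryable();

    public Type ElementType => inner.ElementType;
    public Expression Expression => inner.Expression;
    public IQueryProvider Provider => inner.Provider;

    public IEnumerator<T> GetEnumerator() => inner.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public async IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
    {
        foreach (var item in inner)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // simulate waiting for a server response
            yield return item;
        }
    }
}
```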
CXuesong commented 3 years ago

Hey, for the LINQ support, Cargo has one special operator you can use in where, called HOLDS

@RheingoldRiver Yeah, I have that in mind. I think I can implement a stub method like EF.Functions.Like in the next PR so that the query builder can build such a query expression.
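A rough sketch of what such a stub could look like, in the style of `EF.Functions.Like`: a method that throws if it is actually called, plus a translator that recognizes it in an expression tree and emits Cargo's `field HOLDS 'value'` form. Everything here (`CargoFunctionsSketch.Holds`, `HoldsTranslatorSketch`, `PlayerRecord`) is hypothetical, handles only this single predicate shape, and is not WCL's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical record type; Teams stands for a Cargo list-type field.
public sealed class PlayerRecord
{
    public IEnumerable<string> Teams { get; set; } = Array.Empty<string>();
}

// Hypothetical stub in the spirit of EF.Functions.Like: it only has meaning
// inside an expression tree and throws if invoked directly.
public static class CargoFunctionsSketch
{
    public static bool Holds<T>(IEnumerable<T> listField, T value) =>
        throw new NotSupportedException("Holds() is only meant to be used inside a Cargo query expression.");
}

public static class HoldsTranslatorSketch
{
    // Turns  r => CargoFunctionsSketch.Holds(r.Teams, "X")  into  Teams HOLDS 'X'.
    // Handles only this single shape; a real query builder walks the whole tree.
    public static string TranslateWhere<TRecord>(Expression<Func<TRecord, bool>> predicate)
    {
        if (predicate.Body is MethodCallExpression call
            && call.Method.DeclaringType == typeof(CargoFunctionsSketch)
            && call.Method.Name == nameof(CargoFunctionsSketch.Holds))
        {
            // Strip a possible conversion node around the field access.
            var fieldExpr = call.Arguments[0] is UnaryExpression { NodeType: ExpressionType.Convert } convert
                ? convert.Operand
                : call.Arguments[0];
            if (fieldExpr is MemberExpression field)
            {
                // Evaluate the value argument on the client (constants or captured variables).
                var value = Expression.Lambda(call.Arguments[1]).Compile().DynamicInvoke();
                // Naive quoting for illustration only; real code must escape the value.
                return $"{field.Member.Name} HOLDS '{value}'";
            }
        }
        throw new NotSupportedException($"Unsupported predicate: {predicate}");
    }
}
```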