cerbero90 / lazy-json-pages

📜 Framework-agnostic API scraper to load items from any paginated JSON API into a Laravel lazy collection via async HTTP requests.
MIT License

Only the first HTTP request is logged in Telescope #1

Closed sebastiaanluca closed 2 months ago

sebastiaanluca commented 3 years ago

Detailed description

When debugging the HTTP requests made using Telescope or Ray, only the first request is actually logged. Any subsequent calls to the API to retrieve more pages do not appear in the logs.

Context

It would be very useful to know how many requests are actually made and to see their details in order to debug issues.

Possible implementation

All requests are logged.

I'm not sure how to enable this, though. I dug into the code and saw that the package wraps the original call in a SourceWrapper. So it's possible the package checks the result of that call (since the request has already been sent when we called e.g. ->get()), but doesn't make further requests using the built-in client?

Your environment

PHP 8, Laravel 8.60, v1 of this package, macOS, Laravel Valet

sebastiaanluca commented 3 years ago

Might be related, not sure:

If I do something like this and fetch 100 items per page (hard limit for this API):

LazyCollection::fromJsonPages(
    $source,
    $dataKey,
    static fn (Config $config): Config => $config
        ->items('total_entries')
        ->pages('total_pages')
        ->lastPage('links.last')
        ->perPage(100, 'per_page')
        ->concurrency(10)
        ->timeout(15)
        ->attempts(2)
        ->backoff(static fn (int $attempt): int => $attempt ** 2 * 1000),
);

And then chunk that lazy collection:

$timeEntries
    ->map(static function (TimeEntryDto $timeEntryDto): array {
        return $timeEntryDto;
    })
    ->chunk(101)
    ->each(static function (LazyCollection $chunk): void {
        dump('hello');
    });

It only dumps hello once, no matter how many chunks or items are in the collection. Any chunk size above the per_page limit results in a single iteration; anything equal to or below it chunks fine. The total count is correct, however (thousands of items in this case).

cerbero90 commented 3 years ago

Hi @sebastiaanluca and thanks for your report.

The reason why only the first HTTP request is logged is because Lazy JSON Pages is framework agnostic and doesn't use the Laravel HTTP client to make the subsequent HTTP calls. The only integration with the Laravel framework is its support for the Laravel HTTP client responses.

Regarding the issue with chunk(), I think that you see the dump of hello only once because chunk() groups all items in one single chunk if the number provided is bigger than the number of items.

For example the code below:

LazyCollection::times(5)->chunk(10)->toArray();

will generate:

[
    [
        1,
        2,
        3,
        4,
        5,
    ],
]

sebastiaanluca commented 3 years ago

Thanks for the fast reply!

> The reason why only the first HTTP request is logged is because Lazy JSON Pages is framework agnostic and doesn't use the Laravel HTTP client to make the subsequent HTTP calls. The only integration with the Laravel framework is its support for the Laravel HTTP client responses.

Does it internally use Guzzle and if so, would it be possible to inject Guzzle middleware into its stack? Example to log Guzzle requests in Telescope: https://gist.github.com/barryvdh/a04ab1628558d6ac41a036a238ee090b

> Regarding the issue with chunk(), I think that you see the dump of hello only once because chunk() groups all items in one single chunk if the number provided is bigger than the number of items.

I have a collection like this, fetched from an API (30 pages, 100 items max each):

LazyCollection::times(3000)->chunk(100)->each(…)->toArray();

The above works (it loops through all created chunks). The following does not (it only loops through one chunk):

LazyCollection::times(3000)->chunk(101)->each(…)->toArray();

Both snippets take almost the same time to run, so I suspect it loops through all chunks regardless, but doesn't execute the callback more than once when the chunk size exceeds the maximum number of items it can fetch per page from the API (second snippet).

sebastiaanluca commented 3 years ago

I've tried to recreate it in a sandbox: https://phpsandbox.io/e/x/irxrz?layout=EditorPreview&defaultPath=%2F&theme=dark&showExplorer=no&openedFiles=

So if you have 200 items and you chunk them into 101 items, it should create 2 chunks of 101 and 99 items, right? Trying to not confuse myself 😅

Route /default shows what's expected using the regular LazyCollection: it creates 2 chunks of 101 and 99 items.

Route /lazy shows what happens when you use it in combination with a paginated API: it creates one chunk of 100 items.

Route /working does the same with a paginated API, but correctly chunks into two 100-item chunks. This only works if the chunk size is equal to or less than the page size.
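
For reference, the expected behaviour can be reproduced without Laravel or the package. The following is a minimal sketch in plain PHP, not the package's code: lazyChunk() and paginatedItems() are hypothetical helpers introduced only for illustration. It simulates 2 pages of 100 items each and chunks the flattened stream by 101, which should ignore page boundaries.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Laravel or Lazy JSON Pages):
 * lazily groups any iterable into arrays of at most $size items.
 */
function lazyChunk(iterable $items, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $chunk = [];

    foreach ($items as $item) {
        $chunk[] = $item;

        // A full chunk is emitted as soon as it reaches the requested size.
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    // Emit the final, partially filled chunk, if any.
    if ($chunk !== []) {
        yield $chunk;
    }
}

/**
 * Hypothetical stand-in for a paginated API: yields $pages pages of
 * $perPage items each as one flat stream of integers.
 */
function paginatedItems(int $pages, int $perPage): Generator
{
    for ($page = 0; $page < $pages; $page++) {
        for ($i = 1; $i <= $perPage; $i++) {
            yield $page * $perPage + $i;
        }
    }
}

// 2 pages of 100 items, chunked by 101.
$sizes = [];
foreach (lazyChunk(paginatedItems(2, 100), 101) as $chunk) {
    $sizes[] = count($chunk);
}

echo implode(', ', $sizes), PHP_EOL; // prints "101, 99"
```

A chunk size larger than the page size yields 101 and 99 items here, which matches what /default produces and what /lazy would be expected to produce.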

cerbero90 commented 3 years ago

Thanks for taking the time to demo the issue, @sebastiaanluca, now I can see what the problem is :)

At the moment I'm not sure what the cause could be; it needs some investigation 👍

Regarding the logging, thank you for the link to the Guzzle handler. We can definitely keep the package framework agnostic while letting the user define the Guzzle handler with a Config option.

I'm thinking of something like this:

$config->guzzleHandler(function () {
    $stack = HandlerStack::create();
    $stack->push(new LaravelEventsMiddleware());

    return $stack;
});
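
To illustrate the kind of handler stack such an option could receive, here is a minimal sketch that uses Guzzle on its own. It makes no claim about how Lazy JSON Pages wires its client: guzzleHandler() and LaravelEventsMiddleware above are proposals from this thread, and the $log array, the example.test URLs, and the Composer autoload path below are assumptions made for the example. A MockHandler returns canned responses so that no real request is sent.

```php
<?php

declare(strict_types=1);

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\RequestInterface;

// Assumes guzzlehttp/guzzle has been installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

// Collected log lines; a real integration would dispatch events or call a logger instead.
$log = [];

// MockHandler serves queued responses in order, so the sketch runs without network access.
$mock = new MockHandler([
    new Response(200, [], '{"page": 1}'),
    new Response(200, [], '{"page": 2}'),
]);

$stack = HandlerStack::create($mock);

// Middleware::tap() invokes the callback before each request reaches the handler.
$stack->push(Middleware::tap(
    function (RequestInterface $request) use (&$log): void {
        $log[] = $request->getMethod() . ' ' . (string) $request->getUri();
    }
));

$client = new Client(['handler' => $stack]);

// Placeholder URLs: nothing is resolved because the mock handler answers every call.
$client->get('https://example.test/items?page=1');
$client->get('https://example.test/items?page=2');

print_r($log);
```

Every request sent through a client built on this stack passes through the tap callback, so both calls end up in $log, not only the first one.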

Lazy JSON Pages will provide its own middleware (very similar to Barry's) and will enable it by default when used in a Laravel application, while leaving users free to disable it:

$config->muteLaravelEvents();

cerbero90 commented 2 months ago

@michabbb the Laravel HTTP client events are now supported in Lazy JSON Pages v2 🎉