apify / apify-storage-local-js

Local emulation of the apify-client NPM package, which enables local use of Apify SDK.
3 stars 4 forks source link

Type inaccuracies #48

Closed Zauberbutter closed 2 years ago

Zauberbutter commented 2 years ago

I think the following three functions calls have the wrong type for the request parameter:

async addRequest(request: RequestModel, options: RequestOptions = {}): Promise<QueueOperationInfo> {
  // …
}
async batchAddRequests(requests: RequestModel[], options: RequestOptions = {}): Promise<BatchAddRequestsResult> {
  // …
}
async updateRequest(request: RequestModel, options: RequestOptions = {}): Promise<QueueOperationInfo> {
  // …
}

RequestModel should be changed to RequestBody because these functions don't require the queue data parameters for the provided request argument, because they get calculated by this._createRequestModel later in the functions body.

B4nan commented 2 years ago

Thoughts @vladfrangu?

Btw @Zauberbutter why do you care? :] Those interfaces are compatible, RequestModel contains the exact same properties + few more optional ones (so those can't really affect anything).

Zauberbutter commented 2 years ago

@B4nan

Btw @Zauberbutter why do you care? :]

I want to update the requests I'm getting from enqueueLinks with updateRequest. Don't know if its a good idea but it works so far. But I don't get the queue parameters with either enqueueLinks or getRequest so TypeScript isn't happy about that.

My use case is, that I saving all pages which are pointing to a specific page. But until the request is handled, I don't have the specific page in my database, so I append the data to the request's user data. I can't do this in the transformRequest function, because there I don't have any information about whether the request was already handled (so its in my DB) or not. Well now as I write this it occurs to me that I could append the original page in transformRequest and just handle all handled requests afterwards…

B4nan commented 2 years ago

I want to update the requests I'm getting from enqueueLinks with updateRequest. Don't know if its a good idea but it works so far. But I don't get the queue parameters with either enqueueLinks or getRequest so TypeScript isn't happy about that.

Can you provide more details about this? How does the error look like? Because as said above, those interfaces are already compatible. What queue parameters are you talking about? queueId? That one is optional too.

Also, you are talking about enqueueLinks, which makes me think you are not having any actual problems with this particular package - I don't think apify SDK use types from storage-local directly. To me this sounds like you want something to be fixed, but your true problem is not the one you described in the OP.

Zauberbutter commented 2 years ago

Can you provide more details about this? How does the error look like? Because as said above, those interfaces are already compatible. What queue parameters are you talking about? queueId? That one is optional too.

I'm using the method updateRequest from the client interface of the RequestQueue. I get the following error when I'm supplying a request from either enqueueLinks or getRequest:

Argument of type '{ id: any; url: string; loadedUrl: any; uniqueKey: any; method: string; payload: string | Buffer | undefined; noRetry: any; retryCount: any; errorMessages: any[]; headers: { ...; }; handledAt: any; }' is not assignable to parameter of type 'RequestModel & RequestQueueClientRequestSchema'.

Also, you are talking about enqueueLinks, which makes me think you are not having any actual problems with this particular package - I don't think apify SDK use types from storage-local directly. To me this sounds like you want something to be fixed, but your true problem is not the one you described in the OP.

I'm not sure, the types are coming from @apify/storage-local/dist/resource_clients/request_queue.d.ts and are looking exactly like the types from this repository.

But now that I've found out I don't have to use updateRequest, my issue is purely informational.

B4nan commented 2 years ago

Please provide the full error, not just the first line. Also please provide reproduction for this, I really doubt what you are asking in the OP would help. That TS error is not saying anything about the additional properties from RequestModel (in the end, they are all optional, and TS is structural, so it does not care about it at all). It says the types are incompatible, and there should be more details underneath. Seeing your input parameter is full of anys, that's weird on its onw.

That error is saying the parameter is not compatible with RequestModel & RequestQueueClientRequestSchema - your issue is not the RequestModel part, but the latter - RequestQueueClientRequestSchema. And that is a type coming from apify-client, not apify-storage-local.

Are you working with the storage-local directly in your code, or thru the Apify SDK? What versions are you using? We really need to see code.

edit: I am quite sure your actual problem is the payload type, as your input type allows Buffer while RequestQueueClientRequestSchema don't.

Zauberbutter commented 2 years ago

Ok I tried to setup a minimal working example and now the error mentioned by me is gone. I deleted the package-lock in my other project and the types got updated there also. Sorry that I wasted your time, I could have done this earlier.

Aside from that there is now a new error, but its not so fundamental:

import Apify, {
    type CheerioCrawlerOptions,
    type CheerioHandlePageInputs,
} from 'apify';
const { utils } = Apify;

async function cheerioCrawlerSetup(){
    const requestQueue = await Apify.openRequestQueue();

    await requestQueue.addRequest({
        url: "https://www.example.com",
    });

    async function handlePage({
        $,
        request,
        response,
    }: CheerioHandlePageInputs) {
        const enqueuedRequests = await utils.enqueueLinks({
            $,
            selector: 'a[href]',
            requestQueue: requestQueue,
            baseUrl: request.loadedUrl as string,
        });

        for (const queueInfo of enqueuedRequests) {
            if (!queueInfo.wasAlreadyHandled) {
                const request = await requestQueue.getRequest(queueInfo.requestId);

                await requestQueue.client.updateRequest({
                    ...request,
                    userData: {
                        foo: 'bar',
                    },
/*
Argument of type '{ userData: { foo: string; }; id?: any; url?: string | undefined; loadedUrl?: any; uniqueKey?: any; method?: string | undefined; payload?: string | Buffer | undefined; noRetry?: any; retryCount?: any; errorMessages?: any[] | undefined; headers?: { ...; } | undefined; handledAt?: any; }' is not assignable to parameter of type 'RequestModel & RequestQueueClientRequestSchema'.
  Type '{ userData: { foo: string; }; id?: any; url?: string | undefined; loadedUrl?: any; uniqueKey?: any; method?: string | undefined; payload?: string | Buffer | undefined; noRetry?: any; retryCount?: any; errorMessages?: any[] | undefined; headers?: { ...; } | undefined; handledAt?: any; }' is not assignable to type 'RequestModel'.
    Types of property 'url' are incompatible.
      Type 'string | undefined' is not assignable to type 'string'.
        Type 'undefined' is not assignable to type 'string'.ts(2345)
*/
                });
            }
        }
    }

    const options: CheerioCrawlerOptions = {
        requestQueue: requestQueue,

        handlePageFunction: handlePage,
    };

    const crawler = new Apify.CheerioCrawler(options);

    await crawler.run();
}
package.json ``` { "name": "apify-test", "version": "1.0.0", "description": "", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "keywords": [], "author": "", "license": "ISC", "dependencies": { "apify": "^2.3.2" }, "devDependencies": { "@types/node": "^17.0.36", "typescript": "^4.7.2" } } ```
B4nan commented 2 years ago

If that is all you want to do, you don't need the update at all, just use transformRequestFunction to add the userData right when you create it.

https://sdk.apify.com/docs/api/utils#utilsenqueuelinksoptions

B4nan commented 2 years ago

Note that this particular error is in your code, as getRequest() can return null and you are not handling that, so destructing request object will produce every property as optional, so TS is complaining about possibily missing url. If you fix that, you end up with method being wrongly typed as one of the storages uses const string union and the other expects just a string.

The TS support in v2 is not perfect, you will need casting here and there for sure - and this is the place where you can easily use it. But a new full TS rewrite called crawlee is behind the corner, we plan on releasing it to stable channels in a month, so things will get better soon :]

Zauberbutter commented 2 years ago

If that is all you want to do, you don't need the update at all, just use transformRequestFunction to add the userData right when you create it.

Yep I already figured that out in my second comment!

But a new full TS rewrite called crawlee is behind the corner, we plan on releasing it to stable channels in a month, so things will get better soon :]

That sounds awesome! You guys are truly amazing, I really like to work with the Apify SDK. Thank you for your time, I hope you forgive me for being a bit stumped.