cambialens / lens-api-doc

Unable to retrieve data for inventors.extracted_address #75

Open vanessagd15 opened 3 months ago

vanessagd15 commented 3 months ago

Hi. I am requesting patent data from the API using R, for inventors residing in GB. The data retrieved under biblio.parties.inventors do not include extracted_address.

AaronBallagh commented 3 months ago

Hello, thank you for this question. The biblio.parties.inventors field is only returned if the inventor has address information. For patents with GB inventors, only ~55K have inventor address information; see https://link.lens.org/TFbmz9zigme. You can confirm that the inventor address is being returned for GB inventors with address information using this query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "wildcard": {
                        "inventor.address": "*"
                    }
                },
                {
                    "match": {
                        "jurisdiction": "GB"
                    }
                }
            ]
        }
    }
}

vanessagd15 commented 4 weeks ago

Hi Aaron. Thank you so much for your help. I am trying to retrieve the 19.2k patents (https://shorturl.at/7TpUY) in batches of 100 using the following request:

request1 <- '{ "query": { "bool": { "must": [ { "match" : { "legal_status.granted": true } }, { "term" : { "owner_all.country": "GB" } }, { "wildcard": { "owner_all.address": "" } }, {"wildcard": {"class_cpc.symbol": "Y02" } } ] } }, "from": 100, "size": 100 }'

I replace the "from": 100 value in a loop that iterates over x, using gsub as follows:

x <- seq(200, 19200, 100)
request <- gsub("from\": 100",paste0("from\": ",i), request1)

However, the code returns 400 when hitting 10000. Could you please help? Thanks.

AaronBallagh commented 4 weeks ago

Hello, thank you for this question. You will need to use cursor-based pagination to scroll through all records when the total number of results is >10,000.
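A minimal sketch of what a cursor-based loop could look like in Python. The `post` callable is hypothetical and stands in for whatever function sends the request body to the search endpoint and returns the parsed JSON. The `scroll`, `scroll_id` and `data` field names are assumptions based on the `scroll_id` mentioned in this thread, so check them against the API documentation before relying on them.

```python
from typing import Callable, Dict, Iterator


def scroll_all(
    post: Callable[[Dict], Dict],
    query: Dict,
    size: int = 100,
    scroll: str = "1m",
) -> Iterator[Dict]:
    """Yield every record for `query`, following the cursor page by page.

    `post` is a hypothetical helper: it takes a request body (dict) and
    returns the decoded JSON response (dict).
    """
    # First request: send the query itself and ask for a scroll context.
    body = {"query": query, "size": size, "scroll": scroll}
    while True:
        page = post(body)
        records = page.get("data", [])  # assumed name of the results field
        if not records:
            break  # an empty page means there is nothing left to fetch
        yield from records
        scroll_id = page.get("scroll_id")  # assumed name of the cursor field
        if not scroll_id:
            break
        # Later requests: send only the cursor, not "from" or the query.
        body = {"scroll_id": scroll_id, "scroll": scroll}
```

Because the cursor replaces the "from" offset, the loop is not tied to the 10,000-record window that offset paging runs into.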

vanessagd15 commented 3 weeks ago

Hi Aaron. Thank you so much. I have been able to put some code together that uses cursor-based pagination and replaces the scroll_id after each retrieval of 100 patents. However, it always breaks before completing the task (extracting 19.2k patents). After inspecting the results a bit, I noticed I am hitting $x-rate-limit-retry-after-seconds == "0" and status_code == 400. Do you have any suggestions on how to resolve this issue? Alternatively, could I extract the lens_id of every patent matching my search criteria and target only the records that I have not been able to extract? Thanks for your help.

AaronBallagh commented 3 weeks ago

Hi Vanessa, that's no problem. It sounds like you are hitting the API rate limit for requests per minute. You will need to add some rate limiting to your code; we have an example in Python available here: https://docs.api.lens.org/samples-patent.html#cursor-based-pagination. Let me know if that allows you to finish running your code.
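As a rough illustration of the kind of rate limiting meant here (not the linked sample itself), the sketch below waits and retries when a request is throttled. The `send` callable is hypothetical and is expected to return a `(status_code, headers, json_body)` tuple. Treating status 429 as the throttling signal is an assumption, as is reading the wait time from the `x-rate-limit-retry-after-seconds` header seen earlier in this thread.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], Dict]


def post_with_retry(
    send: Callable[[Dict], Response],
    body: Dict,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Send `body`, pausing and retrying while the server throttles us.

    `send` is a hypothetical helper returning (status, headers, payload).
    """
    for attempt in range(max_retries + 1):
        status, headers, payload = send(body)
        if status == 200:
            return payload
        if status == 429 and attempt < max_retries:  # assumed throttle status
            # Header name taken from the thread; fall back to 1 second.
            wait = int(headers.get("x-rate-limit-retry-after-seconds", "1") or 1)
            sleep(max(wait, 1))  # always pause at least one second
            continue
        # Any other status, or too many retries, is surfaced to the caller.
        raise RuntimeError(f"request failed with status {status}")
    raise RuntimeError("unreachable: loop always returns or raises")
```

Passing `sleep` in as a parameter keeps the function easy to test without real delays; in normal use the default `time.sleep` applies.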

vanessagd15 commented 3 weeks ago

Hi Aaron. Thank you very much. Really useful resources. Unfortunately, I keep hitting errors. Now it is: $_x_lens_error_reference == "58ddfded-2d94-4f10-a2f6-bac446bd69b5" ... I am kind of lost here. Is there anything that you could share? I tried with different tokens, but it seems that I am reaching the same dead end. Thanks once again.

AaronBallagh commented 3 weeks ago

Hi Vanessa, no worries. It looks like you have used up your monthly API request quota. I have reset your quota now so you can continue using the API, but you will need to monitor your usage and maximise the number of records you request per call if you find you are reaching the request quota. You can monitor your usage from the usage endpoint: [GET] https://api.lens.org/subscriptions/patent_api/usage
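To see why the page size matters for the request quota, here is a small worked calculation in Python. The 19,200 total and the page size of 100 come from this thread; the page size of 1,000 is purely hypothetical, since the maximum allowed size depends on your plan.

```python
import math


def requests_needed(total_records: int, page_size: int) -> int:
    """Number of API calls required to fetch `total_records` records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_records / page_size)


print(requests_needed(19200, 100))   # 192 calls at 100 records per request
print(requests_needed(19200, 1000))  # 20 calls at a hypothetical size of 1000
```

Each retry after a failed run also counts against the quota, so larger pages and fewer restarts both help.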