cambialens / lens-api-doc

Trouble using the "include" field #47

Closed zilch42 closed 2 years ago

zilch42 commented 2 years ago

Hi there,

I'm not having any success limiting the columns returned using the "include" field. This means that I have to download every column, which for a large set of patents is a huge amount of data.

The following query returns a 400 error:

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "lens_id": [
                            "150-506-464-347-086",
                            "089-744-818-392-604",
                            "107-433-773-744-882",
                            "161-777-856-080-134",
                            "182-283-251-489-844",
                            "048-418-903-199-808",
                            "048-721-214-075-449",
                            "115-342-293-268-334",
                            "022-009-123-795-923",
                            "032-877-796-510-155"
                        ]
                    }
                }
            ]
        }
    },
    "size": 500,
    "scroll": "1m",
    "include": [
        "invention_title",
        "legal_status",
        "biblio.priority_claims"
    ]
}

It doesn't seem to matter which columns I include. The above are the columns from the example in the Swagger UI, so they should be okay.

Whereas if I just remove the "include" field altogether, it runs fine:

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "lens_id": [
                            "150-506-464-347-086",
                            "089-744-818-392-604",
                            "107-433-773-744-882",
                            "161-777-856-080-134",
                            "182-283-251-489-844",
                            "048-418-903-199-808",
                            "048-721-214-075-449",
                            "115-342-293-268-334",
                            "022-009-123-795-923",
                            "032-877-796-510-155"
                        ]
                    }
                }
            ]
        }
    },
    "size": 500,
    "scroll": "1m"
}

Am I doing it wrong or is there a bug?

rosharma9 commented 2 years ago

Looks like a typo in our Swagger doc. It should be biblio.invention_title instead of just invention_title. That's why you were getting a 400:

{
    "reference": "unknown_fields",
    "message": "Unrecognized fields - [invention_title]",
    "code": 400
}

Thanks for bringing it up, we will fix the swagger doc example.
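
For anyone else hitting this, here is a minimal sketch of building the corrected request body in Python. The helper name build_request is made up for illustration and is not part of the Lens API; the two IDs are simply taken from the query above.

```python
import json


def build_request(lens_ids, include, size=500, scroll="1m"):
    """Build the search request body as a JSON string (hypothetical helper)."""
    body = {
        "query": {"bool": {"must": [{"terms": {"lens_id": list(lens_ids)}}]}},
        "size": size,
        "scroll": scroll,
        # Field names must be fully qualified, e.g. biblio.invention_title
        "include": list(include),
    }
    return json.dumps(body)


payload = build_request(
    ["150-506-464-347-086", "089-744-818-392-604"],
    ["biblio.invention_title", "legal_status", "biblio.priority_claims"],
)
```

The resulting string can be sent as the POST body with whatever HTTP client you already use.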

zilch42 commented 2 years ago

Thanks @rosharma9 that's working for me now.

What about this extended example? All of these requested columns are included in the columns that are returned when not using the "include" field.

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "lens_id": [
                            "150-506-464-347-086",
                            "089-744-818-392-604",
                            "107-433-773-744-882",
                            "161-777-856-080-134",
                            "182-283-251-489-844",
                            "048-418-903-199-808",
                            "048-721-214-075-449",
                            "115-342-293-268-334",
                            "022-009-123-795-923",
                            "032-877-796-510-155"
                        ]
                    }
                }
            ]
        }
    },
    "size": 500,
    "scroll": "1m",
    "include": [
        "lens_id",
        "jurisdiction",
        "kind",
        "doc_number",
        "date_published",
        "biblio.priority_claims.earliest_claim.date",
        "biblio.application_reference.doc_number",
        "biblio.application_reference.date",
        "biblio.invention_title",
        "publication_type",
        "families.extended_family.members",
        "families.extended_family.size",
        "families.simple_family.members",
        "families.simple_family.size",
        "biblio.parties.applicants",
        "biblio.parties.inventors",
        "legal_status.patent_status",
        "biblio.classifications_ipcr.classifications",
        "biblio.classifications_cpc.classifications"
    ]
}

Also, how do I access the "message" field that you showed above? I can't find it in what is returned when I get an error. That would be really helpful for debugging.

rosharma9 commented 2 years ago

I see. The fields families.simple_family.size and families.extended_family.size are calculated on the fly to make them readily available to users; each is just a count of the corresponding members list. That's why you were getting a 400. I would recommend not using them in the projection and instead counting the size of members yourself if you are integrating it. We will make sure to add them as projectable fields in a future release.
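
As a rough sketch of that workaround in Python: drop the two computed fields from the "include" list and count the members client-side. This assumes each returned record is a dict nested the same way as the field paths (families -> simple_family -> members); the helper names and the sample record are made up for illustration.

```python
# Fields that are computed on the fly and cannot be used in "include" yet
COMPUTED_FIELDS = {
    "families.simple_family.size",
    "families.extended_family.size",
}


def projectable(include):
    """Return the include list without the computed size fields."""
    return [field for field in include if field not in COMPUTED_FIELDS]


def family_sizes(record):
    """Count family members in one record (assumed nesting, see lead-in)."""
    families = record.get("families") or {}
    sizes = {}
    for name in ("simple_family", "extended_family"):
        members = (families.get(name) or {}).get("members") or []
        sizes[name] = len(members)
    return sizes


# Made-up record, only to show the assumed shape
sample = {
    "families": {
        "simple_family": {"members": [{"lens_id": "A"}, {"lens_id": "B"}]},
        "extended_family": {
            "members": [{"lens_id": "A"}, {"lens_id": "B"}, {"lens_id": "C"}]
        },
    }
}
sizes = family_sizes(sample)
```

Records with no family data come back as zero counts rather than raising an error.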

To debug, you can use Postman or curl if you are testing. If you are integrating it in code, I recommend printing or logging the response whenever the status is not 200. Python example from the documentation:

...
elif response.status_code != requests.codes.ok:
    # The error body carries the "reference" and "message" fields shown above
    print(response.json())
...
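
A slightly fuller sketch of the same idea, in case it helps: a small function that turns a non-200 response into a one-line description. It assumes the error body looks like the JSON shown earlier in this thread ("reference", "message", "code"); describe_error is a made-up name. With requests you would pass response.status_code and response.text.

```python
import json


def describe_error(status_code, body_text):
    """Return None on HTTP 200, otherwise a one-line error description."""
    if status_code == 200:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Body is not JSON, so fall back to the raw text
        return "HTTP {}: {}".format(status_code, body_text)
    if not isinstance(payload, dict):
        return "HTTP {}: {}".format(status_code, body_text)
    reference = payload.get("reference", "unknown")
    message = payload.get("message", body_text)
    return "HTTP {} ({}): {}".format(status_code, reference, message)


error_body = (
    '{"reference": "unknown_fields", '
    '"message": "Unrecognized fields - [invention_title]", "code": 400}'
)
summary = describe_error(400, error_body)
```

Logging this summary next to the request body makes it easy to see which field name was rejected.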

zilch42 commented 2 years ago

Ah awesome, thank you.