cambialens / lens-api-doc


Different value returned by Patent API than website query Patent Search #33

Closed. bobchatham closed this issue 3 years ago.

bobchatham commented 3 years ago

Submitting a website query for a single patent with Lens ID 019-214-281-547-268 returns the correct result: a US patent application titled "Instrument Kit Tracking System".

Submitting a query for a single patent through the Patent API using the following query:

{"query":{"terms":{"lens_id":["019-214-281-547-268"]}},"size":100,"include":["claims"],"scroll":"1m","scroll_id":""}

returns a 200 response code with a payload containing the correct claims data but a different (and I believe wrong) Lens ID:

"scroll_id":"<long scroll id removed>","total":1,"data":[{"lens_id":"086-392-156-645-872","claims":[{"claims":[{"claim_text":["1. A tracking system for a medical instrument kit, comprising: a housing; and an electronics module contained within the housing, wherein the tracking system is configured and arranged to withstand external temperatures between 120-135° C.; wherein the tracking system is configured and arranged to be affixed to the medical instrument kit, the medical instrument kit is configured and arranged to receive at least one medical instrument; and wherein, the internal temperature of the electronics module is maintained such that an electronic device contained therein is operable when the external temperature is between 120-135° C."]},...

Submitting a website patent query for Lens ID 086-392-156-645-872 (the API-returned value) returns an error.
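For reference, here is a minimal Python sketch that builds the single-ID request shown above as a POST with a bearer token. The endpoint URL `https://api.lens.org/patent/search`, the `Authorization: Bearer` header, and the helper names `build_claims_query` and `build_request` are assumptions for illustration; check them against the current Lens API documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed Lens Patent API search endpoint; verify against the current docs.
LENS_PATENT_SEARCH_URL = "https://api.lens.org/patent/search"


def build_claims_query(lens_ids, size=100, scroll="1m"):
    """Build the body used in this issue: a terms query on lens_id that
    returns only the claims field and opens a scroll context."""
    return {
        "query": {"terms": {"lens_id": list(lens_ids)}},
        "size": size,
        "include": ["claims"],
        "scroll": scroll,
        "scroll_id": "",  # mirrors the original query; empty on the first call
    }


def build_request(body, token):
    """Wrap the body in a JSON POST request with a bearer token (not sent)."""
    return urllib.request.Request(
        LENS_PATENT_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, the request would be passed to `urllib.request.urlopen` and the JSON payload decoded; the records are in its `data` list, as in the response excerpt above.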

I have verified that my API query code works for other Lens IDs, including when only a single Lens ID is passed. When I pass an array of 4 Lens IDs to retrieve claims text (3 known-good IDs plus the problem ID), the result for Lens ID 019-214-281-547-268 comes back tagged with a different Lens ID, 086-392-156-645-872, so the code that processes the returned results does not pick it up.
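Because the mismatched record is still returned, simply counting results would not reveal the problem; comparing IDs does. Below is a minimal sketch of such a check, assuming each response record carries a `lens_id` key as in the excerpt above. The function name `diagnose_ids` is hypothetical.

```python
def diagnose_ids(requested_ids, records):
    """Compare requested Lens IDs with the lens_id values in the response.

    Returns (missing, unexpected):
      missing    - requested IDs with no record carrying that lens_id
      unexpected - returned lens_id values that were never requested,
                   e.g. a record reported under a different primary ID
    """
    returned = [r.get("lens_id") for r in records]
    returned_set = set(returned)
    requested_set = set(requested_ids)
    missing = [i for i in requested_ids if i not in returned_set]
    unexpected = [i for i in returned if i not in requested_set]
    return missing, unexpected
```

With the four IDs from this report, the check would flag 019-214-281-547-268 as missing and 086-392-156-645-872 as unexpected.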

Note: My observation is that if I pass a query to the API for claims data with a list of Lens IDs:

{"query":{"terms":{"lens_id":["053-482-898-165-714","066-445-236-190-864","043-561-754-646-116","019-214-281-547-268"]}},"size":100,"include":["claims"],"scroll":"1m","scroll_id":""}

my assumption (and experience) is that the API will not return results in the same order as the Lens IDs in the query list. Hence, I build a dictionary of the returned results keyed by Lens ID and use it to reorder them to match the query order in my caller. Please advise if there's a better way to handle this.
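A minimal sketch of the dictionary-based reordering described above, assuming each record in the response `data` list has a `lens_id` key as in the excerpt earlier in this issue. The name `reorder_by_query` is hypothetical. Requested IDs with no matching record yield `None`, so the output stays aligned position-by-position with the caller's list.

```python
def reorder_by_query(requested_ids, records):
    """Return one entry per requested Lens ID, in query order.

    Records are indexed by their lens_id; a requested ID with no matching
    record maps to None so positions stay aligned with the input list.
    """
    by_id = {r["lens_id"]: r for r in records}
    return [by_id.get(lens_id) for lens_id in requested_ids]
```

Building the dictionary is a single pass, so this is cheap at the 100-record page size used here. It cannot, however, recover a record that comes back under a different primary Lens ID; that case would need the requested IDs mapped to their primaries first.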

AaronBallagh commented 3 years ago

Hi Bob,

The patent API is based on a new patent architecture which includes additional data sources and metadata compared to the UI, so API queries may give different results to similar queries run on the Lens.org platform. The new patent architecture is currently being integrated into the UI and will be released soon.

In the new architecture we take a "Meta Record" approach whereby we merge multiple data sources into each patent record. Lens Ids assigned to records in the data sources are combined when merging them into the single record, and the earliest Lens Id is selected as the primary Lens Id for the record. The current UI uses only one of the Lens Ids, which may not match the primary Lens Id in the new system; in those cases, the Lens Id shown in the UI does not appear in the API response record. We'll look into a solution for the result ordering and perhaps return all of the record's Lens Ids in the response as well.
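The merge described above can be illustrated with a small Python sketch. How source records are grouped into one meta record is not stated here, so the grouping key `doc_key` and the `assigned` field are hypothetical, as are the function names. The sketch picks the earliest-assigned Lens ID as primary, keeps every Lens ID on the merged record (as suggested above), and builds an index from any Lens ID to its primary.

```python
from collections import defaultdict


def merge_meta_records(source_records):
    """Group source records into meta records and pick a primary Lens ID.

    Each source record is a dict with hypothetical keys:
      "doc_key"  - identifies the patent document the source describes
      "lens_id"  - the Lens ID assigned to that source record
      "assigned" - ISO-8601 date the Lens ID was assigned (sorts chronologically)
    """
    groups = defaultdict(list)
    for rec in source_records:
        groups[rec["doc_key"]].append(rec)

    meta = {}
    for doc_key, recs in groups.items():
        recs.sort(key=lambda r: r["assigned"])  # earliest first
        meta[doc_key] = {
            "lens_id": recs[0]["lens_id"],             # primary Lens ID
            "lens_ids": [r["lens_id"] for r in recs],  # all merged Lens IDs
        }
    return meta


def alias_index(meta_records):
    """Map every known Lens ID (primary or not) to its record's primary ID."""
    return {
        alias: rec["lens_id"]
        for rec in meta_records.values()
        for alias in rec["lens_ids"]
    }
```

An index like this is what would let a client match a UI-visible Lens ID to the primary one the API returns.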

Cheers,

Aaron

bobchatham commented 3 years ago

Aaron, thanks for the explanation. Maybe you could solve this with a helper API endpoint that, given an array of arbitrary Lens Ids (primary or not), returns a cleaned-up array of the corresponding primary Lens Ids.
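The proposed endpoint is only a suggestion, but its behaviour can be sketched client-side, assuming an alias-to-primary map is available (for example, built from a list of all Lens Ids per record if the API starts returning one). The function name `resolve_primary_ids` is hypothetical.

```python
def resolve_primary_ids(lens_ids, alias_to_primary):
    """Map arbitrary Lens IDs to primary Lens IDs.

    Returns (primaries, unknown):
      primaries - primary IDs, de-duplicated, in first-seen order
      unknown   - input IDs that are not in the alias map
    """
    primaries, seen, unknown = [], set(), []
    for lens_id in lens_ids:
        primary = alias_to_primary.get(lens_id)
        if primary is None:
            unknown.append(lens_id)
        elif primary not in seen:
            seen.add(primary)
            primaries.append(primary)
    return primaries, unknown
```

This version de-duplicates, so several aliases of one record collapse into a single primary ID. A caller that needs positional alignment with its input could instead take `alias_to_primary.get(lens_id)` for each entry.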