cambialens / lens-api-doc

Question: Filtering query on partial IPCs #61

Open poldham opened 1 year ago

poldham commented 1 year ago

Hi All,

Posting this question as it may come up for other people. I want to search the full text of documents for a term, e.g. 'plant', and then limit the results to certain IPCs at the subclass level (or similar). Working in Swagger, the following works with full IPC codes:

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "title": "plant"
                    }
                },
                {
                    "match": {
                        "abstracts": "plant"
                    }
                },
                {
                    "match": {
                        "description": "plant"
                    }
                },
                {
                    "match": {
                        "claims": "plant"
                    }
                }
            ],
            "filter": {
                "bool": {
                    "should": [
                        {
                            "term": {
                                "class_ipcr.symbol": "A01H5/00"
                            }
                        },
                        {
                            "term": {
                                "class_ipcr.symbol": "C12N15/82"
                            }
                        }
                    ]
                }
            }
        }
    }
}

However, I was expecting to be able to wildcard everything below the subclass level, e.g. A01H or C12N, as it would be insane to list every full code. A single * does not work (nor do four **** or ?). I must be missing something and wonder if you have any ideas. Any help most gratefully received!

AaronBallagh commented 1 year ago

G'day Paul,

I hope you are well! Apologies for the delay in getting back to you; this one slipped under the radar. You can use a wildcard query in the request to retrieve all the IPC codes under a subclass, for example:

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "title": "plant"
                                }
                            },
                            {
                                "match": {
                                    "abstract": "plant"
                                }
                            },
                            {
                                "match": {
                                    "description": "plant"
                                }
                            },
                            {
                                "match": {
                                    "claim": "plant"
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "class_ipcr.symbol": {
                                        "value": "A01H*"
                                    }
                                }
                            },
                            {
                                "wildcard": {
                                    "class_ipcr.symbol": {
                                        "value": "C12N*"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 100
}
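
For anyone who needs more than a couple of subclasses, the request body above can be generated rather than typed by hand. Here is a minimal Python sketch, assuming the same field names as the query above. `build_ipc_prefix_query` is a hypothetical helper name for illustration, not part of the Lens API, and the sketch only builds the JSON body; it does not send the request.

```python
import json


def build_ipc_prefix_query(term, ipc_prefixes, size=100):
    """Build a request body: `term` in any text field AND any IPC prefix.

    Hypothetical helper; field names are copied from the query in this thread.
    """
    text_fields = ["title", "abstract", "description", "claim"]

    # One match clause per text field; any one of them may match.
    text_clauses = [{"match": {field: term}} for field in text_fields]

    # One wildcard clause per IPC prefix, e.g. "A01H" becomes "A01H*".
    ipc_clauses = [
        {"wildcard": {"class_ipcr.symbol": {"value": f"{prefix}*"}}}
        for prefix in ipc_prefixes
    ]

    return {
        "query": {
            "bool": {
                "must": [
                    {"bool": {"should": text_clauses}},
                    {"bool": {"should": ipc_clauses}},
                ]
            }
        },
        "size": size,
    }


body = build_ipc_prefix_query("plant", ["A01H", "C12N"])
print(json.dumps(body, indent=4))
```

Adding another subclass is then just a matter of appending its prefix to the list passed as `ipc_prefixes`.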

Let me know if that helps and apologies again for the delay.

Cheers, Aaron

poldham commented 1 year ago

Hi Aaron,

No worries, and thanks so much for this. I now see where I was going wrong: I needed to specify a wildcard query rather than just putting the wildcard character in the value.

Thanks again, much appreciated!

Paul
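
For later readers, the difference between the two query types can be illustrated locally. The sketch below assumes the API follows the usual Elasticsearch-style semantics, where a `term` query compares the value exactly (so `*` is just a literal character) and a `wildcard` query treats `*` and `?` as pattern characters. The symbol list is made-up sample data, and Python's `fnmatchcase` is only a stand-in for the server-side matching (it also gives `[...]` a special meaning, which is not relevant here).

```python
from fnmatch import fnmatchcase

# Illustrative sample of IPC symbols; not real query results.
symbols = ["A01H5/00", "A01H1/04", "C12N15/82", "A61K36/00"]


def term_match(symbol, value):
    # Stand-in for a term query: exact comparison, '*' has no special meaning.
    return symbol == value


def wildcard_match(symbol, pattern):
    # Stand-in for a wildcard query: '*' matches any run of characters,
    # '?' matches exactly one character.
    return fnmatchcase(symbol, pattern)


term_hits = [s for s in symbols if term_match(s, "A01H*")]
wildcard_hits = [s for s in symbols if wildcard_match(s, "A01H*")]

print(term_hits)      # []
print(wildcard_hits)  # ['A01H5/00', 'A01H1/04']
```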
