EventRegistry / event-registry-python

Python package for API access to news articles and events in the Event Registry
http://eventregistry.org/
MIT License

Complex query returning results outside of specified date range #58

Closed specialprocedures closed 2 years ago

specialprocedures commented 2 years ago

Hi,

I'm having difficulty putting together a complex query using the JSON structure. I've followed the instructions here, but I'm still getting odd results. Specifically, I'm getting results outside the date range I've specified. I'm also seeing some very low-ranked sources in my results, which suggests the startSourceRankPercentile and endSourceRankPercentile params aren't being applied.

What I'm trying to do is get results that fall within the given date range (required) and source rank range (required), and in which at least one of the listed concepts or keywords occurs.

See below for a rough reproduction of my query.

{
    "query": {
        "$query": {
            "$and": [
                {
                    "dateStart": "2022-02-01",
                    "dateEnd": "2022-03-01",
                    "startSourceRankPercentile": 0,
                    "endSourceRankPercentile": 20,
                    "$or": [
                        {
                            "conceptUri": {
                                "$or": [
                                    "http://en.wikipedia.org/wiki/British_Columbia",
                                    "http://en.wikipedia.org/wiki/Flood",
                                    "http://en.wikipedia.org/wiki/Natural_disaster",
                                    "http://en.wikipedia.org/wiki/Environment_and_Climate_Change_Canada",
                                    "http://en.wikipedia.org/wiki/Global_warming",
                                    "http://en.wikipedia.org/wiki/Severe_weather",
                                    "http://en.wikipedia.org/wiki/K\\u00f6ppen_climate_classification",
                                    "http://en.wikipedia.org/wiki/University_of_Victoria",
                                    "http://en.wikipedia.org/wiki/Preprint",
                                    "http://en.wikipedia.org/wiki/Peer_review",
                                    "http://en.wikipedia.org/wiki/Probability",
                                    "http://en.wikipedia.org/wiki/Atmospheric_river",
                                    "http://en.wikipedia.org/wiki/Precipitation",
                                    "http://en.wikipedia.org/wiki/Email"
                                ]
                            }
                        },
                        {
                            "keyword": {
                                "$or": [
                                    "A list",
                                    "of a bunch of",
                                    "people",
                                    "that aren't on wikipedia"
                                ]
                            }
                        }
                    ]
                }
            ]
        }
    },
    "resultType": "articles",
    "articlesPage": 1,
    "articlesSortBy": "rel",
    "articlesArticleBodyLen": -1,
    "includeArticleConcepts": true,
    "includeArticleSocialScore": true,
    "includeArticleLocation": true,
    "includeSourceLocation": true,
    "includeSourceRanking": true,
    "forceMaxDataTimeWindow": -1,
    "apiKey": "MY_API_KEY"
}

gregorleban commented 2 years ago

Hi,

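I think you were close, but the query needs a few small changes: the date range and the $or group go into separate conditions inside $and, and the source rank percentiles move into $filter. Try this one: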

{
    "query": {
        "$query": {
            "$and": [
                {
                    "dateStart": "2022-02-01",
                    "dateEnd": "2022-03-01"
                },
                {
                    "$or": [
                        {
                            "conceptUri": {
                                "$or": [
                                    "http://en.wikipedia.org/wiki/British_Columbia",
                                    "http://en.wikipedia.org/wiki/Flood",
                                    "http://en.wikipedia.org/wiki/Natural_disaster",
                                    "http://en.wikipedia.org/wiki/Environment_and_Climate_Change_Canada",
                                    "http://en.wikipedia.org/wiki/Global_warming",
                                    "http://en.wikipedia.org/wiki/Severe_weather",
                                    "http://en.wikipedia.org/wiki/K\\u00f6ppen_climate_classification",
                                    "http://en.wikipedia.org/wiki/University_of_Victoria",
                                    "http://en.wikipedia.org/wiki/Preprint",
                                    "http://en.wikipedia.org/wiki/Peer_review",
                                    "http://en.wikipedia.org/wiki/Probability",
                                    "http://en.wikipedia.org/wiki/Atmospheric_river",
                                    "http://en.wikipedia.org/wiki/Precipitation",
                                    "http://en.wikipedia.org/wiki/Email"
                                ]
                            }
                        },
                        {
                            "keyword": {
                                "$or": [
                                    "A list",
                                    "of a bunch of",
                                    "people",
                                    "that aren't on wikipedia"
                                ]
                            }
                        }
                    ]
                }
            ]
        },
        "$filter": {
            "startSourceRankPercentile": 0,
            "endSourceRankPercentile": 20
        }
    },
    "resultType": "articles",
    "articlesPage": 1,
    "articlesSortBy": "rel",
    "articlesArticleBodyLen": -1,
    "includeArticleConcepts": true,
    "includeArticleSocialScore": true,
    "includeArticleLocation": true,
    "includeSourceLocation": true,
    "includeSourceRanking": true,
    "forceMaxDataTimeWindow": -1,
    "apiKey": "MY_API_KEY"
}
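As a rough sketch of the layout in the query above, the same body can be assembled in Python and serialized with `json.dumps`, which avoids hand-editing mistakes such as trailing commas. The helper name `build_query` and its parameters are illustrative assumptions, not part of the eventregistry package, and only a subset of the request fields is included.

```python
import json


def build_query(date_start, date_end, concept_uris, keywords,
                rank_start=0, rank_end=20):
    """Hypothetical helper: build a request body with the layout shown above."""
    return {
        "query": {
            "$query": {
                "$and": [
                    # Required: the date range is its own condition.
                    {"dateStart": date_start, "dateEnd": date_end},
                    # Required: at least one concept OR at least one keyword.
                    {
                        "$or": [
                            {"conceptUri": {"$or": list(concept_uris)}},
                            {"keyword": {"$or": list(keywords)}},
                        ]
                    },
                ]
            },
            # Source rank limits sit in $filter, next to $query.
            "$filter": {
                "startSourceRankPercentile": rank_start,
                "endSourceRankPercentile": rank_end,
            },
        },
        "resultType": "articles",
        "articlesPage": 1,
        "articlesSortBy": "rel",
    }


body = build_query(
    "2022-02-01",
    "2022-03-01",
    ["http://en.wikipedia.org/wiki/Flood",
     "http://en.wikipedia.org/wiki/Atmospheric_river"],
    ["Pacific Climate Impacts Consortium"],
)

# Serializing from a dict guarantees syntactically valid JSON.
payload = json.dumps(body, indent=4)
```

The remaining fields (apiKey, the include* flags and so on) would be added to the returned dict in the same way before sending.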

specialprocedures commented 2 years ago

Thanks Gregor, I'll take a look at this tomorrow. Really appreciate the quick response :-)

specialprocedures commented 2 years ago

Hi there, I've tried a number of variations of the approach above and unfortunately I'm still having problems. The recurring issue is that I'm not getting any hits from the keywords.

I've tried specifying them as above, and also explicitly passing the keyword location (keywordLoc set to body), as below:

{
    "query": {
            "$query": {
                "$and": [
                    {
                        "dateStart": "2022-02-01",
                        "dateEnd": "2022-03-01"
                    },
                    {
                        "$or": [
                            {
                                "keyword": "Pacific Climate Impacts Consortium",
                                "keywordLoc": "body"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Flood"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Natural_disaster"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Severe_weather"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Global_warming"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Email"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/British_Columbia"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Preprint"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Atmospheric_river"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Probability"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Precipitation"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Peer_review"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/K\u00f6ppen_climate_classification"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/Environment_and_Climate_Change_Canada"
                            },
                            {
                                "conceptUri": "http://en.wikipedia.org/wiki/University_of_Victoria"
                            }
                        ]
                    }
                ]
            },
            "$filter": {
                "dataType": [
                    "news"
                ]
            }
        },
        "resultType": "articles",
        "articlesPage": 1,
        "articlesSortBy": "rel",
        "articlesArticleBodyLen": -1,
        "includeArticleConcepts": true,
        "includeArticleSocialScore": true,
        "includeArticleLocation": true,
        "includeSourceLocation": true,
        "includeSourceRanking": true,
        "forceMaxDataTimeWindow": -1,
        "apiKey": "MY_API_KEY"
}

gregorleban commented 2 years ago

Did you try to download all results matching the query that I've sent?

Did you try removing the conceptUri part of the query and keeping just the keywords? If you get results for that query, then very likely the conceptUri part is simply bringing in a lot of results (e.g. email, flood and natural disaster probably match a large number of articles), so the articles mentioning rare keywords make up only a small share of what comes back.
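One way to run that check without rewriting the query by hand is to derive a keyword-only (or concept-only) variant from the full request body. This is a minimal sketch; `keep_only` is a hypothetical helper and the shortened body is just an example in the same shape as the queries in this thread.

```python
import copy


def keep_only(body, field):
    """Return a copy of the body whose $or groups keep only conditions that use `field`."""
    trimmed = copy.deepcopy(body)  # leave the original query untouched
    for condition in trimmed["query"]["$query"]["$and"]:
        if "$or" in condition:
            condition["$or"] = [c for c in condition["$or"] if field in c]
    return trimmed


# Shortened example body in the same shape as the queries above.
body = {
    "query": {
        "$query": {
            "$and": [
                {"dateStart": "2022-02-01", "dateEnd": "2022-03-01"},
                {
                    "$or": [
                        {"keyword": "Pacific Climate Impacts Consortium",
                         "keywordLoc": "body"},
                        {"conceptUri": "http://en.wikipedia.org/wiki/Flood"},
                        {"conceptUri": "http://en.wikipedia.org/wiki/Email"},
                    ]
                },
            ]
        }
    },
    "resultType": "articles",
}

keyword_only = keep_only(body, "keyword")     # date range + keywords
concept_only = keep_only(body, "conceptUri")  # date range + concepts
```

If `keyword_only` returns articles on its own, the keyword conditions are working and are just being outnumbered in the combined results.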

gregorleban commented 2 years ago

Btw, the last query you wrote is also perfectly legit. You can specify all the keywords this way, each keyword in its own { ... } block.

specialprocedures commented 2 years ago

I think you're on to something there and part of the problem may lie in my post-processing. I think what I'll do is split the query into the keyword and concept parts, then merge. Thanks again for your help.
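The split-then-merge step could look something like the sketch below. It assumes each returned article is a dict with a unique "uri" field; the helper `merge_articles`, the "matched_by" key and the sample records are made up for illustration and are not real API output.

```python
def merge_articles(keyword_hits, concept_hits):
    """Merge two result lists, de-duplicating on the (assumed) "uri" field."""
    merged = {}
    for label, hits in (("keyword", keyword_hits), ("concept", concept_hits)):
        for article in hits:
            # First sighting stores a copy; later sightings only add a label.
            entry = merged.setdefault(article["uri"], {**article, "matched_by": []})
            entry["matched_by"].append(label)
    return list(merged.values())


# Placeholder records standing in for the two separate query results.
keyword_hits = [{"uri": "a1", "title": "Rare keyword story"}]
concept_hits = [
    {"uri": "a1", "title": "Rare keyword story"},
    {"uri": "a2", "title": "Flood report"},
]

merged = merge_articles(keyword_hits, concept_hits)
```

Keeping a "matched_by" list on each record makes it easy to see afterwards whether the keyword query contributed anything the concept query did not.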