Commonists / pageview-api

Wikimedia Pageview API client
MIT License

Missing Entries #5

Closed justin-meisner closed 4 years ago

justin-meisner commented 4 years ago

Can you explain what happens when a page has 0 views, or when I collect a list of daily items for a span of a month and there are no entries for a certain day?

PierreSelim commented 4 years ago

I'd say the API does not return an entry for a day with 0 views, as you can see with:

curl -X GET "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/fr.wikipedia.org/all-access/all-agents/Isabelle%20Rico-Lattes/daily/20200301/20200401" -H "accept: application/json"

So you will just have a list without the 0 view days.

If you select a period for which there is never any data, you'll get a 404 error, and the API client will raise a ZeroOrDataNotLoadedException (per https://github.com/Commonists/pageview-api/blob/master/pageviewapi/client.py#L150).

justin-meisner commented 4 years ago

Ok, so in the project I am working on, I am collecting data from the Citi Bank page for the month of January, but nothing is appended to my list for January 3rd and January 10th.

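import pageviewapi

# lang, names, start, end, access, agent and granularity are defined elsewhere
page_list = []
for name in names:
    page_views = pageviewapi.per_article(lang, name, start, end, access=access, agent=agent, granularity=granularity)
    page_list.append(page_views)

# first 12 daily entries returned for the Citi Bank page
print(page_list[8]['items'][0:12])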

[{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010100', 'access': 'all-access', 'agent': 'all-agents', 'views': 5}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010200', 'access': 'all-access', 'agent': 'all-agents', 'views': 4}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010400', 'access': 'all-access', 'agent': 'all-agents', 'views': 5}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010500', 'access': 'all-access', 'agent': 'all-agents', 'views': 8}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010600', 'access': 'all-access', 'agent': 'all-agents', 'views': 4}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010700', 'access': 'all-access', 'agent': 'all-agents', 'views': 5}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010800', 'access': 'all-access', 'agent': 'all-agents', 'views': 15}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020010900', 'access': 'all-access', 'agent': 'all-agents', 'views': 4}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020011100', 'access': 'all-access', 'agent': 'all-agents', 'views': 7}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020011200', 'access': 'all-access', 'agent': 'all-agents', 'views': 5}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020011300', 'access': 'all-access', 'agent': 'all-agents', 'views': 3}, 
{'project': 'en.wikipedia', 'article': 'Citi_Bank', 'granularity': 'daily', 'timestamp': '2020011400', 'access': 'all-access', 'agent': 'all-agents', 'views': 21}]

Why would this be happening and how can I check for it?

justin-meisner commented 4 years ago

I've tried

from pageviewapi.client import ZeroOrDataNotLoadedException

for name in names:
    try:
        page_views = pageviewapi.per_article(lang, name, start, end, access=access, agent=agent, granularity=granularity)
        page_list.append(page_views)
    except ZeroOrDataNotLoadedException:
        print("ZeroOrDataNotLoadedException")

but I can't catch the exception for the missing days.

PierreSelim commented 4 years ago

You'll get the exception only if there are no days with data in your query (from my understanding of the API). Otherwise you'll get a list containing only the days that have data.
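
Since no exception is raised for individual missing days, one way to check for them is to compare the returned timestamps against the full range of dates you asked for. The following is only a minimal sketch: `fill_missing_days` and `missing_days` are hypothetical helper names, not part of pageviewapi, and counting an absent day as 0 views is an assumption (an absent day may also mean the data was not loaded).

```python
from datetime import datetime, timedelta


def fill_missing_days(items, start, end):
    """Return one (day, views) pair per day from start to end, inclusive.

    items: the 'items' list of a daily per-article response.
    start, end: 'YYYYMMDD' strings, as passed to the query.
    Days absent from items are reported with 0 views (an assumption).
    """
    # Daily timestamps look like '2020010100'; keep only the YYYYMMDD part.
    views_by_day = {item['timestamp'][:8]: item['views'] for item in items}
    day = datetime.strptime(start, '%Y%m%d')
    last = datetime.strptime(end, '%Y%m%d')
    filled = []
    while day <= last:
        key = day.strftime('%Y%m%d')
        filled.append((key, views_by_day.get(key, 0)))
        day += timedelta(days=1)
    return filled


def missing_days(items, start, end):
    """Return the days in the requested range that have no entry in items."""
    present = {item['timestamp'][:8] for item in items}
    return [day for day, _ in fill_missing_days(items, start, end)
            if day not in present]
```

With the loop from the question, this could be called as `missing_days(page_list[8]['items'], start, end)`, provided `start` and `end` are plain 'YYYYMMDD' strings. For the Citi Bank output shown earlier it would be expected to report January 3rd and January 10th.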