Incomplete list of categories

spikex version: 0.5.2
Python version: 3.9.7
Operating System: Windows 10

Description

I want to get all categories of a page, but most of them are missing from the result.

What I Did

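from spikex.wikigraph import load as wg_load

# The graph name is an assumption; the call that created wg was not shown.
wg = wg_load("dewiki_core")
page = "Peking_2022"
categories = wg.get_categories(page, distance=1)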

What I get: ['Category:Olympische_Winterspiele_2022']

The output I expect: ['Austragung der Olympischen Winterspiele', 'Olympische Winterspiele 2022', 'Sport (Hebei)', 'Sportveranstaltung 2022', 'Sportveranstaltung in Peking', 'Wikipedia:Veraltet nach Jahr 2022', 'Zukünftige Sportveranstaltung']

Proof: https://de.wikipedia.org/wiki/Olympische_Winterspiele_2022

I created a categorylinks dictionary from categorylinks.sql.gz that maps each page_id to its list of categories. I then used your functions to get the page ID, page_id = self.get_pageid(self.redirect(page)), and looked it up in my dictionary. With this method I get the expected output. If the current behaviour is not intended, I suspect there is a problem with how categorylinks.sql.gz is processed on your side.
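
For reference, here is a minimal sketch of how such a dictionary could be built. It is not the exact code I ran. It assumes the dump layout in which every row of an INSERT statement starts with cl_from (the page ID) followed by cl_to (the category title); newer dumps may differ. The names parse_categorylinks and load_categorylinks are made up for this illustration and are not part of spikex.

```python
import gzip
import re
from collections import defaultdict

# One SQL string literal: single-quoted, with backslash escapes.
_STR = r"'(?:[^'\\]|\\.)*'"
# One row tuple: (cl_from,'cl_to',<any remaining fields>).
# Assumption: cl_from and cl_to are the first two columns of the dump.
_ROW = re.compile(
    r"\((\d+),(" + _STR + r")(?:,(?:" + _STR + r"|[^,()']*))*\)"
)


def parse_categorylinks(lines):
    """Map each page ID to the list of category titles it links to."""
    links = defaultdict(list)
    for line in lines:
        # Only INSERT statements carry row data; skip DDL and comments.
        if not line.startswith("INSERT INTO"):
            continue
        for match in _ROW.finditer(line):
            page_id = int(match.group(1))
            # Drop the surrounding quotes and undo backslash escapes.
            title = re.sub(r"\\(.)", r"\1", match.group(2)[1:-1])
            links[page_id].append(title)
    return dict(links)


def load_categorylinks(path):
    """Read a gzipped categorylinks dump (hypothetical helper)."""
    # Sort keys can contain bytes that are not valid UTF-8, so invalid
    # sequences are replaced instead of raising an error.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return parse_categorylinks(fh)
```

The lookup is then categorylinks[page_id], with page_id obtained as described above. Titles come back with underscores, as stored in the dump; replace them with spaces to match the names shown on the Wikipedia page.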

erre-quadro / spikex

Incomplete list of categories #13