Integrate a CSS scraper into the general scraping procedure. As CSS data may be different for each language, scraped content must be saved into each language's dump.
Mechanism:
While scraping articles, extract raw CSS links from the HTML head and save them to a file as soon as they are found, to avoid data loss.
After article scraping is done, parse all collected raw links and extract the unique CSS module names (Wikipedia uses a modular system for requesting stylesheets).
Download the CSS of each module into its own file.
Parse all CSS modules for finding links to external resources (icons, backgrounds, etc.) and download them.
Combine all available CSS modules into a single stylesheet, retargeting all external URLs to existing local files.
At the generation step, copy the single stylesheet and its resources to the assets/static directory of the cdpedia image.
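As a rough illustration of two of the steps above (extracting module names and retargeting URLs), here is a minimal sketch. It is not the code in this PR. The function names `extract_module_names` and `retarget_urls`, the `local_names` mapping, and the example URLs are all hypothetical. It also assumes that stylesheet links carry a `modules` query parameter in which `|` separates groups and `prefix.a,b` stands for `prefix.a` and `prefix.b`.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches url(...) references in CSS, with or without quotes.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def extract_module_names(raw_links):
    """Return the sorted unique module names found in the raw CSS links.

    Assumption: modules are packed as "prefix.a,b|other.module".
    """
    names = set()
    for link in raw_links:
        query = parse_qs(urlsplit(link).query)
        for packed in query.get("modules", []):
            for group in packed.split("|"):
                if not group:
                    continue
                if "," not in group:
                    names.add(group)
                    continue
                # "prefix.a,b" expands to "prefix.a" and "prefix.b"
                prefix, dot, suffixes = group.rpartition(".")
                for suffix in suffixes.split(","):
                    names.add(prefix + dot + suffix)
    return sorted(names)


def retarget_urls(css, local_names):
    """Rewrite url(...) references that have a downloaded local copy.

    local_names maps an original URL to a local file name; references
    without an entry (for example data: URIs) are left untouched.
    """
    def replace(match):
        local = local_names.get(match.group(2))
        if local is None:
            return match.group(0)
        return 'url("%s")' % local

    return URL_RE.sub(replace, css)
```

For example, `extract_module_names` collapses repeated links into one entry per module, and `retarget_urls` only touches references listed in `local_names`, so a resource that failed to download keeps its original URL.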
Visually the results are OK; some minor styling issues were fixed manually. These are some screenshots of the main page in all available languages: es, fr, pt, ay.
Many ideas were taken from @spiccinini's preprocess_stylesheets proof of concept.