akshayravikumar / TeXnique

A LaTeX Typesetting Game
https://texnique.xyz
MIT License

(Suggestion) Get more problems by scraping Wikipedia #14

Open acganesh opened 5 years ago

acganesh commented 5 years ago

As in https://stackoverflow.com/a/37639225, you might be able to get more problems if you scrape equations from curated pages on Wikipedia. For instance:

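import wikipedia
from bs4 import BeautifulSoup

# Fetch the rendered page and collect the <annotation> tags, which hold each formula's LaTeX source.
topic = wikipedia.page('Riemann zeta function')
equations = BeautifulSoup(topic.html(), 'html.parser').find_all('annotation')

# Index 11 happened to be the series definition here; the index may shift as the page is edited.
equations[11].text
# '{\\displaystyle \\zeta (s)=\\sum _{n=1}^{\\infty }{\\frac {1}{n^{s}}}}'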

The formatting is a bit wonky: each formula comes wrapped in {\displaystyle ...} with extra spaces and braces, so it may not play well with the strict visual matching scheme. Still, this might be a low-effort way to bootstrap your problem database.
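
If the wrapper turns out to be the main issue, a small cleanup pass might be enough. Here is a rough sketch, not tested against the game's matcher: `clean_wikipedia_latex` is a hypothetical helper name, and it assumes every scraped string looks like the example above (one outer `{\displaystyle ...}` group). It only peels off that wrapper and collapses whitespace; it does not try to remove the extra braces.

```python
import re


def clean_wikipedia_latex(raw: str) -> str:
    # Hypothetical helper: normalize one scraped <annotation> string.
    text = raw.strip()
    # Assumption: the formula is wrapped in a single outer {\displaystyle ...} group.
    match = re.fullmatch(r"\{\\displaystyle\s*(.*)\}", text, flags=re.DOTALL)
    if match:
        text = match.group(1).strip()
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text)


raw = r"{\displaystyle \zeta (s)=\sum _{n=1}^{\infty }{\frac {1}{n^{s}}}}"
cleaned = clean_wikipedia_latex(raw)
print(cleaned)
```

Strings without the wrapper pass through with only the whitespace tidied, so it should be safe to run over everything `find_all('annotation')` returns.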