LexiCrush / database-preprocessing

Generate-Noun-Banks

Set Up Env & Run

  1. Install Python >= 3.7.
  2. Install Pandas with pip: pip install pandas.
  3. Import the 'csv' module (built in, no installation needed).

Pre-Processing

  1. Load harvested files into ./nounbanks/.
  2. Clean each noun bank: remove unnecessary characters, duplicate entries, and entries that violate LexiCrush game logic.
  3. Convert each noun bank into a Pandas DataFrame.
  4. Set up a connection using MySQLdb.
    from pandas.io import sql
    import MySQLdb
    # Connection arguments (host, user, password, database) depend on your setup.
    con = MySQLdb.connect()
  5. Use the df.to_sql() method to populate a MySQL-style (configurable) database with the contents of df.
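
The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual script: the column names (noun, value), the sample rows, the table name, and the clean_bank helper are all assumptions made for the example, and the real LexiCrush game-logic checks are not shown. An in-memory sqlite3 database stands in for MySQL so the sketch runs without a database server.

```python
import csv
import io
import sqlite3

import pandas as pd


def clean_bank(rows):
    """Strip stray characters and drop blank or duplicate nouns (hypothetical helper)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Remove surrounding whitespace, quotes, and asterisks (assumed noise characters).
        noun = row["noun"].strip(" \t\"'*")
        key = noun.lower()
        # Skip blanks and case-insensitive repeats; game-logic checks would go here.
        if not noun or key in seen:
            continue
        seen.add(key)
        cleaned.append({"noun": noun, "value": float(row["value"])})
    return cleaned


# Stand-in for a harvested file in ./nounbanks/ (illustrative sample rows only).
raw = io.StringIO("noun,value\n Alaska ,665384\nTexas*,268596\nalaska,665384\n,0\n")
rows = clean_bank(csv.DictReader(raw))

# Step 3: convert the cleaned noun bank into a DataFrame.
df = pd.DataFrame(rows)

# Steps 4-5: open a connection and write the DataFrame to a table.
# sqlite3 is used here only as a stand-in for the MySQL connection.
con = sqlite3.connect(":memory:")
df.to_sql("usa_states_surface_area", con, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
stored = pd.read_sql_query("SELECT noun, value FROM usa_states_surface_area", con)
print(stored)
```

When targeting MySQL, recent pandas versions expect df.to_sql() to receive an SQLAlchemy engine or connection rather than a raw MySQLdb connection. One possible setup is sqlalchemy.create_engine("mysql+mysqldb://USER:PASSWORD@HOST/DBNAME"), where the capitalized parts are placeholders for your own credentials.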

Data Sources

USA States Surface Area
World Countries - Population
World Countries - Surface Area
Big List of Animals

Links

SQLAlchemy, MySQLdb.connect(), .to_sql()