disinfoRG / ZeroScraper

Web scraper made by 0archive.
https://0archive.tw
MIT License

Error when creating db using MariaDB #124

Open bwlynch opened 4 years ago

bwlynch commented 4 years ago

I was able to create the database without a problem using MySQL 5.7, but when using MariaDB 10, I keep getting the same error below. Not sure if it's a timeout issue or an actual bug causing the disconnection.

  File "/root/.local/share/virtualenvs/ZeroScraper-NYgLjCjb/lib/python3.7/site-packages/pymysql/connections.py", line 707, in _read_bytes
    CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
  sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
  [SQL: ALTER TABLE Site ADD COLUMN airtable_id VARCHAR(256)]
  (Background on this error at: http://sqlalche.me/e/e3q8)
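One way to help tell a transient timeout apart from a repeatable server-side failure is to retry the statement once and see whether it fails again. Below is a hedged diagnostic sketch (not part of ZeroScraper); `run_with_retry` and `flaky` are hypothetical names, and `flaky` is a stand-in for executing the failing ALTER TABLE through the SQLAlchemy connection:

```python
# Diagnostic sketch (assumption, not project code): retry on pymysql error
# 2013 ("Lost connection"). A failure that repeats on the retry points to a
# server-side problem rather than a one-off timeout.
from sqlalchemy.exc import OperationalError

def run_with_retry(fn, attempts=2):
    """Run fn, retrying when the wrapped pymysql error is code 2013."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except OperationalError as exc:
            # pymysql's error code lives in the wrapped original exception
            if "2013" not in str(exc.orig):
                raise
            last_exc = exc
    raise last_exc

calls = []

def flaky():
    """Stand-in for the failing DDL: drops the connection once, then succeeds."""
    calls.append(1)
    if len(calls) == 1:
        raise OperationalError(
            "ALTER TABLE Site ADD COLUMN airtable_id VARCHAR(256)", None,
            Exception("(2013, 'Lost connection to MySQL server during query')"),
        )
    return "ok"

print(run_with_retry(flaky))  # prints "ok" on the second attempt
```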

bwlynch commented 4 years ago

To follow up: I've found the issue consistently occurs in the same three files, documented below. (I've included only parts of the error output since it's quite lengthy, but I can include the full tracebacks if needed.)

Error 1:

  File "/home/ubuntu/ZeroScraper2/migrations/versions/da6f10c8ebf4_add_site_airtable.py", line 21, in upgrade
    "Site", sa.Column("airtable_id", sa.String(256), nullable=True, unique=True)

Error 2:

  File "/home/ubuntu/ZeroScraper2/migrations/versions/95e1de28f5ba_add_info_columns.py", line 22, in upgrade
    op.add_column("Site", sa.Column("site_info", sa.JSON, nullable=False))

Error 3:

  File "/home/ubuntu/ZeroScraper2/migrations/versions/2f07cdaab83d_add_site_crawl_time.py", line 20, in upgrade
    op.add_column("Site", sa.Column("last_crawl_at", sa.Integer))

The lines shown in the error messages appear to execute successfully before the process fails. One workaround I've found: run pipenv run alembic upgrade head, comment out the line from the error message along with the preceding lines in that file that have already run, then run the command again, repeating for each error. (Though for Error 1, almost all of the upgrade function needs to be commented out.)
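An alternative to commenting lines out by hand would be to check whether a column already exists before issuing the ALTER, so a partially applied migration can simply be re-run. This is a sketch rather than the project's code: add_column_if_missing is a hypothetical helper, and in-memory SQLite stands in for MariaDB so the snippet runs without a server. The table and column names mirror the migrations above.

```python
# Sketch (assumption, not ZeroScraper code): an idempotent column add using
# SQLAlchemy's inspector, so re-running a half-applied migration is harmless.
import sqlalchemy as sa

def add_column_if_missing(engine, table, column_name, ddl_type):
    """Issue ALTER TABLE ... ADD COLUMN only if the column is absent."""
    existing = {c["name"] for c in sa.inspect(engine).get_columns(table)}
    if column_name in existing:
        return False  # already applied; nothing to do
    with engine.begin() as conn:
        conn.execute(sa.text(f"ALTER TABLE {table} ADD COLUMN {column_name} {ddl_type}"))
    return True

# Demonstration against in-memory SQLite with a minimal Site table.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE Site (id INTEGER PRIMARY KEY)"))

print(add_column_if_missing(engine, "Site", "last_crawl_at", "INTEGER"))  # True: added
print(add_column_if_missing(engine, "Site", "last_crawl_at", "INTEGER"))  # False: skipped
```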

Based on searching online, I made some basic attempts at troubleshooting possible causes of connection timeouts, but none of them resolved the issue:

  1. Increasing the max_allowed_packet size in /etc/mysql/my.cnf
  2. Raising the idle timeout with SET GLOBAL thread_pool_idle_timeout
  3. Adjusting the SQLAlchemy connection pool settings, including pool recycling
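For reference, attempt 3 looked roughly like the following. The values are assumptions for illustration, not the project's actual configuration, and SQLite stands in for the real mysql+pymysql:// URL so the snippet runs without a database server:

```python
# Sketch of SQLAlchemy pool tuning for dropped connections (assumed values).
import sqlalchemy as sa

# In the real setup the URL would be mysql+pymysql://user:pass@host/db.
engine = sa.create_engine(
    "sqlite://",
    pool_recycle=280,    # replace pooled connections older than ~5 minutes,
                         # before a server-side idle timeout can kill them
    pool_pre_ping=True,  # test each connection with a lightweight ping on
                         # checkout, transparently reconnecting if it is stale
)

with engine.connect() as conn:
    print(conn.execute(sa.text("SELECT 1")).scalar())  # → 1
```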

Again, the issue doesn't occur when using MySQL, but MariaDB seems to be necessary for this project because of issues using MySQL with the ArticleParser: https://github.com/disinfoRG/ArticleParser/issues/43