fake-name / xA-Scraper


Python error when trying to run run_web.sh #80

Closed Keboose closed 4 years ago

Keboose commented 4 years ago

I'm trying to use this to archive all the patreon creators I'm subscribed to. Running Ubuntu Server 18, I:

This is the output:

Checking database is up-to-date.
Sqlite database path: 'sqlite:////home/user/xA-Scraper/sqlite_db.db'
Running migrator!
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 916e4599f92d -> b7e0935213d5, Fix stupid file naming issue.
Bind: <sqlalchemy.engine.base.Connection object at 0x7f759b8e6208>
Sess: <sqlalchemy.orm.session.Session object at 0x7f759b3356a0>
Traceback (most recent call last):
  File "db_migrate.py", line 24, in <module>
    manager.run()
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/__init__.py", line 417, in run
    result = self.handle(argv[0], argv[1:])
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/__init__.py", line 386, in handle
    res = handle(*args, **config)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/commands.py", line 216, in __call__
    return self.run(*args, **kwargs)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_migrate/__init__.py", line 95, in wrapped
    f(*args, **kwargs)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_migrate/__init__.py", line 280, in upgrade
    command.upgrade(config, revision, sql=sql, tag=tag)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/command.py", line 279, in upgrade
    script.run_env()
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/script/base.py", line 475, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/util/compat.py", line 174, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "migrations/env.py", line 73, in <module>
    run_migrations_online()
  File "migrations/env.py", line 66, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/runtime/migration.py", line 365, in run_migrations
    step.migration_fn(**kw)
  File "/home/user/xA-Scraper/migrations/versions/b7e0935213d5_fix_stupid_file_naming_issue.py", line 176, in upgrade
    to_fix_names = sess.query(ScrapeTargets).filter(ScrapeTargets.site_name == settings.settings['da']["shortName"]).all()
KeyError: 'shortName'

If I'm reading that last line right, it looks like it's having an issue with the DeviantArt settings, maybe? Was I supposed to remove any sections of settings.py I wasn't using?

fake-name commented 4 years ago

No, this is just a migration where I had some assumptions that fail for people not also scraping deviantart.
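
For anyone hitting the same KeyError: the traceback shows the migration indexing settings.settings['da']["shortName"] directly, which fails when the 'da' section has no shortName. A minimal sketch of a more tolerant lookup follows. The settings dict and the short_name_for helper are hypothetical stand-ins for illustration, not the project's actual code or the fix that was committed.

```python
# Hypothetical stand-in for the project's settings; the real layout may differ.
settings = {
    "pat": {"shortName": "pat"},
    "da": {},  # DeviantArt section present, but shortName not configured
}


def short_name_for(site_key, settings_dict):
    """Return the configured shortName for a site, or None if it is absent."""
    return settings_dict.get(site_key, {}).get("shortName")


da_name = short_name_for("da", settings)
if da_name is None:
    # A migration could skip the DeviantArt-specific step instead of crashing.
    print("No DeviantArt shortName configured; skipping DeviantArt rows.")
```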

Keboose commented 4 years ago

Hey, thanks for being so on the ball to respond to issues! I don't run into that too often :smiley: Anyway, I pulled the updates and tried again, and got a new error that looks SQL-related:

Checking database is up-to-date.
Sqlite database path: 'sqlite:////home/user/xA-Scraper/sqlite_db.db'
Running migrator!
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade b7e0935213d5 -> 46e1cef59e06, Fix Pixiv URLs for new format
Getting Pixiv DB Entries
Fixing database entries
0it [00:00, ?it/s]
Migrated!
Traceback (most recent call last):
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: cannot commit - no transaction is active

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "db_migrate.py", line 24, in <module>
    manager.run()
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/__init__.py", line 417, in run
    result = self.handle(argv[0], argv[1:])
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/__init__.py", line 386, in handle
    res = handle(*args, **config)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_script/commands.py", line 216, in __call__
    return self.run(*args, **kwargs)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_migrate/__init__.py", line 95, in wrapped
    f(*args, **kwargs)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/flask_migrate/__init__.py", line 280, in upgrade
    command.upgrade(config, revision, sql=sql, tag=tag)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/command.py", line 279, in upgrade
    script.run_env()
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/script/base.py", line 475, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/util/compat.py", line 174, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "migrations/env.py", line 73, in <module>
    run_migrations_online()
  File "migrations/env.py", line 66, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/alembic/runtime/migration.py", line 365, in run_migrations
    step.migration_fn(**kw)
  File "/home/user/xA-Scraper/migrations/versions/46e1cef59e06_fix_pixiv_urls_for_new_format.py", line 78, in upgrade
    conn.execute("COMMIT")
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 982, in execute
    return self._execute_text(object_, multiparams, params)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1155, in _execute_text
    parameters,
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
    cursor, statement, parameters, context
  File "/home/user/xA-Scraper/venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) cannot commit - no transaction is active
[SQL: COMMIT]
(Background on this error at: http://sqlalche.me/e/e3q8)

Again, this could potentially just be me not setting my system up right... What say you?

fake-name commented 4 years ago

Nope, just more derp on my part.

Can you try again after 6cfb580? Also, if it still doesn't work, try deleting the sqlite database (/home/user/xA-Scraper/sqlite_db.db) and re-run. It's possible something's a bit stuck somewhere.
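
As background on the error itself: the traceback shows the migration calling conn.execute("COMMIT") while SQLite had no open transaction. The sketch below reproduces that condition with the standard-library sqlite3 module rather than SQLAlchemy, and guards the commit with the connection's in_transaction flag. The commit_if_active helper is hypothetical, and this is not necessarily how the referenced commits resolve the problem.

```python
import sqlite3


def commit_if_active(conn):
    """Issue COMMIT only when a transaction is open; return whether it ran."""
    if conn.in_transaction:
        conn.execute("COMMIT")
        return True
    return False


# isolation_level=None leaves transaction control entirely to explicit SQL.
conn = sqlite3.connect(":memory:", isolation_level=None)

# A bare COMMIT with nothing open fails in the same way as the migration.
try:
    conn.execute("COMMIT")
    bare_commit_error = None
except sqlite3.OperationalError as exc:
    bare_commit_error = str(exc)
print(bare_commit_error)

# The guarded version is a no-op here because no transaction is open.
skipped = commit_if_active(conn)

# Once a transaction has been started explicitly, the commit goes through.
conn.execute("BEGIN")
conn.execute("CREATE TABLE t (x INTEGER)")
committed = commit_if_active(conn)
```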

Keboose commented 4 years ago

Sorry, still not working! I pulled 203e8a4dbf1b2de636c3fce25aa0c126071f989b and deleted the database, but I still get the same error after running run_web.sh.

fake-name commented 4 years ago

Gah, Ok, 81441a8 should actually (as in, I tested it) fix the issue.

Sorry about all that!

Keboose commented 4 years ago

Great, thanks! I'm able to get both run_ scripts up and running!

The "PAT ARTISTS" Page doesn't seem to be displaying the contents correctly, though. The "Image Source Page URL" column on the page has links, but they lead to links like http://localhost:6543/source/by-site/[%22post%22,%20%2217678087%22]. If I click on that link, I get a page

Invalid site-name!

The site-name '["post", "17678087"]' is not in the valid site-name list ['fa', 'hf', 'wy', 'ib', 'px', 'sf', 'pat', 'da', 'ng', 'ay', 'as', 'yp', 'tum']

I'm seeing all the content downloaded under ./absolute_path_downloads_will_go_here/Patreon/[artist], so we're pretty much there. It also looks like all the images are downloaded twice, once as "[post_ID]-[Image_name].png", and again as "[post_ID]-[another 8-digit number?]-[Image_name].png"

fake-name commented 4 years ago

Great!

> The "PAT ARTISTS" Page doesn't seem to be displaying the contents correctly, though. The "Image Source Page URL" column on the page has links, but they lead to links like http://localhost:6543/source/by-site/[%22post%22,%20%2217678087%22]. If I click on that link, I get a page

I think that's because I never bothered to properly implement direct content links for Patreon, mostly because I'm lazy.

Ideally, these should link to the actual content hosted on Patreon.
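
A rough sketch of what such a link builder could look like, under two assumptions that the thread does not confirm: that the stored source value is the JSON-encoded pair seen in the broken URL (["post", "17678087"]), and that Patreon posts are reachable at https://www.patreon.com/posts/<id>. The patreon_post_url function is hypothetical and not part of xA-Scraper.

```python
import json


def patreon_post_url(source_value):
    """Return a Patreon post URL for a '["post", "<id>"]' value, else None."""
    try:
        parsed = json.loads(source_value)
    except (TypeError, ValueError):
        return None
    if (isinstance(parsed, list) and len(parsed) == 2
            and parsed[0] == "post" and str(parsed[1]).isdigit()):
        # Assumed URL pattern; not confirmed anywhere in this thread.
        return f"https://www.patreon.com/posts/{parsed[1]}"
    return None


print(patreon_post_url('["post", "17678087"]'))
```

Returning None for anything unexpected lets the page fall back to showing no link instead of a broken one.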

> It also looks like all the images are downloaded twice, once as "[post_ID]-[Image_name].png", and again as "[post_ID]-[another 8-digit number?]-[Image_name].png"

This is because the main image on a post gets included twice: once as the main post image, and once as an attachment to the post. I think Patreon does some minor resizing for this, and I'm not totally confident the two will always be duplicates, so I just save everything.
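
If you want to check how many of the saved pairs are byte-for-byte identical, one option is to group files by content hash after the fact. The sketch below is a standalone helper and not part of xA-Scraper; group_exact_duplicates is a hypothetical name. It only catches exact copies, so any pair where one image was resized will not be reported.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_exact_duplicates(directory):
    """Map SHA-256 digest -> file names, keeping digests shared by 2+ files."""
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}
```

Calling group_exact_duplicates on one artist's download folder returns only the groups of files whose contents match exactly, which you can then review before deleting anything.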