bibanon / BASC-Archiver

Python-based Imageboard (4chan) complete thread archiver.
https://pypi.python.org/pypi/BASC-Archiver/

It doesn't work anymore, just idles #42

Closed nick-s-b closed 7 years ago

nick-s-b commented 7 years ago

I'm not sure what happened, but BASC-Archiver doesn't work anymore. I'm guessing some Python lib broke it? When I try to download a thread, it reports the number of new replies and then just idles without downloading anything. There are no errors.

$ thread-archiver --path=/mnt/archives/ --thread-check-delay=60 --ssl --nothumbs https://boards.4chan.org/wg/thread/<threadidhere>
Starting download
Thread 4chan / wg / id  -  85 new replies

and then nothing...

Version: BASC-Archiver v0.9.3

Updating...

$ pip3 install basc-archiver
Requirement already satisfied: basc-archiver in /usr/lib/python3.6/site-packages
Requirement already satisfied: requests in /usr/lib/python3.6/site-packages (from basc-archiver)
Requirement already satisfied: docopt>=0.5.0 in /usr/lib/python3.6/site-packages (from basc-archiver)
Requirement already satisfied: BASC-py4chan>=0.5.5 in /usr/lib/python3.6/site-packages (from basc-archiver)

Unfortunately, the --verbose option isn't accepted at all; it just prints the usage text:

$ thread-archiver https://boards.4chan.org/wg/thread/<id here> --verbose
Usage:
  thread-archiver <url>... [options]
  thread-archiver -h | --help
  thread-archiver -v | --version

OS is Arch Linux. Everything is updated. Python version: Python 3.6.1

Any ideas or suggestions how to fix this?

antonizoon commented 7 years ago

@DanielOaks : I never really liked that non-verbose output has been the default from the beginning. The script should at least notify the user when the thread is finished.

antonizoon commented 7 years ago

As for the issue, try again on Python 3.4 and 3.5; there may be something new in 3.6 that breaks it.

vxbinaca commented 7 years ago

@antonizoon how would I go about doing that?

nick-s-b commented 7 years ago

@vxbinaca About the only safe way (on Linux) is to build a clean chroot for the older Python. If you attempt to downgrade the system-wide one, you're entering a world of hurt because you'll break almost everything. I've been using a different script to grab pics... hopefully BASC will be updated soon.

Alabard commented 7 years ago

The same happens on Android in QPython 1.3.1. I tried downgrading to QPython 1.2.5, and it's still the same.

I tried installing it on Windows under Python 3.5.3, and it still would not work.

That said, an old version I have under Python 2.7.12 still works. It's apparently BASC-Archiver 0.9.3, and installing that one fixes the problem in both QPython and Windows Python 3.5.3. There appear to be a few fixes between 0.9.3 and 0.9.4, though, so there is that.

Going through the commits, it seems like commit 3a32573, "changed i.4cdn.org links to is.4chan.org", is the one where it stops working. Maybe something changed on 4chan's side of things?

I'm a bit incompetent when it comes to all this Python stuff, but I hope this helps in finding what went wrong with the newer versions.

antonizoon commented 7 years ago

Oh you gotta be kidding me. Did they stop offering i.4cdn.org? What are they doing? I hope I am wrong.

We changed back to i.4cdn.org because something makes is.4chan.org significantly slower.

https://github.com/bibanon/BASC-Archiver/issues/41

antonizoon commented 7 years ago

Wait, i.4cdn.org works for me; false alarm. But now we know even less.

http://i.4cdn.org/a/1494367383125.png

antonizoon commented 7 years ago

Ha, I understand now. is.4chan.org itself was discontinued.

http://is.4chan.org/a/1494367383125.png

Update your BASC-Archivers to the latest version, where we have been using i.4cdn.org for months. Then tell me again if there are errors.

antonizoon commented 7 years ago

Use pip3 install --upgrade basc-archiver to upgrade, I think.

nick-s-b commented 7 years ago

I just did a reinstall and nothing has changed. Version is 0.9.3 and it just doesn't download anything. Maybe it has something to do with 4chan's change of image CDN. Hmmm

antonizoon commented 7 years ago

@nick-s-b You know the latest version is 0.9.6, right?

https://pypi.python.org/pypi/BASC-Archiver

nick-s-b commented 7 years ago

@antonizoon Interestingly, when you download the 0.9.6 package and run thread-archiver --version, it reports itself as: BASC-Archiver v0.9.3

Go figure :) But yeah, I have the latest version running and there's no change... with Python 3.6, it just idles.

antonizoon commented 7 years ago

I guess we left the version string in __init__.py unchanged. How inconvenient.

I won't be able to help much with debugging for a week because I won't have internet access. @vxbinaca could you check it out?
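A stale version string like this can be spotted by comparing the version in the installed distribution's metadata with the one the package reports about itself. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and assuming the package exposes its string as a `__version__` attribute; the thread only says the string lives in `__init__.py`, so the attribute name is a guess, and both helper functions are hypothetical:

```python
import importlib
from importlib import metadata  # standard library since Python 3.8


def compare_versions(dist_name, module_name, attr="__version__"):
    """Return (metadata_version, module_version); either may be None."""
    try:
        # Version recorded by the installer (what pip believes is installed).
        meta_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        meta_version = None
    try:
        # Version string hard-coded in the package itself (assumed attribute).
        module = importlib.import_module(module_name)
        module_version = getattr(module, attr, None)
    except ImportError:
        module_version = None
    return meta_version, module_version


def is_mismatch(meta_version, module_version):
    """True only when both versions are known and they differ."""
    return (
        meta_version is not None
        and module_version is not None
        and meta_version != module_version
    )
```

For this project the call would presumably be `compare_versions("BASC-Archiver", "basc_archiver")`, using the distribution name from PyPI and the module directory seen in the traceback paths.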

nick-s-b commented 7 years ago

I left it running for 5 min or so. Here's the output with all the timeouts:

 ./thread-archiver https://boards.4chan.org/w/thread/2000004
Starting download
Thread 4chan / w / 2000004  -  62 new replies
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c1375c0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 438, in send
    timeout=timeout
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='is.4chan.org', port=80): Max retries exceeded with url: /w/1494375500677.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c1375c0>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/nsb/Downloads/BASC-Archiver-0.9.6/basc_archiver/sites/base.py", line 65, in run
    self.site.download_item(next_item)
  File "/home/nsb/Downloads/BASC-Archiver-0.9.6/basc_archiver/sites/fourchan.py", line 171, in download_item
    if utils.download_file(file_path, file_url):
  File "/home/nsb/Downloads/BASC-Archiver-0.9.6/basc_archiver/utils.py", line 30, in download_file
    i = requests.get(url)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 518, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 639, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 502, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='is.4chan.org', port=80): Max retries exceeded with url: /w/1494375500677.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c1375c0>: Failed to establish a new connection: [Errno 111] Connection refused',))

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c11bbe0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 438, in send
    timeout=timeout
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='is.4chan.org', port=80): Max retries exceeded with url: /w/1494375407749.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c11bbe0>: Failed to establish a new connection: [Errno 111] Connection refused',))
...
antonizoon commented 7 years ago

As we see here, at some point it still uses is.4chan.org. I think this is an easy fix, so I will try to work on it when I get to port in 8 hours. If you can find it first, though, it will help a lot.

requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='is.4chan.org', port=80): Max retries exceeded with url: /w/1494375407749.jpg (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f3f0c11bbe0>: Failed to establish a new connection: [Errno 111] Connection refused',))
...
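The traceback ends in `requests.get(url)` inside `utils.download_file`, and the `ConnectionError` escapes all the way up into `threading.py`. That would explain the apparent idling: each worker thread dies on its first refused connection. A minimal sketch of a download helper that reports the failure instead, assuming the `(file_path, url)` argument order seen in the traceback. It swaps `requests` for the standard library's `urllib` to stay dependency-free, and the `timeout` and `opener` parameters are illustrative additions, not part of the project:

```python
import logging
import urllib.request

log = logging.getLogger("downloader")


def download_file(file_path, url, timeout=30, opener=urllib.request.urlopen):
    """Fetch url into file_path; return True on success, False on failure.

    Network errors are logged rather than left to kill the calling
    worker thread.
    """
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
    except OSError as exc:
        # URLError, socket timeouts and ConnectionRefusedError all
        # derive from OSError, so this covers the refused-connection case.
        log.error("download failed for %s: %s", url, exc)
        return False
    with open(file_path, "wb") as handle:
        handle.write(data)
    return True
```

Returning `False` keeps the helper compatible with the `if utils.download_file(file_path, file_url):` check shown in the traceback, so the caller can decide whether to retry or skip the file.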
nick-s-b commented 7 years ago

Thank you @antonizoon !

Alabard commented 7 years ago

OK, I think I found it. In fourchan.py, starting at line 35, changing this:

# new urls
#FOURCHAN_API = 'api.' + FOURCHAN # api.4chan.org also works, but 4cdn still on
FOURCHAN_IMAGES = 'is.' + FOURCHAN
FOURCHAN_THUMBS = 'is.' + FOURCHAN
#FOURCHAN_STATIC = 's.' + FOURCHAN_CDN # static.4chan.org also works, but not yet

# cdn domains (no longer in use for images)
FOURCHAN_API = 'a.' + FOURCHAN_CDN
#FOURCHAN_IMAGES = 'i.' + FOURCHAN_CDN
#FOURCHAN_THUMBS = 'i.' + FOURCHAN_CDN
FOURCHAN_STATIC = 's.' + FOURCHAN_CDN

To this:

# new urls
#FOURCHAN_API = 'api.' + FOURCHAN # api.4chan.org also works, but 4cdn still on
#FOURCHAN_IMAGES = 'is.' + FOURCHAN
#FOURCHAN_THUMBS = 'is.' + FOURCHAN
#FOURCHAN_STATIC = 's.' + FOURCHAN_CDN # static.4chan.org also works, but not yet

# cdn domains (no longer in use for images)
FOURCHAN_API = 'a.' + FOURCHAN_CDN
FOURCHAN_IMAGES = 'i.' + FOURCHAN_CDN
FOURCHAN_THUMBS = 'i.' + FOURCHAN_CDN
FOURCHAN_STATIC = 's.' + FOURCHAN_CDN

Fixes it.

Not sure if there is something else to change, but I couldn't find any other mention of 'is.' and so far it's working without any problems. Thanks for the help, @antonizoon.
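To illustrate why flipping those two constants is enough, here is a small sketch of how such host constants typically feed into image URLs. The value of `FOURCHAN_CDN` is inferred from the `i.4cdn.org` hostname mentioned earlier in the thread, and `image_url` is a hypothetical helper, not a function from the project:

```python
# Inferred: 'i.' + FOURCHAN_CDN has to yield i.4cdn.org.
FOURCHAN_CDN = "4cdn.org"

# The image and thumbnail hosts after the fix.
FOURCHAN_IMAGES = "i." + FOURCHAN_CDN
FOURCHAN_THUMBS = "i." + FOURCHAN_CDN


def image_url(board, filename, use_ssl=False):
    """Build a full image URL from the host constant (hypothetical helper)."""
    scheme = "https" if use_ssl else "http"
    return "{}://{}/{}/{}".format(scheme, FOURCHAN_IMAGES, board, filename)


# Reproduces the working link posted earlier in the thread.
print(image_url("a", "1494367383125.png"))
```

Because every download URL is derived from the constant, no URL built this way points at the discontinued is.4chan.org host.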

vxbinaca commented 7 years ago

@Alabard do a pull request please.

Alabard commented 7 years ago

@vxbinaca Ok, as I said I'm a bit incompetent when it comes to this stuff, so hopefully I did it right.

antonizoon commented 7 years ago

OK, I guess I had to do it. I've updated the pip package to 0.9.7, which includes this pull request. Please update.

https://github.com/bibanon/BASC-Archiver/pull/43