alephdata / aleph

Search and browse documents and data; find the people and companies you look for.
http://docs.aleph.occrp.org
MIT License

Document upload freezing #734

Closed baughmann closed 5 years ago

baughmann commented 5 years ago

I cannot seem to complete uploading a document (via the UI) or crawling a directory (via the CLI). It always reaches some point below 30% complete and then freezes. I've been trying to trace the issue, to no avail.

I'm running Docker 2.1.0.1 on Mac OS 10.14.6 (but I've also tried on an Ubuntu 18.04 VM). I'm following the "Developer Setup" guide (https://github.com/alephdata/aleph/wiki/Developer-setup). I attempted to follow the macOS instructions (here: https://github.com/alephdata/aleph/wiki/Running-on-macOS) but really cannot follow them because Homebrew complains that it's not a valid command. I've also increased the max_map_count on the host using Docker's screen feature.

For what it's worth, in order to avoid hitting the catch on line 79 of DocumentUploadDialog.jsx in the UI, I had to actually return something from the ingestDocument.COMPLETE reducer at line 21 of collectionStatus in the UI (I changed () => {} to state => state). This allowed me to use the UI without error.

Below are the log outputs from the container aleph_ingest-file_1. There did not seem to be any interesting logs from the API, ElasticSearch, Redis, or ConvertDocument containers.

aleph_ingest-file_1:

INFO:servicelayer.worker:Worker has 6 threads.

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 185, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/worker.py", line 55, in process
    task = Stage.get_task(self.conn, stages, timeout=5)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/jobs.py", line 258, in get_task
    task_data = conn.blpop(queues, timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1550, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 290, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 224, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 199, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 185, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/worker.py", line 55, in process
    task = Stage.get_task(self.conn, stages, timeout=5)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/jobs.py", line 258, in get_task
    task_data = conn.blpop(queues, timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1550, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 290, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 224, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 199, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

INFO:servicelayer.worker:Worker has 6 threads.
DEBUG:ingestors.worker:Ingest: <E('1','RFI-2018-100.docx')>
INFO:servicelayer.archive.file:Archive: /data
INFO:ingestors.manager:Ingestor [<E('1','RFI-2018-100.docx')>]: OfficeOpenXMLIngestor
INFO:ingestors.support.convert:Converting [RFI-2018-100.docx] to PDF...
INFO:ingestors.support.ocr:Configuring OCR engine (eng)
INFO:ingestors.support.ocr:w: 999, h: 487, l: eng, c: 95, took: 0.19106
INFO:ingestors.support.ocr:OCR: 48 chars (from 26926 bytes)
INFO:ingestors.support.ocr:w: 999, h: 487, l: eng, c: 95, took: 0.04691
INFO:ingestors.support.ocr:OCR: 2 chars (from 110651 bytes)
INFO:ingestors.support.ocr:w: 996, h: 995, l: eng, c: 87, took: 0.19811
INFO:ingestors.support.ocr:OCR: 93 chars (from 20494 bytes)
INFO:ingestors.support.ocr:w: 996, h: 995, l: eng, c: 71, took: 0.26873
INFO:ingestors.support.ocr:OCR: 87 chars (from 31011 bytes)
INFO:ingestors.worker:Sending 12 entities to: index

My aleph.env file:

# Aleph environment configuration
#
# This file is loaded by docker-compose and transformed into a set of
# environment variables inside the containers. These are, in turn, parsed
# by aleph and used to configure the system.

# Random string:
ALEPH_SECRET_KEY=

# Visible instance name in the UI
ALEPH_APP_TITLE=Aleph
# Name needs to be a slug, as it is used e.g. for the ES index, SQS queue name:
ALEPH_APP_NAME=aleph
ALEPH_UI_URL=http://localhost:8080/

# ALEPH_URL_SCHEME=https
# ALEPH_FAVICON=https://investigativedashboard.org/static/favicon.ico
# ALEPH_LOGO=http://assets.pudo.org/img/logo_bigger.png

# Other customisations
ALEPH_SAMPLE_SEARCHES=Vladimir Putin:TeliaSonera

# Set email addresses, separated by colons, that will be made admin.
# ALEPH_ADMINS=friedrich@pudo.org:demo@pudo.org

# Login modalities
ALEPH_PASSWORD_LOGIN=true

# OAuth configuration
# Currently supported providers are Google, Facebook and Azure AD OAuth
# Note that you do not need to fill out all fields in order to use it
ALEPH_OAUTH=false
ALEPH_OAUTH_KEY=
ALEPH_OAUTH_SECRET=

# Where and how to store the underlying files:
# ARCHIVE_TYPE=file
# ARCHIVE_PATH=/data

# Or, if 'ALEPH_ARCHIVE_TYPE' configuration is 's3':
# ARCHIVE_BUCKET=
# AWS_ACCESS_KEY_ID=
# AWS_SECRET_ACCESS_KEY=

# Queue mechanism
# REDIS_URL=redis://redis:6379/0

# Content options
ALEPH_OCR_DEFAULTS=eng
# ALEPH_LANGUAGES=en:de:fr:es:tr:ar ...

# Provide a valid email to send alerts from:
ALEPH_MAIL_FROM=
ALEPH_MAIL_HOST=
ALEPH_MAIL_ADMIN=
ALEPH_MAIL_USERNAME=
ALEPH_MAIL_PASSWORD=
ALEPH_MAIL_PORT=25
ALEPH_MAIL_USE_TLS=false

# Debug mode (insecure)
ALEPH_DEBUG=true

# Read-only mode:
# ALEPH_MAINTENANCE=true

# Enable HTTP caching
# ALEPH_CACHE=true

I'm still trying to trace this myself, but I'm just sort of taking shots in the dark here. Could this be a problem with my Docker version, or something simple that I overlooked in the documentation?

pudo commented 5 years ago

Very odd. It's like redis keeps restarting or something like that. Does your system have memory upwards of 4GiB?

The react bug should be fixed in develop, I hope to make a new release today.

baughmann commented 5 years ago

> Very odd. It's like redis keeps restarting or something like that. Does your system have memory upwards of 4GiB?
>
> The react bug should be fixed in develop, I hope to make a new release today.

Oh yes, I'm running on 32gb.

The log is actually misleading. I will trim it down. I restarted the container a few times after making some changes in my attempt to track the problem. The restart was from me. The OSError: Connection closed by server. occurred every time the container was restarted.
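In the traceback above, the ConnectionError raised inside conn.blpop(...) propagates out of the thread's target function, which ends that worker thread. As a rough illustration of how a worker loop could tolerate a Redis restart instead, here is a minimal retry-with-backoff sketch. It is not how servicelayer handles this; `call_with_retry` and all of its parameters are hypothetical names introduced only for this example.

```python
import time


def call_with_retry(func, retry_on, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call func(); if it raises one of the exceptions in retry_on, wait and retry.

    Hypothetical helper, not part of aleph, servicelayer, or redis-py.
    The delay doubles after every failed attempt (exponential backoff).
    Once `attempts` calls have failed, the last exception is re-raised.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            sleep(delay)
            delay *= 2
```

In a worker, `func` could wrap the blocking pop seen in the traceback, for example `lambda: conn.blpop(queues, timeout=5)` with `retry_on=(redis.exceptions.ConnectionError,)`. Whether retrying is the right behaviour for a given worker is a design decision; the sketch only shows the mechanism.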

Just bizarre. I'm trying to dig around to figure out where it might be getting held up, but I've had no luck so far (likely since I didn't write it lol).

pudo commented 5 years ago

It might be worth putting a watch on docker-compose -f docker-compose.dev.yml ps to see if anything is happening over there. Otherwise I'd also recommend doing a make stop to shut down all services and then starting again.

baughmann commented 5 years ago

@pudo I've actually done a docker system prune and docker system prune --volumes several times, even going so far as to re-pull the images after ensuring that docker is wiped clean. I've also tried on other OSes.

What version of Docker are you using? Maybe I can rollback to yours just to see.

pudo commented 5 years ago

What's in your aleph.env, if I may ask?

pudo commented 5 years ago

I'm on 2.1.0.1 on OS X

baughmann commented 5 years ago

> What's in your aleph.env, if I may ask?

@pudo Just the default (minus adding my email to the admin list). I added the default aleph.env that I'm using to my original post just for completeness.

> I'm on 2.1.0.1 on OS X

I guess that settles the Docker version debate then haha.

pudo commented 5 years ago

The other possibility is that the docker networking isn't connecting things as it should. It may be worth entering the ingest-file container and trying to ping redis, to see if that works. Also check what the actual value of the REDIS_URL env var in that container is.
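Note that an ICMP ping only shows that the hostname resolves and the host answers; it does not show that Redis accepts TCP connections on its port. A slightly stricter check could be run with Python inside the ingest-file container (the traceback suggests Python 3.7 is installed there). This is a sketch using only the standard library; `parse_redis_url`, `can_connect`, and `check_redis` are hypothetical helper names, not part of aleph or servicelayer.

```python
import socket
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into (host, port, db), defaulting to port 6379 and db 0."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379
    db = int(parsed.path.lstrip("/") or 0)
    return host, port, db


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_redis(url):
    """Build a one-line report for the given Redis URL."""
    host, port, db = parse_redis_url(url)
    status = "OK" if can_connect(host, port) else "FAILED"
    return f"{url} -> host={host} port={port} db={db} tcp={status}"
```

Inside the container, `print(check_redis(os.environ["REDIS_URL"]))` (after `import os`) would show both the effective value of REDIS_URL and whether the port is reachable. A successful TCP connection still says nothing about whether the worker and the API are using the same Redis database.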

baughmann commented 5 years ago

@pudo The REDIS_URL is redis://redis:6379/0.

I was also able to successfully ping redis:

# ping redis
PING redis (172.20.0.3) 56(84) bytes of data.
64 bytes from aleph_redis_1.aleph_default (172.20.0.3): icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from aleph_redis_1.aleph_default (172.20.0.3): icmp_seq=2 ttl=64 time=0.081 ms
64 bytes from aleph_redis_1.aleph_default (172.20.0.3): icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from aleph_redis_1.aleph_default (172.20.0.3): icmp_seq=4 ttl=64 time=0.055 ms
^C
--- redis ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 134ms
rtt min/avg/max/mdev = 0.040/0.057/0.081/0.015 ms

pudo commented 5 years ago

Technology makes no sense, I'm going into gardening.

baughmann commented 5 years ago

@pudo Were there any additional steps I needed to take when setting up the developer environment on OS X, besides setting the max_map_count? Docker is usually just plug 'n play.

> Technology makes no sense, I'm going into gardening.

That's a good move. I hear landscaping is where the big money is these days.

baughmann commented 5 years ago

Well, the good news is I figured out how to correctly install the Mac OS dependencies. The page with the instructions on the wiki is poorly formatted.

Once homebrew is installed, you simply perform the following commands:

brew install leveldb
brew install icu4c
env CFLAGS=-I/usr/local/opt/icu4c/include (or set it permanently in your .bash_profile)
env LDFLAGS=-L/usr/local/opt/icu4c/lib (ditto above)
PATH=$PATH:/usr/local/opt/icu4c/bin (or set it permanently in usr)
pip install pyicu (or pip3 install pyicu)

This, of course, does not resolve the upload freezing issue, but it does make me feel better.

pudo commented 5 years ago

> Well, the good news is I figured out how to correctly install the Mac OS dependencies. The page with the instructions on the wiki is poorly formatted.
>
> Once homebrew is installed, you simply perform the following commands:
>
> brew install leveldb
> brew install icu4c
> env CFLAGS=-I/usr/local/opt/icu4c/include (or set it permanently in your .bash_profile)
> env LDFLAGS=-L/usr/local/opt/icu4c/lib (ditto above)
> PATH=$PATH:/usr/local/opt/icu4c/bin (or set it permanently in usr)
> pip install pyicu (or pip3 install pyicu)

Thanks! I've added this to our newly growing documentation!

https://docs.alephdata.org/developers/ftm#optional-enhanced-transliteration-support

pudo commented 5 years ago

I'm gonna close this as "can't reproduce". Please let us know if you have further information, or if upgrading to a newer version removes the issue. Sorry not to be able to solve.

anderser commented 5 years ago

I am having the same problem. Docker 2.1.0.3 on Mac OS X 10.14.6. I cloned the repository today and ran make all. The containers seem to spin up fine, and when I upload a document I can see log entries for it being added in the aleph_ingest-file_1 container. But in the UI it seems to be stuck at 25% update progress. Docker has 7GB of memory available.

pudo commented 5 years ago

@anderser Are you running this in production or development mode? In development mode you need to run the aleph worker manually, i.e. make shell, then aleph worker, to commence indexing.

anderser commented 5 years ago

@pudo Well, that solved it. Was running dev mode yes. Thanks!

aaronsdevera commented 4 years ago

Same problem.

It freezes at a certain percentage point. All the files have completed upload, both in the GUI and with the client; it's just this processing part where things get stuck.

Going to see how redis might be poked to improve this...

pudo commented 4 years ago

Can you try this again with the latest release, 3.8.5? There was a bug in the backend in 3.8.4.

Also check out: https://app.gitbook.com/@aleph/s/docs/developers/technical-faq#my-import-is-stuck-at-67-whats-wrong