FDA / openfda

openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

deviceevent index empty #150

Closed Mariano215 closed 3 years ago

Mariano215 commented 3 years ago

Attaching docker compose logs. docker_compose.zip

[Screenshot from 2020-12-21 13-40-16]

dkrylovsb commented 3 years ago

The device event pipeline takes a while to run and populate the index.

Mariano215 commented 3 years ago

It's been up for about 6 hours and 30 minutes and there is no data in device events so far. I'm running 8 processors with 54 GB of RAM and a 2 TB drive dedicated to the virtual machine.

Is there a way to check what's happening?

Latest status:

pipeline_1 | 2020-12-22 17:27:42,345 mapreduce.py:176 Starting MapReduce: [Collection(54 items)] -> ./data/maude/init-2020-12-20-json.db@2, M: CSV2JSONMapper, R: CSV2JSONJoinReducer
pipeline_1 | 2020-12-22 21:27:42,574 pipeline.py:468 Does not conform to device structure. Skipping: 7763969, #####
pipeline_1 | 2020-12-22 21:27:42,575 pipeline.py:468 Does not conform to device structure. Skipping: 4201358, #####
pipeline_1 | 2020-12-22 21:27:48,052 pipeline.py:468 Does not conform to device structure. Skipping: 7773558, #####
pipeline_1 | 2020-12-22 21:27:48,052 pipeline.py:468 Does not conform to device structure. Skipping: 1902177, #####
pipeline_1 | 2020-12-22 21:27:55,544 pipeline.py:468 Does not conform to device structure. Skipping: 7780933, #####
pipeline_1 | 2020-12-22 21:27:55,545 pipeline.py:468 Does not conform to device structure. Skipping: 3501120, #####

curl http://localhost:8000/status

[
  {
    "endpoint": "deviceclearance",
    "status": "GREEN",
    "last_updated": "2020-12-14",
    "documents": 157716,
    "requests": 20,
    "latency": 6.619047619047619
  },
  {
    "endpoint": "foodevent",
    "status": "GREEN",
    "last_updated": "2020-07-28",
    "documents": 91776,
    "requests": 20,
    "latency": 5.761904761904762
  },
  {
    "endpoint": "othernsde",
    "status": "GREEN",
    "last_updated": "2020-12-19",
    "documents": 469657,
    "requests": 20,
    "latency": 28.238095238095237
  },
  {
    "endpoint": "othersubstance",
    "status": "GREEN",
    "last_updated": "2020-12-11",
    "documents": 119479,
    "requests": 20,
    "latency": 27.476190476190474
  }
]
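One rough way to watch for the index to appear is to compare the `/status` response against the endpoints you expect. The sketch below is only an illustration: the helper names `fetch_status` and `missing_endpoints` are hypothetical, the endpoint name `deviceevent` is taken from the issue title, and the sample data is a trimmed copy of the response above.

```python
import json
from urllib.request import urlopen

# URL taken from the curl command in this comment.
STATUS_URL = "http://localhost:8000/status"


def fetch_status(url=STATUS_URL):
    """Fetch and parse the /status response (a JSON list of endpoint records)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def missing_endpoints(status, expected):
    """Return expected endpoint names that have no entry with documents > 0."""
    populated = {e["endpoint"] for e in status if e.get("documents", 0) > 0}
    return sorted(set(expected) - populated)


# Trimmed sample of the response shown in this thread (no network call needed).
sample = json.loads("""
[
  {"endpoint": "deviceclearance", "status": "GREEN", "documents": 157716},
  {"endpoint": "foodevent", "status": "GREEN", "documents": 91776}
]
""")

print(missing_endpoints(sample, ["deviceevent", "foodevent"]))
```

Against a running instance, `missing_endpoints(fetch_status(), ["deviceevent"])` would return an empty list once the device event index has been written.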

Mariano215 commented 3 years ago

Now it errored out...

pipeline_1 | Traceback (most recent call last):
pipeline_1 |   File "openfda/maude/pipeline.py", line 774, in <module>
pipeline_1 |     luigi.run()
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/interface.py", line 210, in run
pipeline_1 |     return _run(*args, **kwargs)['success']
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/interface.py", line 238, in _run
pipeline_1 |     return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/interface.py", line 197, in _schedule_and_run
pipeline_1 |     success &= worker.run()
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/worker.py", line 872, in run
pipeline_1 |     self._handle_next_task()
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/worker.py", line 783, in _handle_next_task
pipeline_1 |     self._email_task_failure(task, expl)
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/worker.py", line 509, in _email_task_failure
pipeline_1 |     self._email_error(task, formatted_traceback,
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/worker.py", line 517, in _email_error
pipeline_1 |     notifications.send_error_email(formatted_subject, message, task.owner_email)
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/notifications.py", line 294, in send_error_email
pipeline_1 |     send_email(
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/notifications.py", line 268, in send_email
pipeline_1 |     email_sender(config, sender, subject, message, recipients, image_png)
pipeline_1 |   File "/usr/src/openfda/_python-env/lib/python3.8/site-packages/luigi-2.1.1-py3.8.egg/luigi/notifications.py", line 139, in send_email_smtp
pipeline_1 |     smtp = smtplib.SMTP(**kwargs) if not smtp_ssl else smtplib.SMTP_SSL(**kwargs)
pipeline_1 |   File "/usr/local/lib/python3.8/smtplib.py", line 253, in __init__
pipeline_1 |     (code, msg) = self.connect(host, port)
pipeline_1 |   File "/usr/local/lib/python3.8/smtplib.py", line 339, in connect
pipeline_1 |     self.sock = self._get_socket(host, port, self.timeout)
pipeline_1 |   File "/usr/local/lib/python3.8/smtplib.py", line 308, in _get_socket
pipeline_1 |     return socket.create_connection((host, port), timeout,
pipeline_1 |   File "/usr/local/lib/python3.8/socket.py", line 808, in create_connection
pipeline_1 |     raise err
pipeline_1 |   File "/usr/local/lib/python3.8/socket.py", line 796, in create_connection
pipeline_1 |     sock.connect(sa)
pipeline_1 | OSError: [Errno 99] Cannot assign requested address

dkrylovsb commented 3 years ago

This particular pipeline does a lot of data crunching for quite a while (index remains empty) and then writes results to the Elasticsearch index as the very last step. If it has not yet finished on your end, please feel free to share the container log file.

dkrylovsb commented 3 years ago

OK, so it did finish but with an error. I'm happy to look at your log files:

docker-compose logs --tail="all" pipeline
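The traceback posted earlier comes from luigi's attempt to email a task failure (`_email_task_failure`), so the error that actually failed the task is likely elsewhere in the full log. A minimal sketch for tallying exception names in saved log output follows. The function name `error_summary` and the regular expressions are assumptions for illustration, not part of openFDA or luigi.

```python
import re
from collections import Counter

# Strips a docker-compose service prefix such as "pipeline_1 | ".
PREFIX = re.compile(r"^\S+\s+\|\s?")
# Matches names like "OSError:" or "leveldb.LevelDBError:" (hypothetical heuristic).
ERROR = re.compile(r"\b([A-Za-z_.]*(?:Error|Exception))\b:")


def error_summary(lines):
    """Count exception names that appear in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        text = PREFIX.sub("", line.rstrip("\n"))
        match = ERROR.search(text)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from this thread.
sample_lines = [
    "pipeline_1 | 2020-12-22 21:27:42,574 pipeline.py:468 Does not conform to device structure. Skipping: 7763969, #####",
    "pipeline_1 | OSError: [Errno 99] Cannot assign requested address",
]
print(dict(error_summary(sample_lines)))
```

To run it over a real log, save the command output to a file and pass the open file object to `error_summary`; any exception name other than the SMTP-related `OSError` is a candidate for the root cause.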

Mariano215 commented 3 years ago

Much appreciated: docker-logs.zip

Some other questions if you don't mind:

  1. Minimum system requirements?
  2. Most stable OS to run this on?

Thanks and I look forward to your analysis - happy to help in any way I can.

dkrylovsb commented 3 years ago

leveldb.LevelDBError: IO error: ./data/maude/init-2020-12-20-json.db-mapreduce-output-2020-12-22-05-27/shard-00001-of-00002.db: Too many open files

Ah. Too many open files. Let us adjust the docker setup and do a PR with the fix. Please stay tuned.
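"Too many open files" usually means the process hit its per-process file descriptor limit. The sketch below only inspects that limit using Python's standard `resource` module (Unix only); it would need to be run inside the pipeline container to be meaningful. It does not reproduce whatever change the fix makes, and the helper names are hypothetical.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

A low soft limit combined with a MapReduce step that keeps many shard files open at once is one plausible way to reach this error; raising the limit is a Docker or OS configuration matter and is not shown here.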

dkrylovsb commented 3 years ago

PR: https://github.com/FDA/openfda/pull/151