FDA / openfda

openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

Error while trying to retrieve faers data by running the pipeline.py #119

evgakis closed 4 years ago

evgakis commented 4 years ago

I am running the script all-pipelines.sh, which contains the following code:

```
#!/bin/bash

set -x

export LUIGI_CONFIG_PATH=./config/luigi.cfg
export PYTHON=./_python-env/bin/python
export LOGDIR=./logs

mkdir -p $LOGDIR

$PYTHON openfda/faers/pipeline.py LoadJSON --quarter=all > $LOGDIR/faers.log 2>&1
```

I get the following error:

Running: aws --cli-read-timeout=3600 --profile=openfda s3 sync s3://openfda-data-spl/data/ ./data/spl/s3_sync
ERR: fatal error: Connect timeout on endpoint URL: "https://openfda-data-spl.s3.s3-website-us-east-1.amazonaws.com/?list-type=2&prefix=data%2F&encoding-type=url"
Traceback (most recent call last):
  File "openfda/faers/pipeline.py", line 214, in <module>
    luigi.run()
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/interface.py", line 210, in run
    return _run(*args, **kwargs)['success']
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/interface.py", line 238, in _run
    return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/interface.py", line 197, in _schedule_and_run
    success &= worker.run()
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/worker.py", line 872, in run
    self._handle_next_task()
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/worker.py", line 783, in _handle_next_task
    self._email_task_failure(task, expl)
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/worker.py", line 511, in _email_task_failure
    headline="A task failed when running. Most likely run() raised an exception.",
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/worker.py", line 517, in _email_error
    notifications.send_error_email(formatted_subject, message, task.owner_email)
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/notifications.py", line 298, in send_error_email
    recipients=recipients
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/notifications.py", line 268, in send_email
    email_sender(config, sender, subject, message, recipients, image_png)
  File "/home/vgakis/openfda/_python-env/lib/python2.7/site-packages/luigi/notifications.py", line 139, in send_email_smtp
    smtp = smtplib.SMTP(**kwargs) if not smtp_ssl else smtplib.SMTP_SSL(**kwargs)
  File "/usr/lib/python2.7/smtplib.py", line 256, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib/python2.7/smtplib.py", line 316, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib/python2.7/smtplib.py", line 291, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

Any idea why this happens and how I could solve it? Maybe it has to do with the AWS config? I set s3-website-us-east-1 in the region field.
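A note on the region value: `s3-website-us-east-1` looks like part of an S3 static-website endpoint name, not an AWS region name such as `us-east-1`. It appears verbatim in the host of the endpoint URL in the error above. The Python 3 sketch below illustrates this under stated assumptions: the regular expression is only a heuristic for the usual shape of region names, and `virtual_hosted_endpoint` is a hypothetical helper that mimics the hostname pattern seen in the error, not the AWS CLI's actual endpoint logic. The thread does not state which region the bucket lives in.

```python
import re

# Heuristic (an assumption, not an official rule): region names such as
# "us-east-1", "ap-southeast-2" or "us-gov-west-1" start with two letters,
# continue with one or more "-word" parts, and end in "-<digit>".
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def looks_like_region(value):
    """Return True if value has the usual shape of an AWS region name."""
    return bool(REGION_PATTERN.match(value))


def virtual_hosted_endpoint(bucket, region):
    """Hypothetical helper: build a hostname shaped like the one in the error."""
    return "https://{}.s3.{}.amazonaws.com/".format(bucket, region)


if __name__ == "__main__":
    for candidate in ("us-east-1", "s3-website-us-east-1"):
        print(candidate, looks_like_region(candidate))
    # Reproduces the malformed host from the error message.
    print(virtual_hosted_endpoint("openfda-data-spl", "s3-website-us-east-1"))
```

With the value from the question, the helper yields the same doubled `s3.s3-website-us-east-1` host that the CLI reported, which suggests the region field was the source of the malformed endpoint.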

Before setting the AWS config, I got the following error:

Running: aws --cli-read-timeout=3600 --profile=openfda s3 sync s3://openfda-data-spl/data/ ./data/spl/s3_sync
ERR: The config profile (openfda) could not be found
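The "config profile (openfda) could not be found" message means the AWS CLI found no profile with that name. In the shared config file (normally `~/.aws/config`), a named profile is a section of the form `[profile openfda]`, while the default profile is simply `[default]`. A minimal Python 3 sketch for checking whether such a section exists follows; `profile_exists` and the sample text are hypothetical, and the sketch only inspects the config file, not the separate credentials file.

```python
import configparser


def profile_exists(config_text, profile):
    """Return True if an AWS-style config text defines the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles use "[profile NAME]"; the default profile uses "[default]".
    section = "default" if profile == "default" else "profile {}".format(profile)
    return parser.has_section(section)


# Sample config text for illustration only; the values are assumptions.
SAMPLE_CONFIG = """\
[default]
region = us-east-1

[profile openfda]
region = us-east-1
"""

if __name__ == "__main__":
    print(profile_exists(SAMPLE_CONFIG, "openfda"))
    print(profile_exists(SAMPLE_CONFIG, "missing"))
```

To check a real setup, the contents of `~/.aws/config` can be read and passed in as `config_text`. Running `aws configure --profile openfda` creates the profile interactively.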

dkrylovsb commented 4 years ago

At this point the pipeline in question requires access to a data source that is only available to FDA and thus cannot be successfully executed by the general public.

hobochili commented 4 years ago

@dkrylovsb Will the SPL data ever be made available to the public? If not, is there a way to compile it independently? I'm a bit confused, as it seems like it was previously available to anybody. Was there a policy change?

dkrylovsb commented 4 years ago

@hobochili The SPL data are available at https://labels.fda.gov/ as well as at https://dailymed.nlm.nih.gov/dailymed/spl-resources.cfm ("bulk" downloads). The pipeline, however, is set up to pull the labels from a private S3 bucket.