FDA / openfda

openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

Instructions #149

Closed Mariano215 closed 3 years ago

Mariano215 commented 3 years ago

I'm still having a hard time getting this running. I'm new to this so forgive me but...

I'm running an Ubuntu server. I installed all the prerequisites, ran bootstrap.sh, and then ran docker-compose up (many warnings).

How exactly do I get data into elasticsearch from here?

dkrylovsb commented 3 years ago

docker-compose up will eventually get the data into the Elasticsearch container. Use docker-compose logs -f pipeline to follow progress of the pipelines. Use curl http://localhost:8000/status to see which endpoints are ready for querying.
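As a rough illustration of the status check, here is a minimal Python sketch. It assumes the status endpoint returns a JSON list of objects with "endpoint" and "status" fields, as in the output posted later in this thread; the function names are hypothetical and not part of openFDA.

```python
import json
import urllib.request

# URL taken from the comment above; adjust if the API is published elsewhere.
STATUS_URL = "http://localhost:8000/status"


def green_endpoints(status):
    """Return the names of endpoints whose status is GREEN."""
    return [entry["endpoint"] for entry in status if entry.get("status") == "GREEN"]


def fetch_status(url=STATUS_URL, timeout=10):
    """Fetch and decode the JSON list served by the status endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the containers running, `green_endpoints(fetch_status())` should list the endpoints that are ready for querying.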

If you need to talk to Elasticsearch directly, modify docker-compose.yml (https://github.com/FDA/openfda-dev/blob/master/docker-compose.yml#L9) to expose port 9200 to your host, and then use the Elasticsearch API to communicate.
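Once port 9200 is reachable from the host, the indices can be listed through Elasticsearch's `_cat/indices` API. The sketch below is only an illustration: it assumes Elasticsearch answers on http://localhost:9200 without authentication, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: port 9200 has been published to the host in docker-compose.yml.
ES_URL = "http://localhost:9200"


def summarize_indices(cat_rows):
    """Map index name to document count from _cat/indices?format=json rows."""
    return {
        row["index"]: int(row["docs.count"])
        for row in cat_rows
        if row.get("docs.count") is not None  # closed indices report no count
    }


def list_indices(base_url=ES_URL, timeout=10):
    """Ask Elasticsearch for its indices and return {name: document count}."""
    url = base_url + "/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize_indices(json.load(response))
```

An empty dictionary from `list_indices()` would suggest that the pipelines have not loaded anything yet.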

Feel free to reach out if you run into trouble.

Mariano215 commented 3 years ago
  1. I have the status as green for foodevent, othernsde, and othersubstance.
  2. The curl status states "progress looks ;)" - no errors indicated.
  3. From what I've read, my docker-compose.yml should already create the indices in Elasticsearch, yet I find none. Can you tell me what I'm missing?
  4. I'm interested in the 510k and Device Adverse Events data. When I try to run pipeline.py from either of these directories, I get a "Can't input 'parallel' from 'openfda'" error.


dkrylovsb commented 3 years ago
  1. You probably need to pull the latest master and re-run. We added two additional pipelines, so you should see more than 3 green.
  2. 👍
  3. The indices are inside the Docker container. You can log into the container and inspect the indices as follows:
    bash-3.2$ docker exec -it openfda-dev_es_1 /bin/bash
    [elasticsearch@f8fc5c1b6149 ~]$ ls /usr/share/elasticsearch/data

    or better yet, use the ES API on port 9200.

  4. Take a look at scripts/all-pipelines.sh if you want to run the pipelines outside of the compose setup.

Mariano215 commented 3 years ago

No luck. I've removed openfda and cloned it again. Same exact results.

    sudo curl http://localhost:8000/status

    [
      { "endpoint": "foodevent", "status": "GREEN", "last_updated": "2020-07-28", "documents": 91776, "requests": 2, "latency": 67 },
      { "endpoint": "othernsde", "status": "GREEN", "last_updated": "2020-12-16", "documents": 468748, "requests": 2, "latency": 86.66666666666667 },
      { "endpoint": "othersubstance", "status": "GREEN", "last_updated": "2020-12-11", "documents": 119479, "requests": 1, "latency": 27 }
    ]

If I run:

    sudo python3 openfda/device_clearance/pipeline.py LoadJSON

I get:

    Traceback (most recent call last):
      File "openfda/device_clearance/pipeline.py", line 16, in <module>
        from openfda import common, config, index_util, parallel
      File "/usr/local/lib/python3.8/dist-packages/openfda-1.0-py3.8.egg/openfda/index_util.py", line 23, in <module>
        from openfda import config, elasticsearch_requests, parallel
    ImportError: cannot import name 'parallel' from 'openfda' (/usr/local/lib/python3.8/dist-packages/openfda-1.0-py3.8.egg/openfda/__init__.py)
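The traceback shows `openfda` being imported from an installed egg under dist-packages rather than from the checkout. One possible cause, not confirmed in this thread, is that the installed copy lacks the `parallel` module. A small sketch using only the standard library can show where Python would load a module from; `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    return spec.origin if spec is not None else None
```

Run with the same interpreter as the pipeline (for example under `sudo python3`), `module_origin("openfda")` and `module_origin("openfda.parallel")` would show which copy of the package is picked up and whether the submodule exists in it.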

dkrylovsb commented 3 years ago

I'm happy to look at your log files: docker-compose logs --tail="all"

Mariano215 commented 3 years ago

docker-logs.txt

I downloaded the master zip today and it seems to have changed. I attached the logs.

Also, now this is happening:

    sudo curl http://localhost:8000/status
    curl: (56) Recv failure: Connection reset by peer

I'd be more than happy to set up a Teams meeting, as I feel I'm missing some key concepts here and can't seem to get this working.

dkrylovsb commented 3 years ago

The timestamps in the log file provided are from 2020-12-08.

Mariano215 commented 3 years ago

pipeline-logs.txt

docker-logs.zip

Mariano215 commented 3 years ago

Update....

I have recreated my Ubuntu virtual server and started from scratch. I am still receiving errors but I believe the docker image is up.

Is it possible to use Kibana to access this data?

I can't seem to get it running outside the docker container.

    curl http://localhost:8000/status

    [
      { "endpoint": "deviceclearance", "status": "GREEN", "last_updated": "2020-12-14", "documents": 157716, "requests": 11, "latency": 4.833333333333333 },
      { "endpoint": "foodevent", "status": "GREEN", "last_updated": "2020-07-28", "documents": 91776, "requests": 16, "latency": 8.529411764705882 },
      { "endpoint": "othernsde", "status": "GREEN", "last_updated": "2020-12-19", "documents": 469657, "requests": 15, "latency": 10.875 },
      { "endpoint": "othersubstance", "status": "GREEN", "last_updated": "2020-12-11", "documents": 119479, "requests": 14, "latency": 4.733333333333333 }
    ]

    curl -g 'http://localhost:8000/device/510k.json?search=advisory_committee:cv&limit=1'

    {
      "meta": {
        "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",
        "terms": "https://open.fda.gov/terms/",
        "license": "https://open.fda.gov/license/",
        "last_updated": "2020-12-14",
        "results": { "skip": 0, "limit": 1, "total": 17143 }
      },
      "results": [
        {
          "third_party_flag": "N",
          "city": "BURLINGTON",
          "advisory_committee_description": "Cardiovascular",
          "address_1": "164 MIDDLESEX TURNPIKE",
          "address_2": "",
          "statement_or_summary": "Statement",
          "product_code": "DXE",
          "openfda": {
            "device_name": "Catheter, Embolectomy",
            "registration_number": [ "9617465", "2015691", "1319639", "2183870", "3005704822", "3009500972", "1225687", "1220477", "3013666218", "2518433", "8043983", "1000393132", "3015443148", "3005168196", "3010034260", "3008307705", "3010041511", "9680794", "2030598", "3015045258", "2134812", "2522007", "3007284006", "2032521", "2029386", "3008114965", "3013758550", "2032098", "1721504", "3016591327", "1721686", "3007146453", "2134914", "3004832480", "2134265", "3009756153", "2027111", "1222313", "1048735", "1820334", "2024311", "2011171", "3015615738", "2183930", "2183744", "3009761573", "3009490946", "3015614177", "3015859709", "2648045", "1036844", "2022435", "3008496528", "3011525976", "1061124", "3010425778", "3013875781", "3008847191", "3007282893", "2030624", "1724474", "3009051888", "3011137372", "1721676", "1220948", "1220452", "3012102437" ],
            "fei_number": [ "2015691", "2183870", "3005704822", "3009500972", "3005747797", "1220477", "3003094851", "3013666218", "3001451463", "2518433", "1000393132", "3015443148", "1000518731", "3010034260", "3005168196", "3003058448", "3008307705", "3010041511", "3015045258", "2522007", "3007284006", "3008114965", "3000206435", "3013758550", "1000138054", "1000306647", "1721504", "3016591327", "1721686", "3007146453", "3004832480", "3003574398", "3003737899", "3009756153", "2027111", "2024311", "1048735", "1820334", "2011171", "3015615738", "2183744", "3009761573", "1000121050", "3009490946", "3003769549", "3002827704", "3015614177", "3015859709", "2648045", "1036844", "2022435", "3008496528", "3011525976", "3002095335", "3010425778", "3013875781", "1000512168", "3008847191", "3002806593", "3007282893", "3002693767", "3009051888", "1000116127", "1721676", "1220948", "1220452", "3012102437" ],
            "medical_specialty_description": "Cardiovascular",
            "device_class": "2",
            "regulation_number": "870.5150"
          },
          "zip_code": "01803",
          "applicant": "VASCUTECH, INC.",
          "decision_date": "2000-03-08",
          "decision_code": "SESE",
          "country_code": "US",
          "device_name": "LEMAITRE IRRIGATION CATHETER",
          "advisory_committee": "CV",
          "contact": "TRENT G KAMKE",
          "expedited_review_flag": "",
          "k_number": "K992933",
          "state": "MA",
          "date_received": "1999-08-31",
          "review_advisory_committee": "CV",
          "postal_code": "01803",
          "decision_description": "Substantially Equivalent",
          "clearance_type": "Traditional"
        }
      ]
    }
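The same 510k query can be issued from Python. This is a minimal sketch, assuming the local API listens on http://localhost:8000 as in the curl call above and returns the "meta" and "results" layout shown there; the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: the local openFDA API is published on port 8000, as in this thread.
BASE_URL = "http://localhost:8000"


def build_510k_url(search, limit=1, base_url=BASE_URL):
    """Build a /device/510k.json query URL; ':' is left unescaped as in the curl call."""
    query = urlencode({"search": search, "limit": limit}, safe=":")
    return f"{base_url}/device/510k.json?{query}"


def summarize(response):
    """Return (total matching records, k_numbers of the returned records)."""
    total = response["meta"]["results"]["total"]
    k_numbers = [record["k_number"] for record in response["results"]]
    return total, k_numbers


def query_510k(search, limit=1, timeout=10):
    """Run the query against the local API and summarize the response."""
    url = build_510k_url(search, limit)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize(json.load(response))
```

For example, `query_510k("advisory_committee:cv")` sends the same request as the curl command above.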

Mariano215 commented 3 years ago

I believe I have it all up and running! Thanks for your help. I can now see the deviceclearance and deviceevent indices.

dkrylovsb commented 3 years ago

Glad to hear!