FDA / openfda

openFDA is an FDA project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.
https://open.fda.gov
Creative Commons Zero v1.0 Universal

How to set up local env? #115

Closed beckyconning closed 4 years ago

beckyconning commented 4 years ago

I'm working on the search_after change, but I want to test it before opening a PR. I'm really not sure how to get started. I ran bootstrap, but I'm getting:

➜  openfda git:(master) ✗ node api/faers/api.js
{"name":"openfda-api-logger","hostname":"Beckys-MacBook-Pro.local","pid":27009,"level":30,"msg":"Adding connection to http://localhost:9200/","time":"2019-12-22T18:42:16.845Z","v":0}
Updating index information.
Listening on 8000
Failed to fetch index update times::  [illegal_argument_exception] request [/openfdametadata/last_run/_search] contains unrecognized parameter: [fields]
Unhandled rejection Error: [index_not_found_exception] no such index [deviceevent], with: {"resource.type":"index_or_alias","resource.id":"deviceevent","index_uuid":"_na_","index":"deviceevent"}
    at respond (/Users/beckyconning/openfda/api/faers/node_modules/elasticsearch/src/lib/transport.js:256:15)
    at checkRespForFailure (/Users/beckyconning/openfda/api/faers/node_modules/elasticsearch/src/lib/transport.js:219:7)
    at HttpConnector.<anonymous> (/Users/beckyconning/openfda/api/faers/node_modules/elasticsearch/src/lib/connectors/http.js:155:7)
    at IncomingMessage.wrapper (/Users/beckyconning/openfda/api/faers/node_modules/elasticsearch/node_modules/lodash/index.js:3095:19)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
beckyconning commented 4 years ago

When I run setup.py develop, I get:

error: elasticsearch 7.1.0 is installed but elasticsearch<2.0.0,>=1.3.0 is required by set(['pyelasticsearch'])
beckyconning commented 4 years ago

Running pip install elasticsearch==1.3.0 fixed the setup.py error, but I still get the original error.

beckyconning commented 4 years ago

Running scripts/test_python.sh gives:

======================================================================
ERROR: openfda.tests.test_index_util.test_fresh_index
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/beckyconning/openfda/openfda/tests/test_index_util.py", line 43, in test_fresh_index
    index_util.index_with_checksum(es, 'index_util_test1', doc_type, batch)
  File "/Users/beckyconning/openfda/openfda/index_util.py", line 292, in index_with_checksum
    fields='@checksum')['docs']:
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 385, in mget
    params=params, body=body)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 89, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
TransportError: TransportError(406, u'Content-Type header [] is not supported')
-------------------- >> begin captured logging << --------------------
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): localhost
urllib3.connectionpool: DEBUG: http://localhost:9200 "DELETE /index_util_test1 HTTP/1.1" 200 21
elasticsearch: INFO: DELETE http://localhost:9200/index_util_test1 [status:200 request:0.140s]
elasticsearch: DEBUG: > None
elasticsearch: DEBUG: < {"acknowledged":true}
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: http://localhost:9200 "PUT /index_util_test1 HTTP/1.1" 200 75
elasticsearch: INFO: PUT http://localhost:9200/index_util_test1 [status:200 request:0.375s]
elasticsearch: DEBUG: > None
elasticsearch: DEBUG: < {"acknowledged":true,"shards_acknowledged":true,"index":"index_util_test1"}
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: http://localhost:9200 "GET /index_util_test1/user_message/_mget?fields=%40checksum&_source=false HTTP/1.1" 406 64
elasticsearch: WARNING: GET /index_util_test1/user_message/_mget?fields=%40checksum&_source=false [status:406 request:0.001s]
elasticsearch: DEBUG: > {"ids": [0, 1, 2, 3]}
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: openfda.tests.test_index_util.test_replace_some_docs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/beckyconning/openfda/openfda/tests/test_index_util.py", line 54, in test_replace_some_docs
    test_fresh_index()
  File "/Users/beckyconning/openfda/openfda/tests/test_index_util.py", line 43, in test_fresh_index
    index_util.index_with_checksum(es, 'index_util_test1', doc_type, batch)
  File "/Users/beckyconning/openfda/openfda/index_util.py", line 292, in index_with_checksum
    fields='@checksum')['docs']:
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 385, in mget
    params=params, body=body)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 89, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/beckyconning/openfda/_python-env/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
TransportError: TransportError(406, u'Content-Type header [] is not supported')
-------------------- >> begin captured logging << --------------------
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): localhost
urllib3.connectionpool: DEBUG: http://localhost:9200 "DELETE /index_util_test1 HTTP/1.1" 200 21
elasticsearch: INFO: DELETE http://localhost:9200/index_util_test1 [status:200 request:0.063s]
elasticsearch: DEBUG: > None
elasticsearch: DEBUG: < {"acknowledged":true}
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: http://localhost:9200 "PUT /index_util_test1 HTTP/1.1" 200 75
elasticsearch: INFO: PUT http://localhost:9200/index_util_test1 [status:200 request:0.433s]
elasticsearch: DEBUG: > None
elasticsearch: DEBUG: < {"acknowledged":true,"shards_acknowledged":true,"index":"index_util_test1"}
urllib3.util.retry: DEBUG: Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool: DEBUG: http://localhost:9200 "GET /index_util_test1/user_message/_mget?fields=%40checksum&_source=false HTTP/1.1" 406 64
elasticsearch: WARNING: GET /index_util_test1/user_message/_mget?fields=%40checksum&_source=false [status:406 request:0.001s]
elasticsearch: DEBUG: > {"ids": [0, 1, 2, 3]}
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 60 tests in 6.224s

FAILED (errors=2)
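Both test errors above are consistent with a client/server version mismatch rather than a bug in the tests: Elasticsearch 6.0 and later reject requests whose Content-Type header is missing, and the log shows the installed client sending an empty one (the 406). A minimal sketch, assuming Python 3, for confirming which server major version is actually running before debugging further. It parses the JSON body returned by GET http://localhost:9200/; EXPECTED_MAJOR = 1 is an assumption inferred from the elasticsearch<2.0.0 pin reported by setup.py, not a documented requirement of the repository.

```python
import json

# Assumption: the code in this repository targets an Elasticsearch 1.x server,
# inferred from the elasticsearch<2.0.0 pin; adjust if the project documents otherwise.
EXPECTED_MAJOR = 1


def server_major_version(root_body):
    """Return the major version from the JSON body of GET http://localhost:9200/."""
    info = json.loads(root_body)
    number = info["version"]["number"]  # e.g. "7.5.1"
    return int(number.split(".")[0])


def check_compat(root_body, expected_major=EXPECTED_MAJOR):
    """Return (ok, message) saying whether the server's major version is the expected one."""
    major = server_major_version(root_body)
    if major == expected_major:
        return True, "Elasticsearch %d.x matches the expected major version" % major
    return False, (
        "Elasticsearch %d.x is running but %d.x was expected; "
        "old-client requests may be rejected" % (major, expected_major)
    )
```

Fetch the body with curl http://localhost:9200/ (or any HTTP client) and pass it to check_compat; keeping the network call outside the functions makes them easy to test.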
beckyconning commented 4 years ago

Hi!

I'm quite happy to change pagination to use search_after rather than from and size. This will make the API much more efficient and allow it to page past the 25,000-result limit.

However, I have no idea how to set up my development environment. I'm happy to document the setup process too.

But I need help getting started please!
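For reference, the pagination change proposed above can be sketched as follows. With from and size, each shard must collect and sort from + size hits for every page, so deep pages get steadily more expensive and are capped by the index.max_result_window setting. search_after (available from Elasticsearch 5.0) instead resumes from the sort values of the last hit on the previous page. This minimal sketch, assuming Python 3, only builds request bodies; the sort fields receivedate and safetyreportid are hypothetical choices, and any deterministic sort ending in a unique tiebreaker field would do.

```python
# Hypothetical sort fields; the last entry must be unique per document so
# that ties are broken deterministically between pages.
SORT = [{"receivedate": "asc"}, {"safetyreportid": "asc"}]


def first_page_body(query, size):
    """Body for the first page: like from/size paging, but with no 'from'."""
    return {"size": size, "query": query, "sort": SORT}


def next_page_body(previous_body, hits):
    """Body for the page after `hits`, or None if `hits` was the last page.

    `hits` is the hits.hits list from the previous _search response; each hit
    carries a "sort" array holding the values it was sorted by.
    """
    if not hits or len(hits) < previous_body["size"]:
        return None  # a short or empty page means there is nothing left
    body = dict(previous_body)  # shallow copy; the previous body is left unchanged
    # The last hit's sort values become the cursor for the next request.
    body["search_after"] = hits[-1]["sort"]
    return body
```

Send each body to the index's _search endpoint and stop when next_page_body returns None. Unlike from/size, this cannot jump straight to an arbitrary page, which is the usual trade-off of cursor-style pagination.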

beckyconning commented 4 years ago

Still no luck! I'm spending a lot of my free time trying to work this out. I think I just need to know how to get the data into Elasticsearch, but I'm not entirely sure. Any help would be very much appreciated!

beckyconning commented 4 years ago

I've spent about 50 hours over the last month or so trying to work this out, and I'm afraid I haven't managed it. I'd love some assistance or instructions on importing the data into Elasticsearch.
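One generic way to get a few test documents into a local index, so that index_not_found errors like the one in the first comment go away, is the _bulk endpoint. This is a minimal sketch, assuming Python 3 and Elasticsearch 7.x, and it is not how the openFDA pipelines load data; bulk_body is a hypothetical helper, the documents you pass in are up to you, and the API may still expect openFDA's own mappings. Each document becomes an action line followed by a source line, and the body must end with a newline.

```python
import json


def bulk_body(index, docs, id_field=None):
    """Build an NDJSON body for POST /_bulk: an action line, then the document, per doc.

    bulk_body is a hypothetical helper for loading hand-picked test documents.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field is not None:
            # Use one of the document's own fields as its _id.
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n" if lines else ""
```

POST the result to http://localhost:9200/_bulk with the header Content-Type: application/x-ndjson. Pre-7.x versions also expect a _type in each action line.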

beckyconning commented 4 years ago

In return, I'm 100% happy to implement the faster search, which will also allow for more than 25,000 results.

evgakis commented 4 years ago

I've spent about 50 hours over the last month or so trying to work this out, and I'm afraid I haven't managed it. I'd love some assistance or instructions on importing the data into Elasticsearch.

I have the same error. Could you please advise how I could fix it?

{"name":"openfda-api-logger","hostname":"DESKTOP-1BDVMTR","pid":9172,"level":30,"msg":"Adding connection to http://localhost:9200/","time":"2020-02-11T13:10:29.065Z","v":0}
Updating index information.
Listening on 9200
Failed to fetch index update times:: [index_not_found_exception] no such index, with: {"resource.type":"index_or_alias","resource.id":"openfdametadata","index":"openfdametadata"}
Unhandled rejection Error: [index_not_found_exception] no such index, with: {"resource.type":"index_or_alias","resource.id":"deviceevent","index":"deviceevent"} ....
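A quick way to see which of the indices named in these logs exist locally is to compare them against GET /_cat/indices?format=json and, since the errors mention index_or_alias, GET /_cat/aliases?format=json. A minimal sketch, assuming Python 3; REQUIRED_INDICES lists only the two names from the logs in this thread, and the API very likely queries more.

```python
import json

# Only the index names that appear in the error logs; the API is likely to
# query others as well (assumption).
REQUIRED_INDICES = ["openfdametadata", "deviceevent"]


def missing_indices(cat_indices_body, cat_aliases_body="[]", required=REQUIRED_INDICES):
    """Return the required names that are neither an index nor an alias.

    cat_indices_body: JSON body of GET /_cat/indices?format=json
    cat_aliases_body: JSON body of GET /_cat/aliases?format=json
    """
    present = {entry["index"] for entry in json.loads(cat_indices_body)}
    present |= {entry["alias"] for entry in json.loads(cat_aliases_body)}
    return [name for name in required if name not in present]
```

An empty result only means the names resolve; it says nothing about whether the documents or mappings inside them are what the API expects.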

beckyconning commented 4 years ago

Hey @evgakis, not sure yet. I've spent quite a while trying to figure it out but still have no idea. If you work it out, please let me know.

dkrylovsb commented 4 years ago

Setting up a local development environment isn't difficult (we'd be happy to assist), but running the pipelines and generating the data is. Furthermore, as I mentioned in the other issue, Elasticsearch would require a major version upgrade before search_after could be implemented.

evgakis commented 4 years ago

@dkrylovsb I ran the script run_faers_pipeline.sh. The raw data are downloaded, but the process stops after that and the data are not extracted to JSON. I see the following error in the log file faers.log:

ERR: The config profile (openfda) could not be found
Running: aws --cli-read-timeout=3600 --profile=openfda s3 sync s3://openfda-data-spl/data/ ./data/spl/s3_sync

I have tried to configure the AWS CLI manually via aws configure --profile "openfda", but I don't think this is the right way to go.

I think the AWS CLI should have been configured when running bootstrap.sh or npm install in the api/faers folder. Could you please help with that?
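The ERR line means the AWS CLI found no profile named openfda: it looks for a [profile openfda] section in ~/.aws/config or an [openfda] section in ~/.aws/credentials. A minimal sketch, assuming Python 3, that lists the profiles defined in those two files; profiles_defined is a hypothetical helper. Defining the profile only clears this particular error; the credentials behind it still need read access to the bucket.

```python
import configparser


def profiles_defined(config_text, credentials_text):
    """Return the AWS CLI profile names defined in ~/.aws/config and ~/.aws/credentials.

    Both arguments are the file contents as strings (use "" for a missing file).
    """
    profiles = set()
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)
    for section in config.sections():
        if section == "default":
            profiles.add("default")
        elif section.startswith("profile "):
            # ~/.aws/config names non-default profiles "[profile NAME]".
            profiles.add(section[len("profile "):].strip())
    creds = configparser.ConfigParser(interpolation=None)
    creds.read_string(credentials_text)
    # ~/.aws/credentials uses the bare profile name as the section header.
    profiles.update(creds.sections())
    return profiles
```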

dkrylovsb commented 4 years ago

Unfortunately, that bucket is private and no data can be downloaded from it at this point. You could work around this issue, however, by replacing this line in the FAERS pipeline with:

return XML2JSON(self.quarter)

dkrylovsb commented 4 years ago

More information on how to run openFDA locally (select pipelines only) is here: https://github.com/FDA/openfda/pull/133