ga4gh / ga4gh-server

Reference implementation of the APIs defined in ga4gh-schemas. RETIRED 2018-01-24
http://ga4gh.org
Apache License 2.0

Fail to test Apache deployment with demo dataset #1563

Open wdesouza opened 7 years ago

wdesouza commented 7 years ago

I am trying to deploy and test the GA4GH server on a fresh installation of Ubuntu 16.04. I followed every step in the "Deployment on Apache" documentation. When I tried to search for variants,

ga4gh_client variants-search http://localhost/ga4gh -r 1 -s 1 -e 5000

I got this error:

ERROR:ga4gh.client.client:500 {"errorCode": 1277500761, "message": "Internal Server Error"}
Traceback (most recent call last):
  File "/srv/ga4gh/ga4gh-server-env/bin/ga4gh_client", line 11, in <module>
    sys.exit(client_main())
  File "/srv/ga4gh/ga4gh-server-env/local/lib/python2.7/site-packages/ga4gh/client/cli.py", line 1682, in client_main
    raise exception
ga4gh.client.exceptions.RequestNonSuccessException: Url http://localhost/ga4gh/variants/search?key=invalid had status_code 500

The server-side log (/var/log/apache2/error.log):

[Fri Feb 10 04:21:04.950143 2017] [wsgi:error] [pid 6406:tid 140633860036352] ERROR:ga4gh.server.frontend:Exception on / [GET]
[Fri Feb 10 04:21:04.950147 2017] [wsgi:error] [pid 6406:tid 140633860036352] Traceback (most recent call last):
[Fri Feb 10 04:21:04.950148 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
[Fri Feb 10 04:21:04.950150 2017] [wsgi:error] [pid 6406:tid 140633860036352]     rv = self.dispatch_request()
[Fri Feb 10 04:21:04.950151 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
[Fri Feb 10 04:21:04.950153 2017] [wsgi:error] [pid 6406:tid 140633860036352]     return self.view_functions[rule.endpoint](**req.view_args)
[Fri Feb 10 04:21:04.950154 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/frontend.py", line 662, in searchVariants
[Fri Feb 10 04:21:04.950156 2017] [wsgi:error] [pid 6406:tid 140633860036352]     flask.request, app.backend.runSearchVariants)
[Fri Feb 10 04:21:04.950172 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/frontend.py", line 497, in handleFlaskPostRequest
[Fri Feb 10 04:21:04.950177 2017] [wsgi:error] [pid 6406:tid 140633860036352]     return handleHttpPost(flaskRequest, endpoint)
[Fri Feb 10 04:21:04.950183 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/frontend.py", line 363, in handleHttpPost
[Fri Feb 10 04:21:04.950184 2017] [wsgi:error] [pid 6406:tid 140633860036352]     responseStr = endpoint(request.get_data())
[Fri Feb 10 04:21:04.950186 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/backend.py", line 878, in runSearchVariants
[Fri Feb 10 04:21:04.950188 2017] [wsgi:error] [pid 6406:tid 140633860036352]     self.variantsGenerator)
[Fri Feb 10 04:21:04.950189 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/backend.py", line 585, in runSearchRequest
[Fri Feb 10 04:21:04.950191 2017] [wsgi:error] [pid 6406:tid 140633860036352]     for obj, nextPageToken in objectGenerator(request):
[Fri Feb 10 04:21:04.950192 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/backend.py", line 369, in variantsGenerator
[Fri Feb 10 04:21:04.950194 2017] [wsgi:error] [pid 6406:tid 140633860036352]     request, variantSet)
[Fri Feb 10 04:21:04.950196 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/paging.py", line 69, in __init__
[Fri Feb 10 04:21:04.950197 2017] [wsgi:error] [pid 6406:tid 140633860036352]     self._initialiseIteration()
[Fri Feb 10 04:21:04.950199 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/paging.py", line 90, in _initialiseIteration
[Fri Feb 10 04:21:04.950200 2017] [wsgi:error] [pid 6406:tid 140633860036352]     self._currentObject = next(self._searchIterator, None)
[Fri Feb 10 04:21:04.950202 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/datamodel/variants.py", line 739, in getVariants
[Fri Feb 10 04:21:04.950204 2017] [wsgi:error] [pid 6406:tid 140633860036352]     referenceName, startPosition, endPosition):
[Fri Feb 10 04:21:04.950205 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/datamodel/variants.py", line 720, in getPysamVariants
[Fri Feb 10 04:21:04.950207 2017] [wsgi:error] [pid 6406:tid 140633860036352]     cursor = self.getFileHandle(varFileName).fetch(
[Fri Feb 10 04:21:04.950208 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/datamodel/__init__.py", line 637, in getFileHandle
[Fri Feb 10 04:21:04.950210 2017] [wsgi:error] [pid 6406:tid 140633860036352]     return fileHandleCache.getFileHandle(dataFile, self.openFile)
[Fri Feb 10 04:21:04.950212 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/datamodel/__init__.py", line 85, in getFileHandle
[Fri Feb 10 04:21:04.950213 2017] [wsgi:error] [pid 6406:tid 140633860036352]     handle = openMethod(dataFile)
[Fri Feb 10 04:21:04.950215 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages/ga4gh/server/datamodel/variants.py", line 637, in openFile
[Fri Feb 10 04:21:04.950216 2017] [wsgi:error] [pid 6406:tid 140633860036352]     return pysam.VariantFile(dataUrl, index_filename=indexFile)
[Fri Feb 10 04:21:04.950218 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "pysam/cbcf.pyx", line 3273, in pysam.cbcf.VariantFile.__init__ (pysam/cbcf.c:48009)
[Fri Feb 10 04:21:04.950220 2017] [wsgi:error] [pid 6406:tid 140633860036352]   File "pysam/cbcf.pyx", line 3486, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:50967)
[Fri Feb 10 04:21:04.950221 2017] [wsgi:error] [pid 6406:tid 140633860036352] IOError: file `ga4gh-example-data/chr1.vcf.gz` not found

It seems that the server cannot find the VCF file because it is registered with a relative path. Did I do something wrong?
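The symptom can be reproduced outside Apache. The following is a minimal sketch, not code from ga4gh-server: the helper names resolve_data_path and demo are made up for illustration, and the file it creates is an empty placeholder rather than a real VCF. It shows that the same relative path exists or not depending only on the process's current working directory.

```python
import os
import tempfile


def resolve_data_path(path, base_dir=None):
    """Return the absolute location a path would resolve to.

    Hypothetical helper: with base_dir=None it mirrors what the operating
    system does for a relative path, which is to resolve it against the
    current working directory of the process.
    """
    if os.path.isabs(path):
        return path
    base = base_dir if base_dir is not None else os.getcwd()
    return os.path.normpath(os.path.join(base, path))


def demo():
    """Check one relative path from two different working directories."""
    relative = os.path.join("ga4gh-example-data", "chr1.vcf.gz")
    results = {}
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as source_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        # Create an empty placeholder file under the "source" directory only.
        os.makedirs(os.path.join(source_dir, "ga4gh-example-data"))
        open(os.path.join(source_dir, relative), "w").close()
        try:
            for label, cwd in (("source", source_dir), ("other", other_dir)):
                os.chdir(cwd)
                results[label] = os.path.exists(relative)
        finally:
            # Leave the temporary directories before they are deleted.
            os.chdir(original_cwd)
    return results


if __name__ == "__main__":
    print(demo())
    print(resolve_data_path("ga4gh-example-data/chr1.vcf.gz"))
```

Running the script from the directory that contains ga4gh-example-data and then from any other directory prints different absolute locations for the same relative path, which is consistent with the IOError in the log above.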

david4096 commented 7 years ago

Hi @Welliton309. The problem here is that the download_example_data script generates relative paths, but when the server runs under Apache, the working directory those paths are resolved against is not your source folder.

One thing you can do is edit the registry so that those variant sets have absolute paths. Another option is to edit scripts/download_example_data, changing the relative paths to absolute ones, and then rerun it.
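As a rough illustration of the first option, here is a sketch that rewrites relative paths stored in one column of an SQLite table. It assumes the registry is an SQLite file; the table and column names are passed in as parameters because the real registry schema is not shown in this thread, and the names used in the usage example (DemoVariantSet, dataUrl) are hypothetical. The real schema may store paths differently, so inspect it and back up the file before trying anything like this.

```python
import os
import sqlite3


def absolutize_paths(connection, table, column, base_dir):
    """Rewrite relative paths in table.column to absolute paths.

    Hypothetical helper, not part of ga4gh-server. Rows that are NULL,
    already absolute, or look like URLs are left untouched. Relies on
    SQLite's implicit rowid, so it does not work on WITHOUT ROWID tables.
    Returns the number of rows changed.
    """
    select = 'SELECT rowid, "{0}" FROM "{1}"'.format(column, table)
    update = 'UPDATE "{0}" SET "{1}" = ? WHERE rowid = ?'.format(table, column)
    changed = 0
    for rowid, path in connection.execute(select).fetchall():
        if path is None or os.path.isabs(path) or "://" in path:
            continue
        absolute = os.path.normpath(os.path.join(base_dir, path))
        connection.execute(update, (absolute, rowid))
        changed += 1
    connection.commit()
    return changed


if __name__ == "__main__":
    # Demonstration on an in-memory database with a made-up schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DemoVariantSet (name TEXT, dataUrl TEXT)")
    conn.execute(
        "INSERT INTO DemoVariantSet VALUES (?, ?)",
        ("chr1", "ga4gh-example-data/chr1.vcf.gz"))
    print(absolutize_paths(conn, "DemoVariantSet", "dataUrl", "/srv/ga4gh"))
    print(conn.execute("SELECT dataUrl FROM DemoVariantSet").fetchall())
```

Regenerating the registry with the repo CLI remains the safer route, since it does not depend on guessing the schema.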

The best option would be to regenerate the registry using the repo CLI. I think we ought to enforce that absolute paths are always used in the registry, but the example data is a special case where we want it to be portable.

Maybe there is a better way to handle both cases? https://github.com/ga4gh/server/issues/1442

wdesouza commented 7 years ago

Thank you @david4096 for the response. In a production environment I always use absolute paths to create the repository. However, it would be useful to have example data for testing server deployment configurations before exposing real data. I think the idea of packaging the example data would be a good solution.

wdesouza commented 7 years ago

I am testing the new release of the server (0.3.6), and the error related to the example data has changed. Now it says that the registry.db file from the example data (4.6) is malformed.

ga4gh_repo verify ga4gh-example-data/registry.db
/srv/ga4gh/ga4gh-server-env/bin/ga4gh_repo: error: Database file
'ga4gh-example-data/registry.db' is malformed.  Either change the
configuration to point to a valid file or create one using the repo
manager.

I guess this file is outdated. Is there a newer version of the example data to test with?

david4096 commented 7 years ago

Yes, we'll need to update the example data!

david4096 commented 7 years ago

This issue will be closed once we have worked out how to run the example data properly behind Apache and have documented it. The problem is that the Apache process needs to use the same base path as the code.

I think it might work if you set the home option of WSGIDaemonProcess:

WSGIDaemonProcess ga4gh \
    processes=10 threads=1 \
    python-path=/srv/ga4gh/ga4gh-server-env/lib/python2.7/site-packages \
    python-eggs=/var/cache/apache2/python-egg-cache \
    home=/srv/ga4gh
WSGIScriptAlias /ga4gh /srv/ga4gh/application.wsgi

From the mod_wsgi documentation of the home option (https://code.google.com/archive/p/modwsgi/wikis/ConfigurationDirectives.wiki#WSGIPythonPath):

Defines an absolute path of a directory which should be used as the initial current working directory of the daemon processes within the process group.

If this option is not defined, in mod_wsgi 1.X the current working directory of the Apache parent process will be inherited by the daemon processes within the process group. Normally the current working directory of the Apache parent process would be the root directory. In mod_wsgi 2.0+ the initial current working directory will be set to be the home directory of the user that the daemon process runs as.
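To check whether the home option actually takes effect, one could temporarily serve a tiny diagnostic WSGI script from the same daemon process group. This is only a sketch under assumptions: the script is not part of ga4gh-server, and where it is saved and which alias it is mounted under are up to you. The one fixed convention is that mod_wsgi looks for a callable named application in the script.

```python
import os


def application(environ, start_response):
    """Minimal WSGI app that reports the process's working directory.

    Relative paths such as ga4gh-example-data/chr1.vcf.gz are resolved
    against this directory, so it should match the directory that
    contains the example data.
    """
    body = "cwd={0}\n".format(os.getcwd()).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the response shows the root directory or the daemon user's home directory rather than /srv/ga4gh, the relative paths in the example registry will not resolve, which would match the error reported above.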

An alternative might be to switch to gunicorn (https://github.com/ga4gh/ga4gh-server/pull/1607), which would allow the documentation to be simpler.