Novartis / cellxgene-gateway

Cellxgene Gateway allows you to use the Cellxgene Server provided by the Chan Zuckerberg Initiative (https://github.com/chanzuckerberg/cellxgene) with multiple datasets.
Apache License 2.0

Deploy using Gunicorn - DATA LOCATION MISSING #92

Open aniseedDB opened 7 months ago

aniseedDB commented 7 months ago

Hello, I've been trying to deploy cellxgene-gateway using a Python WSGI HTTP server like Gunicorn, since Flask only provides a dev server. I am also using nginx as a reverse proxy to forward client requests. The problem is not coming from nginx: when I run nginx in front of the Flask dev server alone, everything works correctly.

Here's the issue: whenever I launch cellxgene-gateway with Gunicorn, the filecrawler cannot see the .h5ad files in the data directory. I tried changing directories, changing the permissions and ownership of the directories (I'm on Debian), and setting the variables in different ways: using Gunicorn's --env flag, using raw_env in the config file, exporting them beforehand and reading them with os.environ, and hardcoding the string values in ALL the cellxgene-gateway files where CELLXGENE_DATA is read. None of these worked, and I still cannot view the data files. The app launches and no error messages are shown, not even in the log in debug mode.
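One way to narrow this down is to check what the server process itself sees, rather than what the shell sees. The sketch below is not part of cellxgene-gateway; `find_h5ad_files` and `report` are hypothetical helpers, and it assumes the data directory is configured through the `CELLXGENE_DATA` variable mentioned above. Running it under the same user and environment as the Gunicorn workers should show whether the variable is set and whether any .h5ad files are readable from there.

```python
import os
from pathlib import Path


def find_h5ad_files(data_dir):
    """Return sorted .h5ad paths under data_dir (hypothetical helper)."""
    root = Path(data_dir)
    if not root.is_dir():
        # Missing directory, or the path is not a directory.
        return []
    return sorted(str(p) for p in root.rglob("*.h5ad"))


def report(environ=os.environ):
    """Summarize what this process can see (hypothetical helper)."""
    data_dir = environ.get("CELLXGENE_DATA")
    if not data_dir:
        return "CELLXGENE_DATA is not set in this process"
    files = find_h5ad_files(data_dir)
    return f"CELLXGENE_DATA={data_dir}: {len(files)} .h5ad file(s) visible"


if __name__ == "__main__":
    print(report())
```

If the count is non-zero here but the gateway still lists nothing, the environment is probably fine and the problem is more likely in how the file listing is rendered.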

Does anyone have any ideas or a solution to this problem? I'm happy to provide any further information about my setup if needed.

aniseedDB commented 7 months ago

I traced the error back to filecrawl.py, where the render functions seem to rely on flask_util. It may be a code issue (functions that depend on Flask) rather than a config issue (environment variables). I will rewrite these functions to remove all Flask dependencies. In the interest of time, any help is welcome.

alokito commented 7 months ago

I don't have too much time to look into this, but I asked ChatGPT; perhaps this will work for you? I suspect you will need to tweak the view names.

include_source_in_url = False
try:
    from flask import request, url_for

    def querystring():
        qs = request.query_string.decode()
        return f"?{qs}" if len(qs) > 0 else ""

    def url(endpoint, descriptor, source_name):
        if include_source_in_url:
            return url_for(endpoint, source_name=source_name, path=descriptor)
        else:
            return url_for(endpoint, path=descriptor)

    def view_url(descriptor, source_name):
        return url("do_view", descriptor, source_name)

    def relaunch_url(descriptor, source_name):
        return url("do_relaunch", descriptor, source_name)

except ImportError:
    import os

    def querystring(environ=None):
        # Without Flask, fall back to a CGI/WSGI-style QUERY_STRING variable.
        if environ is None:
            environ = os.environ
        qs = environ.get('QUERY_STRING', '')
        return f"?{qs}" if qs else ""

    def url(endpoint, descriptor, source_name):
        qs = querystring()
        # Begin the query string with "?" when there is none yet, otherwise append with "&".
        sep = "&" if qs else "?"
        if include_source_in_url:
            return f"/{endpoint}{qs}{sep}path={descriptor}&source_name={source_name}"
        else:
            return f"/{endpoint}{qs}{sep}path={descriptor}"

    def view_url(descriptor, source_name):
        return url("do_view", descriptor, source_name)

    def relaunch_url(descriptor, source_name):
        return url("do_relaunch", descriptor, source_name)

If your app is not located at the server root, you may also need to consult some other environment variables. ChatGPT suggested SCRIPT_NAME:

def get_application_path(environ):
    script_name = environ.get('SCRIPT_NAME', '')
    return script_name.rstrip('/')  # Remove trailing slash, if any

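# Example usage:
# 'environ' is the WSGI environment passed to your application;
# a hypothetical stand-in is used here so the snippet runs on its own.
environ = {'SCRIPT_NAME': '/gateway/'}
app_path = get_application_path(environ)
print(app_path)  # /gateway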
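One caveat worth keeping in mind: under a WSGI server, SCRIPT_NAME and QUERY_STRING arrive in the per-request environ dictionary passed to the application, not in os.environ, so a helper that reads os.environ will usually find them empty. Below is a minimal sketch of building a link from the request environ instead. `build_url` and the `/gateway` prefix are hypothetical, and the "do_view" endpoint name is taken from the snippet above; this is an illustration under those assumptions, not the gateway's actual routing.

```python
from urllib.parse import urlencode


def build_url(environ, endpoint, descriptor):
    """Build a link from the per-request WSGI environ (hypothetical helper)."""
    # Mount point of the app, e.g. "/gateway" when it is not at the server root.
    prefix = environ.get("SCRIPT_NAME", "").rstrip("/")
    # Query string of the current request, without the leading "?".
    qs = environ.get("QUERY_STRING", "")
    # Percent-encode the dataset path so it is safe inside a query string.
    extra = urlencode({"path": descriptor})
    query = f"{qs}&{extra}" if qs else extra
    return f"{prefix}/{endpoint}?{query}"


def app(environ, start_response):
    """Tiny WSGI app that returns the link it would generate."""
    body = build_url(environ, "do_view", "pbmc3k.h5ad").encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


if __name__ == "__main__":
    sample = {"SCRIPT_NAME": "/gateway/", "QUERY_STRING": "debug=1"}
    print(build_url(sample, "do_view", "pbmc3k.h5ad"))
```

Passing environ explicitly keeps the helper independent of Flask and makes it easy to exercise with a plain dictionary.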