Miserlou / lambda-packages

Various popular python libraries, pre-compiled to be compatible with AWS Lambda
https://blog.zappa.io

geolibs #12

Open tlpriest opened 8 years ago

tlpriest commented 8 years ago

Anyone working on postgis geolibs?

Miserlou commented 8 years ago

Not that I know of, but that'd certainly be a valuable addition for server-less GIS work!

monkut commented 6 years ago

I'm interested in using PostGIS with Postgres and Django.
I may try to get this going... So roughly, I need to spin up Amazon Linux and install PostGIS, then grab the resulting binaries and tar/gzip them up?

https://postgis.net/install/

erindan commented 6 years ago

I've been chatting with @monkut on the Slack channel and have had success setting up and running postgis extensions with Zappa. I'm happy to contribute a lambda-package in the next week or so, but may need a bit of guidance.

erindan commented 6 years ago

So, I've started looking into this and could use a bit of advice. It's my first attempt at providing a package and, despite years in software development, I am also quite new to making OSS contributions!

PostGIS extensions with Django have 3 separate dependencies: GEOS, GDAL and PROJ.4. I have all of them working in my own Zappa deployments by simply placing the *.so files in the root of my project (/var/task/ upon deployment) and setting GDAL_LIBRARY_PATH = '/var/task/libgdal.so' and GEOS_LIBRARY_PATH = '/var/task/libgeos_c.so.1' in the Django settings.py file. It is also necessary to include a GDAL data folder (it contains translations between coordinate systems) in the project root and to set GDAL_DATA = '/var/task/data/' in the AWS Environment Variables settings. With all of the above in place, Django knows how to use the dependencies, provided the project specifies 'ENGINE': 'django.contrib.gis.db.backends.postgis' and includes the GIS apps 'django.contrib.gis' and 'rest_framework_gis'.
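Put together, the settings described above might look roughly like the following settings.py fragment. This is only a sketch: the database name is a hypothetical placeholder, and a real project would have more apps and database options. GDAL_DATA is not set here because, as described above, it is an environment variable rather than a Django setting.

```python
# settings.py fragment (sketch, not a complete Django settings file)

# Shared libraries copied into the project root, which becomes /var/task on Lambda.
GDAL_LIBRARY_PATH = "/var/task/libgdal.so"
GEOS_LIBRARY_PATH = "/var/task/libgeos_c.so.1"

INSTALLED_APPS = [
    # ... the project's other apps ...
    "django.contrib.gis",
    "rest_framework_gis",
]

DATABASES = {
    "default": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "example_db",  # hypothetical placeholder
    }
}
```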

With all that in mind I have a few questions:

Thank you!

taoru commented 6 years ago

We implemented it a bit differently: instead of creating a "fat" handler, we kept it "slim" (in Zappa terms) and added s3:// URL parsing capability to the include handler. During cold start, it downloads the libs and rewrites the paths.

erindan commented 6 years ago

@taoru I'd be interested in learning a bit more about how you did that. The GDAL and GEOS libs are pretty big - 70MB uncompressed. This could be a pain with my approach on a connection with slow upload speeds. I'm also a bit worried about adding stuff to lambda-packages that will bloat everyone's deployments per the last question in my previous post.

taoru commented 6 years ago

@erindan I'll try to find time this week for a better example or perhaps even a pull request, but the rough idea is to change LambdaHandler so that, inside the "for library in included_libraries" loop, it checks whether the library starts with s3://, downloads it first if so, and then uses cdll.LoadLibrary to load it as if it had always been there. Lambda to S3 speeds are very fast (even faster since a few days ago), so it takes just a few seconds to load everything. Still, the developer needs to link the appropriate wheel or .so libs and upload them to S3 first.
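A minimal sketch of that idea follows. It is not Zappa's actual handler code: the function names, the /tmp destination, and the injectable download/load parameters are all assumptions made for illustration, and boto3 is assumed to be available in the Lambda runtime.

```python
import ctypes
import os
from urllib.parse import urlparse


def download_from_s3(url, dest_dir):
    """Download s3://bucket/key into dest_dir and return the local path."""
    import boto3  # imported lazily; assumed to be installed in the runtime

    parsed = urlparse(url)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    local_path = os.path.join(dest_dir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


def load_included_libraries(included_libraries, dest_dir="/tmp",
                            download=download_from_s3,
                            load=ctypes.cdll.LoadLibrary):
    """Load each library, fetching s3:// entries to local disk first."""
    handles = []
    for library in included_libraries:
        if library.startswith("s3://"):
            # Remote entry: pull it down, then treat it like a local file.
            library = download(library, dest_dir)
        handles.append(load(library))
    return handles
```

The download and load callables are parameters only so the loop can be exercised without S3 access or real shared libraries.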

My use case was similar to yours: our core team was in China, and every zappa update took 7-12 minutes at least, sometimes just hanging altogether halfway. Having postgis libs prebuilt in this repo would also be great, though you have more fine-grained control over features when building yourself (my lib archive ended up being just around 70MB).

Though I personally think it would be better to make the whole dependency optional directly in Django, and leave the heavy-lifting to Postgres. But it's considerably more work than building the libs. :)

hammadzz commented 6 years ago

Is the simplest current solution what @erindan is doing currently?

Is it good enough for small scale production use? If it isn't good enough maybe I will drop Zappa for now and come back to it in a couple of months.

hammadzz commented 6 years ago

@erindan any chance you can share more on how you generated the compatible files, and the names of the ones you added to the root directory of the Zappa project? I assume you used an EC2 instance or Docker image? Did you install the libraries there and copy them into your project?

erindan commented 6 years ago

@hammadzz Yes, I used a Docker setup based on the instructions here: https://edgarroman.github.io/zappa-django-guide/setup/ and built them per the instructions in the Django docs: https://docs.djangoproject.com/en/2.0/ref/contrib/gis/install/geolibs/

erindan commented 6 years ago

[image: lib_struct]

erindan commented 6 years ago

The above is roughly what my resulting project structure looks like after building the .so files and copying them into my project root (becomes /var/task upon deployment).

hammadzz commented 6 years ago

@erindan any chance you have a script for it? I can't manage to build GEOS on the Amazon Linux AMI. I will eventually figure it out. Also, I have no idea where the libraries are being installed or where I would find the *.so files. I have been so used to using package managers to install these things.

EDIT: I will be back with build scripts

hammadzz commented 6 years ago

This works to generate the files. Going to try them in Zappa now. I just install all three libraries on an Amazon Linux EC2 instance and tar gzip the /usr/local/lib directory that has all the *.so files.

https://gist.github.com/hammadzz/b763e7ec5dd9c83c1855e10e654648e4
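The archiving step can be sketched in Python as follows. This is only an illustration and not the contents of the gist: the function name archive_shared_libs and the archive filename in the usage note are hypothetical, and it assumes the build installed the libraries into a single directory such as /usr/local/lib.

```python
import os
import tarfile


def archive_shared_libs(lib_dir, archive_path):
    """Pack the shared libraries (*.so and *.so.N) in lib_dir into a .tar.gz.

    Returns the sorted list of file names that were added.
    """
    names = sorted(
        name for name in os.listdir(lib_dir)
        if name.endswith(".so") or ".so." in name
    )
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            # arcname keeps the archive flat instead of embedding lib_dir.
            tar.add(os.path.join(lib_dir, name), arcname=name)
    return names
```

On the build machine this could be called as archive_shared_libs("/usr/local/lib", "geolibs.tar.gz"). Symlinks are stored as symlinks, which is tarfile's default behavior.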

I forgot to grab the GDAL data folder; not exactly sure where it is.

mwalker commented 6 years ago

The GDAL data directory is in /usr/local/share.

Using your gist I have also been able to build all the libraries, and I have also built Spatialite. It took a lot of fiddling to get the libraries loaded with the correct names so they could find each other, and so that the python code within GeoDjango could also find them.

I'm unsure how to proceed with packaging these for lambda-packages, however, as there are a bunch of dependencies and none of this code is Python code; it more closely resembles how sqlite3 is included for python3.

If I name the library files:

libfreexl.so.1  libgdal.so  libgeos-3.4.2.so  libgeos_c.so.1  libproj.so.9  libspatialite.so

And use the following in the Django settings.py

import os

# Only override the library paths when running on AWS Lambda.
if os.environ.get('SERVERTYPE') == 'AWS Lambda':
    GDAL_LIBRARY_PATH = '/var/task/libgdal.so'
    GEOS_LIBRARY_PATH = '/var/task/libgeos_c.so.1'
    SPATIALITE_LIBRARY_PATH = '/var/task/libspatialite'

Then I am able to get it to work (with an altered zappa-django-utils for Spatialite, see https://github.com/mwalker/zappa-django-utils/tree/spatialite) on Lambda, but I haven't done a lot of testing yet, and I haven't sorted out the GDAL data directory either.

As these are libraries for python code that may already be included in the project (django.contrib.gis) what do you think is the best way to package them up and have Zappa include them in the build?

hammadzz commented 6 years ago

@mwalker for your GDAL data directory, add an environment variable pointing to it. You can do it in your zappa_settings:

  aws_environment_variables:
    GDAL_DATA: "/var/task/gdal_data/"

mwalker commented 6 years ago

Thanks @hammadzz I figured it would be something like that. I will add it in and do some more testing.

giovannicimolin commented 6 years ago

Any news on this?

I'm trying to build my own; however, GDAL doesn't seem to find libgeos_c.so.1:

OSError: libgeos_c.so.1: cannot open shared object file: No such file or directory

Despite having set GDAL and GEOS paths on the settings:

if 'SERVERTYPE' in os.environ and os.environ['SERVERTYPE'] == 'AWS Lambda':
    GEOS_LIBRARY_PATH = '/tmp/spotwayweb/lib/libgeos_c.so.1'
    GDAL_LIBRARY_PATH = '/tmp/spotwayweb/lib/libgdal.so'

Did any of you have this problem?
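One thing worth checking for an error like this, offered as a possible cause rather than a confirmed diagnosis: GDAL_LIBRARY_PATH only tells Django which file to open, while libgdal.so's own dependency on libgeos_c.so.1 is resolved by the dynamic linker using its usual search path. The sketch below uses a hypothetical helper name and inspects only LD_LIBRARY_PATH (not rpath entries or the linker cache) to list which of those directories actually contain a given library file.

```python
import os


def dirs_providing(soname, search_path=None):
    """Return the directories on the search path that contain `soname`.

    Defaults to LD_LIBRARY_PATH, which is only one of the places the
    dynamic linker looks, so an empty result is a hint, not proof.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, soname)):
            hits.append(directory)
    return hits
```

Calling dirs_providing("libgeos_c.so.1") inside the Lambda function and getting an empty list would suggest that the directory holding the library is not being searched.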

giovannicimolin commented 6 years ago

Also, these are my /lib folder contents:

libgdal.so libgeos-3.5.1.so libgeos_c.so.1 libproj.so libproj.so.9 libproj.so.9.0.0

And this is my zappa_settings.json:

{
    "dev": {
        "aws_region": "us-east-1",
        "django_settings": "App.settings.production",
        "profile_name": "default",
        "project_name": "spotwayweb",
        "runtime": "python2.7",
        "s3_bucket": "zappa-er0hnd366",
        "slim_handler": true
    }
}

aster1sk commented 6 years ago

Here's a gist describing how to build the dependencies in docker and copy to your path including the settings.py.

It makes a mess of your repo but it does the trick.

https://gist.github.com/aster1sk/7614356ec706c7244d155dc034a401e1

jtszalay commented 5 years ago

> Here's a gist describing how to build the dependencies in docker and copy to your path including the settings.py.
>
> It makes a mess of your repo but it does the trick.
>
> https://gist.github.com/aster1sk/7614356ec706c7244d155dc034a401e1

I found that I needed the following settings with the slim handler:

# Nasty hack to load the C extensions used for GeoDjango / PostGIS
from ctypes import CDLL
CDLL(f"{BASE_DIR}/geo/lib/libgeos-3.6.2.so")
CDLL(f"{BASE_DIR}/geo/lib/libproj.so.12")
CDLL(f"{BASE_DIR}/geo/lib/libgeos_c.so.1")
CDLL(f"{BASE_DIR}/geo/lib/libjpeg.so.62")
CDLL(f"{BASE_DIR}/geo/lib/libopenjp2.so.2.3.0")
CDLL(f"{BASE_DIR}/geo/lib/libpng12.so.0")
# end hack
GDAL_LIBRARY_PATH = f"{BASE_DIR}/geo/lib/libgdal.so.2.2.2"
GEOS_LIBRARY_PATH = f"{BASE_DIR}/geo/lib/libgeos_c.so.1.10.2"
GDAL_DATA = f"{BASE_DIR}/geo/share/gdal"
GDAL_CONFIG = f"{BASE_DIR}/geo/bin/gdal-config"

ghost commented 5 years ago

I'm getting errors running docker-compose up with the above gist on Linux. It fails with:

create .: volume name is too short, names should be at least two alphanumeric characters

khamaileon commented 5 years ago

You can find those libs in some Python geo packages. Example with rasterio:

  1. find library paths:
    zappa invoke production "import subprocess; print(subprocess.getoutput(['find / -name *libgdal*']))" --raw
    zappa invoke production "import subprocess; print(subprocess.getoutput(['find / -name *libgeos*']))" --raw
  2. add those env variables to your Zappa project:
    "GDAL_LIBRARY_PATH": "/var/task/rasterio/.libs/libgdal-c9384152.so.20.5.0",
    "GEOS_LIBRARY_PATH": "/var/task/rasterio/.libs/libgeos_c-595de9d4.so.1.10.2",
  3. put this in your Django config:
    GDAL_LIBRARY_PATH = os.environ['GDAL_LIBRARY_PATH']
    GEOS_LIBRARY_PATH = os.environ['GEOS_LIBRARY_PATH']
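The search in step 1 can also be done without shelling out to find. The following is a minimal pure-Python sketch; the helper name find_libs is hypothetical, and the /var/task root mentioned in the usage note is taken from the paths shown above.

```python
import fnmatch
import os


def find_libs(root, pattern):
    """Return sorted paths under `root` whose file name matches `pattern`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter applies the shell-style pattern to each file name.
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

For example, find_libs("/var/task", "*libgdal*") would list candidate values for GDAL_LIBRARY_PATH, including files inside hidden directories such as .libs.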