foursquare / fsqio

A monorepo that holds all of Foursquare's open-source projects
Apache License 2.0

twofishes: export polygons #42

Open mfinelli opened 7 years ago

mfinelli commented 7 years ago

Hi, I followed the polygon instructions at https://github.com/foursquare/fsqio/blob/master/src/jvm/io/fsq/twofishes/docs/twofishes_inputs.md#polygons and put my shapefiles into data/private/polygons/

When I run the build with ./src/jvm/io/fsq/twofishes/scripts/parse.py --world --output_s2_covering_index --output_s2_interior_index --output_prefix_index --output_revgeo_index --yes-i-am-sure, it uses significantly more disk space than it does without the shapefiles in place. However, the resulting indexes are no bigger.

Is there some other step I need to run to export the polygons as well?
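To quantify the "more disk space, but indexes no bigger" observation, the two build outputs can be compared entry by entry. The following is only a minimal sketch: the function names index_sizes and size_changes are hypothetical, and the directories passed in are whatever two output folders you want to compare (nothing here is part of twofishes or parse.py).

```python
import os


def index_sizes(root):
    """Map each immediate child of root to its total size in bytes."""
    sizes = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            # Sum every file below this subdirectory.
            total = 0
            for dirpath, _dirnames, filenames in os.walk(path):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[entry] = total
        else:
            sizes[entry] = os.path.getsize(path)
    return sizes


def size_changes(before, after):
    """Return {name: after_bytes - before_bytes} for names in either build."""
    names = set(before) | set(after)
    return {name: after.get(name, 0) - before.get(name, 0) for name in sorted(names)}
```

Calling size_changes(index_sizes(build_without_polygons), index_sizes(build_with_polygons)) on two output directories of your choosing shows which entries grew. If every difference is zero, the polygons most likely never reached the indexes.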

boriskozak commented 7 years ago

Hey Twofishes team -

Is there any update on this? Prior to the move to the Pants build system, I was able to get the shapefiles built properly. Now, however, the geometry directory is full of files that are close to 0 bytes in size.
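A quick way to confirm the "files close to 0 bytes" symptom is to list every file under the geometry directory that falls below a small size threshold. This is a sketch under assumptions: find_near_empty_files is a hypothetical helper, and the 64-byte default is an arbitrary cutoff, not a value taken from twofishes.

```python
import os


def find_near_empty_files(root, max_bytes=64):
    """Return sorted (path, size) pairs for files of at most max_bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size <= max_bytes:
                hits.append((path, size))
    return sorted(hits)
```

Pass it the path of the geometry directory from your own build. A long list of hits suggests that the polygon step wrote placeholders rather than real data.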

jglesner commented 6 years ago

Not sure if this is helpful, but earlier this week I was able to build the reverse index for -c US from polygons I created by following the instructions. Building the shapefile doesn't involve the Pants build system, but building the s2/revgeo index from the shapefile obviously does.

I would suggest validating the shapefile (.shp) by opening it in a GIS tool (e.g. QGIS, https://www.qgis.org). I placed the shapefile components (shp, shx, cpg, dbf) into the fsqio/src/jvm/io/fsq/twofishes/indexer/data/private/polygons/ directory and passed the --output_revgeo_index argument to parse.py, as you mentioned.

At the end of the build process, you should see several "RevGeoIndex" metrics in JSON format. If you don't see those metrics at the end of the parse.py job, the build didn't complete successfully. Assuming you do see them, you should be able to serve the index and test it using http://localhost:8081/?ll=40.74,-74.0.
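Before opening the file in a GIS tool, a few structural checks can be scripted. The sketch below relies only on the published shapefile layout: a 100-byte main-file header with the big-endian file code 9994 at offset 0 and the little-endian shape type at offset 32. The helper name check_shapefile is hypothetical, and requiring a polygon shape type is an assumption based on the directory being meant for polygons. It does not replace a visual check in QGIS.

```python
import os
import struct

# .shp, .shx and .dbf are the mandatory parts of a shapefile; .cpg is optional.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
# Shape type codes for the polygon variants in the shapefile specification.
POLYGON_SHAPE_TYPES = {5: "Polygon", 15: "PolygonZ", 25: "PolygonM"}


def check_shapefile(shp_path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    base, _ext = os.path.splitext(shp_path)
    for ext in REQUIRED_EXTENSIONS:
        part = base + ext
        if not os.path.isfile(part):
            problems.append("missing " + ext)
        elif os.path.getsize(part) == 0:
            problems.append("empty " + ext)
    if problems:
        return problems

    with open(shp_path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        return ["header shorter than 100 bytes"]

    file_code = struct.unpack(">i", header[0:4])[0]     # big-endian
    shape_type = struct.unpack("<i", header[32:36])[0]  # little-endian
    if file_code != 9994:
        problems.append("unexpected file code %d" % file_code)
    if shape_type not in POLYGON_SHAPE_TYPES:
        problems.append("shape type %d is not a polygon type" % shape_type)
    return problems
```

Run it on each .shp file you placed in the polygons directory. An empty result only means the file set is complete and the header looks like a polygon shapefile; the geometries themselves can still be invalid.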