OpenGeoMetadata / GeoCombine

A Ruby toolkit for managing geospatial metadata
https://github.com/OpenGeoMetadata/GeoCombine

Specify arbitrary solr location for "geocombine index" #41

Closed: thatbudakguy closed this issue 8 years ago

thatbudakguy commented 8 years ago

I've installed the "full" stack using the Docker images provided here, running on a single Amazon EC2 instance. I've found the address of the container running Solr (listed in the geoblacklight container's hosts file), but I don't know how to tell GeoCombine that Solr is not running locally at 127.0.0.1:8983 and to use a different location instead. I can see that geo_combine.rake specifies 127.0.0.1, but I don't have that file locally after following the install, so I can't change the IP.

Apologies if the question is overly simplistic; I have little experience with Ruby or Docker and am just trying to get a feel for how the pieces fit together. Any guidance would be appreciated.

mejackreed commented 8 years ago

Thanks for the question! You can set the environment variable SOLR_URL when running the rake task to index into a different Solr URL. It's not in the documentation yet, but it should be added.
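A minimal sketch of the pattern described above, assuming the rake task reads SOLR_URL from the environment and falls back to a local default when it is unset. The default shown is the 127.0.0.1 / blacklight-core location that geo_combine.rake is said to use elsewhere in this thread; the helper name solr_url is hypothetical and this is not GeoCombine's actual code.

```ruby
# Hypothetical default, matching the local path mentioned for geo_combine.rake.
DEFAULT_SOLR_URL = 'http://127.0.0.1:8983/solr/blacklight-core'

# Hypothetical helper: return SOLR_URL from the given environment,
# or the local default when the variable is unset or blank.
def solr_url(env = ENV)
  value = env['SOLR_URL'].to_s.strip
  value.empty? ? DEFAULT_SOLR_URL : value
end
```

With this pattern the URL is chosen at invocation time, for example SOLR_URL=http://solr:8983/solr/geoblacklight rake geocombine:index, so the rake file itself never needs editing inside the container.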

mejackreed commented 8 years ago

@thatbudakguy See #42 for an example of how to do this from the command line.

thatbudakguy commented 8 years ago

Thanks for the example! Unfortunately, the index task still isn't completing. This may be an issue specific to running the Docker containers, but when I point SOLR_URL to the address of my Solr Docker container, the index task returns: RSolr::Error::Http - 404 Not Found Error: Not Found

I can see the address of the Solr container (172.17.0.5) in the geoblacklight container's /etc/hosts file, so I specify: SOLR_URL=http://172.17.0.5:8983/solr/collection rake geocombine:index

I also get a 404 when trying the path indicated in geo_combine.rake: SOLR_URL=http://172.17.0.5:8983/solr/blacklight-core rake geocombine:index

The same problem occurs with bundle exec. A port scan confirms that 8983 is open on the target (but 80 is not; does it need to be?). I also confirmed that the instance has internet access with a quick apt-get. Any ideas what is wrong?
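A 404 from a URL such as .../solr/collection usually means the last path segment does not name an existing Solr core, even though the host and port are reachable. As a sketch, assuming Solr's standard CoreAdmin API is served at <base>/admin/cores and is reachable from the container running the rake task, one way to check which core a SOLR_URL points at is below. The helper names are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Split a SOLR_URL like http://solr:8983/solr/geoblacklight into the
# Solr base URL and the core name (the last path segment).
def split_solr_url(solr_url)
  uri = URI.parse(solr_url.chomp('/'))
  segments = uri.path.split('/')
  core = segments.pop
  uri.path = segments.join('/')
  [uri.to_s, core]
end

# Build a CoreAdmin STATUS request URI for the core named in SOLR_URL.
def core_status_uri(solr_url)
  base, core = split_solr_url(solr_url)
  URI("#{base}/admin/cores?action=STATUS&core=#{URI.encode_www_form_component(core)}&wt=json")
end

# Returns true if Solr reports a non-empty status entry for the core.
# Needs network access to the Solr host; an unknown core has an empty entry.
def core_exists?(solr_url)
  response = Net::HTTP.get_response(core_status_uri(solr_url))
  return false unless response.is_a?(Net::HTTPSuccess)

  status = JSON.parse(response.body).fetch('status', {})
  !status.fetch(split_solr_url(solr_url).last, {}).empty?
end
```

Dropping the core parameter from the STATUS request lists every core, which is a quick way to discover the right name (geoblacklight in the "full" stack, as the next comment shows).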

thatbudakguy commented 8 years ago

Rather than specifying the IP address, I tried the SOLR_URL that the "full" docker-compose.yml file passes to the container: SOLR_URL=http://solr:8983/solr/geoblacklight rake geocombine:index

This returns a 400 Bad Request, with what looks like a reference to a shape:

RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>61},'error'=>{'msg'=>'Couldn\'t parse shape \'ENVELOPE(55.0, 5.0, -107.0, -47.0)\' because: com.spatial4j.core.exception.InvalidShapeException: maxY must be >= minY: -47.0 to -107.0','code'=>400}}

mejackreed commented 8 years ago

Ah, so that means Solr is now trying to parse the metadata. That's a better error!

I've found that some of the metadata is not parseable due to incorrect bounds. Maybe that metadata is coming from https://github.com/OpenGeoMetadata/edu.princeton.arks/blob/a1b3e2df1b41f0c66cd68fe6fda625642bf2b5ed/th/83/m1/23/j/geoblacklight.json ?

Do you have records in your index now though?
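Solr's ENVELOPE syntax orders its arguments as ENVELOPE(minX, maxX, maxY, minY), that is west, east, north, south, which is why ENVELOPE(55.0, 5.0, -107.0, -47.0) fails with "maxY must be >= minY". A minimal pre-indexing check along those lines might look like this; the function name and the exact set of checks are assumptions for illustration, not something GeoCombine provides.

```ruby
# Matches ENVELOPE(minX, maxX, maxY, minY) with plain decimal numbers.
ENVELOPE_RE = /\AENVELOPE\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\)\z/

# Hypothetical validator: returns a list of problems (empty if the envelope looks sane).
def envelope_problems(geom)
  match = ENVELOPE_RE.match(geom.to_s)
  return ['not an ENVELOPE(...) string'] unless match

  west, east, north, south = match.captures.map(&:to_f)
  problems = []
  problems << "maxY (#{north}) < minY (#{south})" if north < south
  problems << 'latitude outside [-90, 90]' unless [north, south].all? { |y| y.between?(-90, 90) }
  problems << 'longitude outside [-180, 180]' unless [west, east].all? { |x| x.between?(-180, 180) }
  # west > east is not flagged: geodetic envelopes may cross the antimeridian.
  problems
end
```

Running a check like this over each record's solr_geom before sending it would let bad bounds be reported and skipped instead of surfacing as a 400 from Solr.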

thatbudakguy commented 8 years ago

I do indeed have many new records! It seems you are right: the full trace on that 400 shows it getting hung up on the bounds of a map from Princeton:

RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>61},'error'=>{'msg'=>'Couldn\'t parse shape \'ENVELOPE(55.0, 5.0, -107.0, -47.0)\' because: com.spatial4j.core.exception.InvalidShapeException: maxY must be >= minY: -47.0 to -107.0','code'=>400}}

URI: http://solr:8983/solr/geoblacklight/update?wt=ruby&commitWithin=500&overwrite=true
Request Headers: {"Content-Type"=>"application/json"}
Request Data: "[{\"uuid\":\"http://arks.princeton.edu/ark:/88435/th83m123j\",\"dc_creator_sm\":[\"Condet, J.\",\"Cóvens, Jean.\",\"Mortier, Corneille.\",\"L'Isle, Guillaume de,\",\"Fox, N. B.,\"],\"dc_description_s\":\"A map of the British Empire in America : with the French, Spanish and Hollandish settlements adjacent thereto\",\"dc_format_s\":\"Scanned Map\",\"dc_identifier_s\":\"th83m123j\",\"dc_rights_s\":\"Public\",\"dct_provenance_s\":\"Princeton\",\"dct_references_s\":\"{\\\"http://www.loc.gov/standards/marcxml\\\":\\\"https://geowebservices.princeton.edu/download/items/th83m123j/marc.xml\\\",\\\"http://iiif.io/api/image\\\":\\\"https://geowebservices.princeton.edu/iiif/pulmap%2Fth%2F83%2Fm1%2F23%2F00000001.jp2%2Finfo.json\\\"}\",\"layer_id_s\":\"th83m123j\",\"layer_slug_s\":\"princeton-th83m123j\",\"layer_geom_type_s\":\"Image\",\"layer_modified_dt\":\"2015-01-23T17:48:34Z\",\"dc_title_s\":\"A map of the British Empire in America : with the French, Spanish and Hollandish settlements adjacent thereto\",\"dc_type_s\":\"Image\",\"dct_spatial_sm\":[\"Great Britain\",\"North America\",\"France\",\"Netherlands\",\"Spain\"],\"dct_temporal_sm\":[\"1741\"],\"georss_box_s\":\"-47.0 55.0 -107.0 5.0\",\"georss_polygon_s\":\"-107.0 55.0 -107.0 5.0 -47.0 5.0 -47.0 55.0 -107.0 55.0\",\"solr_geom\":\"ENVELOPE(55.0, 5.0, -107.0, -47.0)\",\"solr_year_i\":\"1741\"}]"

If I can do anything to help, let me know - thanks so much for the quick responses!
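The solr_geom in this request is exactly what you get by reading the record's georss_box_s in GeoRSS box order (south west north east) and writing it out in Solr's ENVELOPE order (west, east, north, south). The small sketch below (function name hypothetical) reproduces it, which suggests the invalid envelope is inherited from the source bounds, where latitude and longitude appear to be swapped, rather than introduced during indexing.

```ruby
# Hypothetical converter: GeoRSS box "south west north east" -> Solr ENVELOPE(west, east, north, south).
def georss_box_to_envelope(box)
  values = box.split.map { |v| Float(v) }
  raise ArgumentError, "expected 4 numbers, got #{values.size}" unless values.size == 4

  south, west, north, east = values
  "ENVELOPE(#{west}, #{east}, #{north}, #{south})"
end

puts georss_box_to_envelope('-47.0 55.0 -107.0 5.0')
# prints ENVELOPE(55.0, 5.0, -107.0, -47.0)
```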

mejackreed commented 8 years ago

Nice. It seems we most likely need to improve the way bulk indexing is done. This rake task was really just meant as a proof of concept until we could figure out a better approach. Currently we (Stanford) use our own custom code for indexing, but we would like to move to a community-based solution.
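One possible direction for more robust bulk indexing, sketched under stated assumptions: send documents in batches, and when Solr rejects a batch, retry its documents one at a time so a single record with bad bounds is reported rather than stopping the run. This is not GeoCombine's or Stanford's code; the class name and the injectable post callable are hypothetical, chosen so the logic does not depend on a particular Solr client.

```ruby
# Hypothetical batch indexer that isolates rejected documents.
class BatchIndexer
  Failure = Struct.new(:doc, :error)

  # post: callable that sends an array of documents to Solr.
  # error_class: the exception the client raises on a rejected request.
  def initialize(post:, batch_size: 100, error_class: StandardError)
    @post = post
    @batch_size = batch_size
    @error_class = error_class
  end

  # Returns [number_indexed, failures].
  def index(docs)
    indexed = 0
    failures = []
    docs.each_slice(@batch_size) do |slice|
      begin
        @post.call(slice)
        indexed += slice.size
      rescue @error_class
        # Batch rejected: retry each document alone to find the bad ones.
        slice.each do |doc|
          begin
            @post.call([doc])
            indexed += 1
          rescue @error_class => e
            failures << Failure.new(doc, e)
          end
        end
      end
    end
    [indexed, failures]
  end
end
```

With RSolr, post could wrap the client's add method (for example a connection from RSolr.connect with url: set to SOLR_URL) and error_class could be RSolr::Error::Http; committing once at the end, instead of per batch, is a separate design choice.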