thatbudakguy closed this 8 years ago
Thanks for the question! You can pass the environment variable SOLR_URL
to the rake task to index against a different Solr URL. This isn't in the documentation yet, but it should be added.
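For readers unfamiliar with the pattern, here is a minimal sketch of how a rake task might resolve its Solr URL from the environment. The constant `DEFAULT_SOLR_URL`, its value, and the helper `resolve_solr_url` are hypothetical names used for illustration; the actual code in geo_combine.rake may differ.

```ruby
# Hypothetical default for illustration only; the real value lives in
# geo_combine.rake and may differ.
DEFAULT_SOLR_URL = 'http://127.0.0.1:8983/solr'

# Return the Solr URL to index into, preferring SOLR_URL when it is set
# and non-blank. `env` defaults to the process environment, but any
# Hash-like object works, which keeps the helper easy to test.
def resolve_solr_url(env = ENV)
  url = env['SOLR_URL']
  return DEFAULT_SOLR_URL if url.nil? || url.strip.empty?

  url.strip
end
```

With a helper like this, `SOLR_URL=http://example-host:8983/solr/some-core rake geocombine:index` would index into the given core, while omitting the variable would fall back to the local default.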
@thatbudakguy See #42 for an example of how this can be done from the command line
Thanks for the example! Unfortunately the index task still isn't completing. This may be an issue specific to running the docker containers, but when I try to point SOLR_URL to the address of my solr docker container, the index task returns:
RSolr::Error::Http - 404 Not Found Error: Not Found
I can see in the geoblacklight container's /etc/hosts file the address of the solr container (172.17.0.5), so I specify:
SOLR_URL=http://172.17.0.5:8983/solr/collection rake geocombine:index
I also get a 404 when trying the path indicated in geo_combine.rake:
SOLR_URL=http://172.17.0.5:8983/solr/blacklight-core rake geocombine:index
Same problem with bundle exec. A portscan confirms 8983 is open on the target (but 80 is not... does it need to be?). Also confirmed that the instance has internet access with a quick apt-get. Any ideas what is wrong?
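A 404 from Solr on an open port often means the core name in the URL does not exist on that server. One way to check, sketched here under that assumption, is to ask Solr's CoreAdmin STATUS endpoint which cores it knows about. The helper names `split_solr_url` and `core_status_url` are hypothetical, and the code assumes the URL has the shape `http://host:port/solr/core`.

```ruby
require 'uri'

# Split a SOLR_URL such as http://host:8983/solr/core into the Solr base URL
# and the core name. Assumes the first path segment is the Solr context
# (usually "solr") and the last one, if present, is the core.
def split_solr_url(solr_url)
  uri = URI.parse(solr_url)
  segments = uri.path.split('/').reject(&:empty?)
  core = segments.length > 1 ? segments.last : nil
  base = "#{uri.scheme}://#{uri.host}:#{uri.port}/#{segments.first}"
  [base, core]
end

# Build the CoreAdmin STATUS URL, which lists the cores the server knows about.
def core_status_url(solr_url)
  base, = split_solr_url(solr_url)
  "#{base}/admin/cores?action=STATUS&wt=json"
end
```

Fetching that URL (for example with `Net::HTTP.get(URI(core_status_url(ENV['SOLR_URL'])))` after `require 'net/http'`) returns JSON whose `status` object is keyed by core name. If the core from SOLR_URL is not among those keys, a 404 on update requests is the expected result.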
Rather than specifying the absolute url, I tried the SOLR_URL passed to the container specified in the "full" docker-compose.yml file:
SOLR_URL=http://solr:8983/solr/geoblacklight rake geocombine:index
This returns a 400 Bad Request, with what looks like some reference to a shapefile:
RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>61},'error'=>{'msg'=>'Couldn\'t parse shape \'ENVELOPE(55.0, 5.0, -107.0, -47.0)\' because: com.spatial4j.core.exception.InvalidShapeException: maxY must be >= minY: -47.0 to -107.0','code'=>400}}
Ah.. So that means that Solr is now trying to parse the metadata. That's a better error!
I've found that some of the metadata is not parseable due to incorrect bounds. Maybe that metadata is coming from https://github.com/OpenGeoMetadata/edu.princeton.arks/blob/a1b3e2df1b41f0c66cd68fe6fda625642bf2b5ed/th/83/m1/23/j/geoblacklight.json ?
Do you have records in your index now though?
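The bounds problem can be caught before a record reaches Solr. Solr reads the value as `ENVELOPE(minX, maxX, maxY, minY)`, so in `ENVELOPE(55.0, 5.0, -107.0, -47.0)` the maxY is -107.0 and the minY is -47.0, which is exactly what the error message complains about. The following is a rough sketch of such a pre-check; `envelope_problems` and `ENVELOPE_PATTERN` are hypothetical names, not part of GeoCombine.

```ruby
ENVELOPE_PATTERN = /\AENVELOPE\(\s*([^,]+),\s*([^,]+),\s*([^,]+),\s*([^)]+)\)\z/

# Return a list of problems found in a solr_geom value.
# An empty list means the envelope passed these basic checks.
def envelope_problems(solr_geom)
  match = ENVELOPE_PATTERN.match(solr_geom.to_s.strip)
  return ['not an ENVELOPE(minX, maxX, maxY, minY) string'] unless match

  # Float() raises ArgumentError on non-numeric input, handled below.
  min_x, max_x, max_y, min_y = match.captures.map { |v| Float(v.strip) }

  problems = []
  problems << "maxY (#{max_y}) is less than minY (#{min_y})" if max_y < min_y
  unless [min_x, max_x].all? { |x| x.between?(-180.0, 180.0) }
    problems << 'longitude outside -180..180'
  end
  unless [min_y, max_y].all? { |y| y.between?(-90.0, 90.0) }
    problems << 'latitude outside -90..90'
  end
  problems
rescue ArgumentError
  ['coordinates are not all numeric']
end
```

For the failing record this reports both the maxY/minY ordering and an out-of-range latitude, which suggests the longitude and latitude values may have been transposed in the source metadata. The check deliberately does not require minX to be less than maxX.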
I do indeed have many new records! It seems you are right, the full trace on that 400 shows it getting hung up on the bounds of a map from Princeton:
RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>61},'error'=>{'msg'=>'Couldn\'t parse shape \'ENVELOPE(55.0, 5.0, -107.0, -47.0)\' because: com.spatial4j.core.exception.InvalidShapeException: maxY must be >= minY: -47.0 to -107.0','code'=>400}}
URI: http://solr:8983/solr/geoblacklight/update?wt=ruby&commitWithin=500&overwrite=true
Request Headers: {"Content-Type"=>"application/json"}
Request Data: "[{\"uuid\":\"http://arks.princeton.edu/ark:/88435/th83m123j\",\"dc_creator_sm\":[\"Condet, J.\",\"Cóvens, Jean.\",\"Mortier, Corneille.\",\"L'Isle, Guillaume de,\",\"Fox, N. B.,\"],\"dc_description_s\":\"A map of the British Empire in America : with the French, Spanish and Hollandish settlements adjacent thereto\",\"dc_format_s\":\"Scanned Map\",\"dc_identifier_s\":\"th83m123j\",\"dc_rights_s\":\"Public\",\"dct_provenance_s\":\"Princeton\",\"dct_references_s\":\"{\\\"http://www.loc.gov/standards/marcxml\\\":\\\"https://geowebservices.princeton.edu/download/items/th83m123j/marc.xml\\\",\\\"http://iiif.io/api/image\\\":\\\"https://geowebservices.princeton.edu/iiif/pulmap%2Fth%2F83%2Fm1%2F23%2F00000001.jp2%2Finfo.json\\\"}\",\"layer_id_s\":\"th83m123j\",\"layer_slug_s\":\"princeton-th83m123j\",\"layer_geom_type_s\":\"Image\",\"layer_modified_dt\":\"2015-01-23T17:48:34Z\",\"dc_title_s\":\"A map of the British Empire in America : with the French, Spanish and Hollandish settlements adjacent thereto\",\"dc_type_s\":\"Image\",\"dct_spatial_sm\":[\"Great Britain\",\"North America\",\"France\",\"Netherlands\",\"Spain\"],\"dct_temporal_sm\":[\"1741\"],\"georss_box_s\":\"-47.0 55.0 -107.0 5.0\",\"georss_polygon_s\":\"-107.0 55.0 -107.0 5.0 -47.0 5.0 -47.0 55.0 -107.0 55.0\",\"solr_geom\":\"ENVELOPE(55.0, 5.0, -107.0, -47.0)\",\"solr_year_i\":\"1741\"}]"
If I can do anything to help, let me know - thanks so much for the quick responses!
Nice. It seems we most likely need to enhance the way that bulk indexing is done. This rake task was really just meant as a proof of concept until we could figure out a better approach. Currently we (Stanford) use our own custom code to do indexing, but would like to move to a community-based solution.
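One possible direction, offered only as a sketch and not as the approach the maintainers use or adopted, is to add documents one at a time and record failures rather than stopping at the first rejected record. `index_each` is a hypothetical helper; `client` is assumed to be any object that responds to `add(doc)` and `commit`, as an RSolr client does. `FakeSolr` is a stand-in so the example runs without a Solr server.

```ruby
# Add documents one at a time so a rejected record does not abort the run.
# Returns a list of { id:, error: } hashes for the records that failed.
def index_each(client, docs)
  failures = []
  docs.each do |doc|
    begin
      client.add(doc)
    rescue StandardError => e
      failures << { id: doc['layer_slug_s'], error: e.message }
    end
  end
  client.commit
  failures
end

# Stand-in client for demonstration: it rejects one known-bad envelope,
# imitating the 400 response seen in this thread.
class FakeSolr
  attr_reader :added

  def initialize
    @added = []
  end

  def add(doc)
    if doc['solr_geom'] == 'ENVELOPE(55.0, 5.0, -107.0, -47.0)'
      raise ArgumentError, 'maxY must be >= minY'
    end

    @added << doc
  end

  def commit
    true
  end
end
```

Sending one request per document is slower than a single bulk update, so a real implementation might batch documents and fall back to per-record indexing only when a batch is rejected.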
I've installed the "full" stack using the docker images provided here running in a single Amazon EC2 instance. I've found the address of the container running solr (listed in the geoblacklight container's hosts file) but don't know how to tell geocombine that I'm not running solr locally at 127.0.0.1:8983, and provide a different location instead. I can see that "geo_combine.rake" specifies 127.0.0.1, but I don't have the file locally after following the install and can't change the IP.
Apologies if the question is overly simplistic; I have little experience with ruby or docker & am just trying to get a feel for how the pieces fit together. Any guidance would be appreciated.