Esri / gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
http://esri.github.io/gis-tools-for-hadoop/
Apache License 2.0

trip-discovery sample not working - geometry library not loaded #18

Open trumboosahil opened 9 years ago

trumboosahil commented 9 years ago

Yes, I ran the sample using run-it.sh. I edited the file TripCellDriver.java as below:

if (args.length != 5) {
    System.out.println("Invalid Arguments");
    print_usage();
    // throw new IllegalArgumentException();
}
System.out.println("Start Arguments");
int size = args.length;
for (int i=0; i<size; i++) {
    System.out.println(String.valueOf(i) + " * " + args[i]);
}

The output on the terminal is:

Start Arguments
0 * TripCellDriver
1 * -libjars
2 * ../../lib/esri-geometry-api.jar,../../lib/spatial-sdk-hadoop.jar
3 * 15
4 * 1000
5 * /user/cloudera/trip/data/sample-study-area.json
6 * /user/cloudera/trip/data/sample-vehicle-positions.csv
7 * /user/cloudera/trip/inter
End Arguments

It reported an error because the arguments were in the wrong positions, so I adjusted the arguments accordingly and got the exception below:

Error: java.lang.ClassNotFoundException: com.esri.core.geometry.SpatialReference
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at TripCellReducer.setup(TripCellReducer.java:138)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

randallwhitman commented 9 years ago

Hi @trumboosahil - what version of Hadoop are you running? Maybe something has changed in the argument-handling API, such that the generic options are not extracted before the call to TripCellDriver#run.

trumboosahil commented 9 years ago

I'm using this Hadoop version:

Hadoop 2.3.0-cdh5.1.0
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-07-12T13:49Z
Compiled with protoc 2.5.0

randallwhitman commented 9 years ago

Interesting, thanks for the info. For reference, this is the argument dump I see on our Hadoop-2.2 cluster:

Arguments to TripCellDriver:
0 * 15
1 * 1000
2 * /user/me/trip/data/sample-study-area.json
3 * /user/me/trip/data/sample-vehicle-positions.csv
4 * /user/me/trip/inter

smambrose commented 9 years ago

For reference, here is the argument dump from Hadoop 2.4 on Sandbox 2.1:

Arguments to TripCellDriver:
0 * 15
1 * 1000
2 * /user/me/trip/data/sample-study-area.json
3 * /user/me/trip/data/sample-vehicle-positions.csv
4 * /user/me/trip/inter

randallwhitman commented 9 years ago

Do you happen to have easy access to either version 2.2 or 2.4 of Hadoop, for comparison? While it is possible for 2.3 to differ from both 2.2 and 2.4, it would seem unusual. (Here we currently have 2.2 and 2.4 but not 2.3.)

randallwhitman commented 9 years ago

Idea for workaround:

int argc = args.length;
if (argc > 5 && "TripCellDriver".equals(args[0])) {
    int offset = argc - 5;  // number of leading extra entries to skip
    for (int ix = 0; ix < 5; ++ix) {
        args[ix] = args[ix + offset];  // move the last five arguments to the front
    }
    argc = 5;  // after adjustment
}
if (argc != 5) {
  ...
}
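
The same idea can also be sketched as a small standalone helper that copies the trailing arguments instead of shifting them in place. This is only an illustration, not code from the repository: the class name ArgTail and the method lastN are hypothetical, and unlike the snippet above it does not check that args[0] is "TripCellDriver".

```java
import java.util.Arrays;

// Hypothetical helper (not part of gis-tools-for-hadoop): keeps only the
// trailing positional arguments when extra entries were left in front.
final class ArgTail {
    private ArgTail() {}

    /**
     * Returns the last `expected` entries of args as a new array,
     * or args itself when it has `expected` entries or fewer.
     */
    static String[] lastN(String[] args, int expected) {
        if (args.length <= expected) {
            return args;  // nothing to strip; the caller still validates the count
        }
        return Arrays.copyOfRange(args, args.length - expected, args.length);
    }
}
```

With the eight-entry dump reported above, ArgTail.lastN(args, 5) would yield the five entries starting at "15". The caller should still verify that exactly five arguments remain before using them.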

randallwhitman commented 9 years ago

Regarding the Geometry library (needed for SpatialReference and for much more) not being found: the errors may indicate that run-it.sh was invoked from a directory other than the one expected (./run-it.sh vs. cmd/run-it.sh). For that reason the script was updated to be more robust. Let us know if the issue recurs with the latest version from git master.
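
To confirm whether the geometry jar is actually visible to a JVM, a quick check along these lines may help. It is a minimal sketch using only the standard library; ClasspathCheck and isLoadable are hypothetical names, and where to call it (for example, early in the driver or in a task's setup) is an assumption rather than something the sample does.

```java
// Hypothetical diagnostic (not part of gis-tools-for-hadoop): reports whether
// a class can be found by the class loader that loaded this helper.
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;  // class is missing or could not be linked
        }
    }
}
```

For the error in this thread, one would pass "com.esri.core.geometry.SpatialReference". A false result on the task side suggests that the jars listed after -libjars did not reach the task classpath.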