Esri / spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Apache License 2.0

mvn package - failing test on Spatial JSON Utilities #75

Closed GISDev01 closed 9 years ago

GISDev01 commented 9 years ago

Can someone help me figure out what I am doing wrong when trying to build a local clone of this repo on a Windows 7 box using Maven?

I cloned the repo to my local Windows filesystem, and I have Maven set up and working fine on other projects. I ran the command "mvn package" in the directory containing pom.xml, and here is the output I get in the command prompt.

Any help appreciated.

Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.092 sec <<< FAILURE!

Results :

Failed tests: TestCharacters(com.esri.json.hadoop.TestUnenclosedJsonRecordReader): array lengths differed, expected.length=2 actual.length=1

Tests run: 12, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spatial Framework for Hadoop ....................... SUCCESS [ 1.096 s]
[INFO] Hive Spatial Framework ............................. SUCCESS [ 16.121 s]
[INFO] Spatial JSON Utilities ............................. FAILURE [ 2.484 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.972 s
[INFO] Finished at: 2015-02-03T18:42:25-06:00
[INFO] Final Memory: 13M/304M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project spatial-sdk-json: There are test failures.
[ERROR]

randallwhitman commented 9 years ago

Thanks for reporting this. I'll see if it reproduces here.

xiaobao123pppp commented 9 years ago

Try the command "mvn package -DskipTests"

randallwhitman commented 9 years ago

The bad test result does not happen for me with OpenJDK-1.6.0.33 on Ubuntu-12.04 Linux/x86-64.

Tests run: 12, Failures: 0, Errors: 0, Skipped: 0

What version of the JRE/JVM are you using?

TestUnenclosedJsonRecordReader dumps some info to stdout. If you can run TestCharacters only, in an IDE such as Eclipse (or pick the right lines out of mvn test output), what output do you get? For me it outputs as follows:

0 - {"attributes":{"text":"0á"},"geometry":{}}
0 - {"attributes":{"text":"0á"},"geometry":{}}
42 - {"attributes":{"text":"1é"},"geometry":{}}
42 - {"attributes":{"text":"1é"},"geometry":{}}
42 - {"attributes":{"text":"1é"},"geometry":{}}
84 - {"attributes":{"text":"2Í"},"geometry":{}}
126 - {"attributes":{"text":"3ò"},"geometry":{}}
168 - {"attributes":{"text":"4ü"},"geometry":{}}
210 - {"attributes":{"text":"5ñ"},"geometry":{}}
252 - {"attributes":{"text":"6«"},"geometry":{}}
294 - {"attributes":{"text":"7»"},"geometry":{}}
336 - {"attributes":{"text":"8õ"},"geometry":{}}

GISDev01 commented 9 years ago

Ok, I just tried "mvn package -DskipTests" and it looks like it works as expected and skips the tests:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spatial Framework for Hadoop ....................... SUCCESS [ 0.590 s]
[INFO] Hive Spatial Framework ............................. SUCCESS [ 8.533 s]
[INFO] Spatial JSON Utilities ............................. SUCCESS [ 2.770 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.105 s
[INFO] Finished at: 2015-02-03T19:26:59-06:00
[INFO] Final Memory: 12M/304M
[INFO] ------------------------------------------------------------------------

Here's the full setup on this particular Win7 box: Maven 3.2.5 straight from here (http://mirrors.advancedhosters.com/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.zip).

You won't believe this, but I actually tried all 3 JDKs before posting the issue. I have 6, 7, and 8 downloaded and installed, and I tried the mvn command after setting each of the 3 as my JAVA_HOME in the Win7 environment variables (using %JAVA_HOME%\bin in my PATH). I also closed and reopened the Windows command prompt between switching JDK versions, and double-checked by running the "path" command before running the mvn package command. All 3 JDKs resulted in the exact same error. Those 3 JDKs were downloaded straight from Oracle tonight, so they are the latest updates of all 3 (1.6.0_45, 1.7.0_75, 1.8.0_31).
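Since switching JDK versions made no difference, another environment detail that may be worth recording alongside the JDK version is the JVM's default charset, which commonly differs between Windows and Linux. The thread does not establish this as the cause of the failure. The following is only a minimal sketch for collecting that information, and the class name JvmEnvironmentReport is made up for illustration:

```java
import java.nio.charset.Charset;

public class JvmEnvironmentReport {

    /** Builds a short report of JVM properties that can affect text decoding. */
    static String report() {
        return "java.version    = " + System.getProperty("java.version") + "\n"
             + "os.name         = " + System.getProperty("os.name") + "\n"
             + "file.encoding   = " + System.getProperty("file.encoding") + "\n"
             + "default charset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this with the same JAVA_HOME that Maven uses, so the values match the build.
        System.out.println(report());
    }
}
```

Comparing this report between the Windows box and the Linux box would show whether the two JVMs decode text differently by default.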

I think this is the output you're looking for that shows up in the command prompt:

I went ahead and truncated a bunch of these in the middle of this section.

T E S T S

Running com.esri.json.hadoop.TestUnenclosedJsonRecordReader
0 - {"attributes":{"rowid": 1505, "text": "\""},"geometry":{"x":15.0,"y":5.0}}
0 - {"attributes":{"rowid": 1505, "text": "\""},"geometry":{"x":15.0,"y":5.0}}
75 - {"attributes":{"rowid": 535, "text": "\'"},"geometry":{"x":5,"y":35}}
75 - {"attributes":{"rowid": 535, "text": "\'"},"geometry":{"x":5,"y":35}}
146 - {"attributes":{"rowid": 2323, "text": "\"},"geometry":{"x":23,"y":23}}
0 - {"attributes":{"text":"0b\""},"geometry":{}}
0 - {"attributes":{"text":"0b\""},"geometry":{}}
44 - {"attributes":{"text":"1d\""},"geometry":{}}
0 - {"attributes":{"text":"0b\""},"geometry":{}}
44 - {"attributes":{"text":"1d\""},"geometry":{}}
44 - {"attributes":{"text":"1d\""},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
..........................................................................................................
0 - {"attributes":{"text":"0á"},"geometry":{}}
0 - {"attributes":{"text":"0á"},"geometry":{}}
0 - {"attributes":{"index":0},"geometry":{}}
40 - {"attributes":{"index":1},"geometry":{}}
80 - {"attributes":{"index":2},"geometry":{}}
120 - {"attributes":{"index":3},"geometry":{}}
160 - {"attributes":{"index":4},"geometry":{}}
200 - {"attributes":{"index":5},"geometry":{}}
240 - {"attributes":{"index":6},"geometry":{}}
280 - {"attributes":{"index":7},"geometry":{}}
320 - {"attributes":{"index":8,"test":"}{"},"geometry":{}}
372 - {"attributes":{"index":9},"geometry":{}}
0 - {"attributes":{"index":0},"geometry":{}}
40 - {"attributes":{"index":1},"geometry":{}}
0 - {"attributes":{"text":"0b\"},"geometry":{}}
0 - {"attributes":{"text":"0b\"},"geometry":{}}
44 - {"attributes":{"text":"1d\"},"geometry":{}}
0 - {"attributes":{"text":"0b\"},"geometry":{}}
44 - {"attributes":{"text":"1d\"},"geometry":{}}
44 - {"attributes":{"text":"1d\"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
44 - {"attributes":{"text":"1d\"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
44 - {"attributes":{"text":"1d\"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
0 - {"attributes":{"text":"0b{"},"geometry":{}}
0 - {"attributes":{"text":"0b{"},"geometry":{}}
44 - {"attributes":{"text":"1d{"},"geometry":{}}
0 - {"attributes":{"text":"0b{"},"geometry":{}}
44 - {"attributes":{"text":"1d{"},"geometry":{}}
44 - {"attributes":{"text":"1d{"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
44 - {"attributes":{"text":"1d{"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
44 - {"attributes":{"text":"1d{"},"geometry":{}}
88 - {"attributes":{"text":"2\"blah\""},"geometry":{}}
137 - {"attributes":{"text":"3\f"},"geometry":{}}
272 - {"attributes":{"text":"6"},"geometry":{}}
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.745 sec <<< FAILURE!

randallwhitman commented 9 years ago

It looks like it logged the tests with a-acute but did not make it to running the tests with e-acute.

There should be no issue of line endings, because there is in fact no newline in the entire file unenclosed-json-chars.json. I wonder if there was automatic transcoding between character sets upon downloading the file when cloning. What is the byte count and md5 checksum of your copy of unenclosed-json-chars.json? Here is what I have:

$ wc unenclosed-json-chars.json
  0   1 387 unenclosed-json-chars.json
$ md5sum unenclosed-json-chars.json
e1f26971965b456c1427c89698d32f44  unenclosed-json-chars.json

Also a dump of the first 80 bytes is:

0000000   {   "   a   t   t   r   i   b   u   t   e   s   "   :   {   "
0000020   t   e   x   t   "   :   "   0 303 241   "   }   ,   "   g   e
0000040   o   m   e   t   r   y   "   :   {   }   }   {   "   a   t   t
0000060   r   i   b   u   t   e   s   "   :   {   "   t   e   x   t   "
0000100   :   "   1 303 251   "   }   ,   "   g   e   o   m   e   t   r
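
On a Windows box without wc and md5sum, the same byte count and checksum could be obtained with a few lines of Java. This is a minimal sketch using only the JDK (Java 7 or later for java.nio.file); the class name FileFingerprint is hypothetical, and the default path assumes the file is in the current directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileFingerprint {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "unenclosed-json-chars.json";
        // Read raw bytes only; no charset decoding is involved.
        byte[] data = Files.readAllBytes(Paths.get(path));
        System.out.println(data.length + " bytes");
        System.out.println(md5Hex(data) + "  " + path);
    }
}
```

Because the file is read as raw bytes, the result does not depend on the platform's default charset.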

GISDev01 commented 9 years ago

Ok, interesting. Here are the first 80 bytes of unenclosed-json-chars.json from my HexEditor:

{"attributes":{"text":"0á"},"geometry":{}}{"attributes":{"text":"1é"},"geometr

MD5 checksum: e1f26971965b456c1427c89698d32f44
Size: 387 bytes

randallwhitman commented 9 years ago

Per MD5 checksum, the file is identical.

(The HexEditor appears to be displaying each non-ASCII byte as an ISO-8859 or MS-codepage character rather than as a fragment of a UTF-8 multi-byte character, whereas od -c displays such bytes in octal, e.g. 0303 = 195 = 0xc3.)
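
To illustrate the difference between the two displays, here is a small sketch that decodes the two bytes 0303 0241 from the od -c dump once as UTF-8 and once as ISO-8859-1. The class name DecodeTwoWays is made up for illustration; only standard JDK calls are used:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeTwoWays {

    /** Decodes the same raw bytes with the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Formats each char of the string as a U+XXXX code, avoiding console encoding issues. */
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(String.format("U+%04X ", (int) c));
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Octal 0303 0241 as shown by od -c, i.e. 0xC3 0xA1.
        byte[] raw = { (byte) 0303, (byte) 0241 };
        // As UTF-8 the two bytes form one character, a-acute.
        System.out.println("UTF-8:      " + codePoints(decode(raw, StandardCharsets.UTF_8)));
        // As ISO-8859-1 each byte becomes its own character.
        System.out.println("ISO-8859-1: " + codePoints(decode(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

The same two bytes yield one character under UTF-8 and two characters under a single-byte charset, so any character count or character offset depends on which charset is used to decode the file.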

GISDev01 commented 9 years ago

So are we thinking this is something specific to my environment, my Maven, or my JDK? Or am I really the very first one to try to compile the repo on a Windows box and notice that 1 of the tests failed? Since I still get the one .jar file output, I guess it's still all good, right?

If anyone has any more hunches on why it might be failing, I will be glad to look into it and try to make a bugfix and a subsequent pull request.

randallwhitman commented 9 years ago

It is entirely possible that you are the first person to build it on MS-Windows since that test was merged onto master just under a month ago.

As far as use of the .jar file you successfully built: if you are running the jobs on a cluster with Linux OS, or if all your files are in formats other than JSON, or if all the text in your JSON files is limited to ASCII characters, then hopefully the issue would be moot. If you are processing non-ASCII text strings in Unenclosed-JSON format files on MS-Windows, then extra caution is warranted, on top of the usual best practice of starting with a small subset of the data on which results can be hand-verified.

If you would like to look into the issue and potentially provide a patch, that would be great. Here are some ideas that could be looked into:

GISDev01 commented 9 years ago

I am going to look into those 3 ideas soon. I will go ahead and close this one, since it's not a show-stopper.

smambrose commented 9 years ago

I was able to reproduce:

maven version: 3.2.2
jdk: 1.7.0_71
OS: Windows 7 
$ wc unenclosed-json-chars.json
  0   1 387 unenclosed-json-chars.json
$ md5sum unenclosed-json-chars.json
e1f26971965b456c1427c89698d32f44 *unenclosed-json-chars.json
$ od -c unenclosed-json-chars.json
0000000   {   "   a   t   t   r   i   b   u   t   e   s   "   :   {   "
0000020   t   e   x   t   "   :   "   0 303 241   "   }   ,   "   g   e
0000040   o   m   e   t   r   y   "   :   {   }   }   {   "   a   t   t
0000060   r   i   b   u   t   e   s   "   :   {   "   t   e   x   t   "
0000100   :   "   1 303 251   "   }   ,   "   g   e   o   m   e   t   r

randallwhitman commented 9 years ago

The alternate idea of a byte-based rather than character-based record reader has been separated out to #79.
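
As background on why the distinction matters for this file: the offsets logged on Linux for unenclosed-json-chars.json advance by 42 per record, while the file is 387 bytes for nine records, i.e. 43 bytes each. That is consistent with offsets being counted in characters, since each record contains one two-byte UTF-8 character. The sketch below only demonstrates that arithmetic; it is not the design proposed in #79, and the class name OffsetComparison is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class OffsetComparison {

    /** Length of the record in Java chars (what a character-based reader counts). */
    static int charLength(String record) {
        return record.length();
    }

    /** Length of the record in UTF-8 bytes (what a byte-based reader counts). */
    static int utf8Length(String record) {
        return record.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // First record of the test file; a-acute is written as an escape so this
        // source file itself stays pure ASCII.
        String record = "{\"attributes\":{\"text\":\"0\u00e1\"},\"geometry\":{}}";
        System.out.println("chars: " + charLength(record));
        System.out.println("bytes: " + utf8Length(record));
        // Nine records of the same shape, with no newline between them.
        System.out.println("file size for 9 records: " + 9 * utf8Length(record));
    }
}
```

With non-ASCII text the two counts diverge by one per record here, so offsets from a character-based reader cannot be used directly as byte positions in the file.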