deanhiller / databus

Time series data in Cassandra with visualization (NREL's open-source databus project)

postdataV1 api returns http error 400 on a valid data file #31

Open kbrems opened 10 years ago

kbrems commented 10 years ago

I have an Android app that sends data files to databus every minute. Every once in a while I get back an HTTP error 400 (bad request). However, when I try to send the same file again, either through the same code or via curl, it uploads fine.

When I looked at the databus log file for the time period of the error, I see:

WARN: There was a problem parsing the JSON request.
org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@898e150; line: 1, column: 1])
 at [Source: java.io.StringReader@898e150; line: 1, column: 24607]
 at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433) ~[jackson-core-asl-1.9.8.jar:1.9.8]

But the file is valid JSON (I ran it through jsonlint, and it also sends correctly the second time). Given that this is a 26K file and the parser reports an unexpected end-of-input at column 24607, my guess is that the problem is really some kind of temporary network connection issue: databus is not actually getting the complete file, or is getting no body at all in the HTTP POST, but the databus code is not catching that at a higher level and is passing bad or empty data to the parser, which generates the 400 error.

The problem is that I then can't tell the difference between a legitimately bad file and a temporary connection error; in the latter case I want to retry sending the file.

deanhiller commented 10 years ago
  1. Can we get the fuller stack trace (the one above only has the top line)? I do want to look at the specific code to see if I can spot anything.
  2. A disconnect in the middle would cause the message not to be parsed at all for JSON (CSV upload is different). Since it actually got to parsing, we know there was no network issue. The way it works for JSON posts is: all of the information is sent, and the HTTP headers carry the size of the payload in bytes (basic HTTP stuff; the actual header is called Content-Length). Once that header is read, the server reads the payload, and only once the payload is successfully read does it call our controller method with the entire JSON body. If you have a network issue, our controller will not be called at all. Since our controller is called, we know there is no network error, and then the parsing fails. (See the sketch just below this list.)
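
To make item 2 concrete, here is a minimal client-side sketch of a fixed-length (non-chunked) JSON POST; the URL and payload shape are hypothetical, not the databus API verbatim. Because the client declares Content-Length up front, the server can read exactly that many bytes before dispatching to a controller, so a mid-body disconnect means the controller is never invoked at all:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FixedLengthPost {
        public static void main(String[] args) throws Exception {
            // Made-up payload; the real databus JSON format is not shown here.
            byte[] body = "{\"data\":[{\"time\":0,\"value\":1.0}]}".getBytes("UTF-8");

            // Placeholder endpoint; substitute the real databus host and path.
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:9000/api/postdataV1").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            // Forces a Content-Length header (no chunking): the server knows
            // exactly how many body bytes to expect before calling the controller.
            conn.setFixedLengthStreamingMode(body.length);

            OutputStream out = conn.getOutputStream();
            out.write(body);
            out.close();
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }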

What we can do, however, is the following: once you give me the stack trace, I can verify that I modified the correct code, as I now log the bad JSON in that exception message you are reading. NOTE: this JSON should already be logged by a previous log statement as well, from this specific line in databus...

log.info("posting json="+json);

thanks, Dean

kbrems commented 10 years ago

Here is the full stack trace:

2013-11-07 01:23:46,863 [7965181344519-(robot-android_dev1)] [play-thread-11] c.ApiPostDataPoints logNoMdc INFO: Processing request number=800(double posted=0) numPointsPosted=47222 in total time=457143.0 seconds. ptPerSec=0.10329809271934602 postsPerSec=0.001749999453125171 failureDouble=800
2013-11-07 18:16:27,046 [7965181344523-(robot-android_dev1)] [play-thread-4] c.Parsing parseJson WARN: There was a problem parsing the JSON request.
org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@faebc99; line: 1, column: 1])
 at [Source: java.io.StringReader@faebc99; line: 1, column: 17565]
 at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:473) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.ReaderBasedParser._skipWSOrEnd(ReaderBasedParser.java:1496) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:368) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:172) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:319) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:249) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at controllers.Parsing.parseJson(Parsing.java:40) ~[webapp:na]
 at controllers.ApiPostDataPoints.postData(ApiPostDataPoints.java:109) [webapp:na]
 at sun.reflect.GeneratedMethodAccessor236.invoke(Unknown Source) ~[na:na]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_45]
 at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_45]
 at play.mvc.ActionInvoker.invokeWithContinuation(ActionInvoker.java:587) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invoke(ActionInvoker.java:538) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:514) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:509) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invoke(ActionInvoker.java:187) [play-1.2.x-1.1.0-2208.jar:na]
 at play.server.PlayHandler$NettyInvocation.execute(PlayHandler.java:334) [play-1.2.x-1.1.0-2208.jar:na]
 at play.Invoker$Invocation.run(Invoker.java:284) [play-1.2.x-1.1.0-2208.jar:na]
 at play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:307) [play-1.2.x-1.1.0-2208.jar:na]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [na:1.6.0_45]
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_45]
 at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_45]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) [na:1.6.0_45]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) [na:1.6.0_45]
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_45]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_45]
 at java.lang.Thread.run(Thread.java:662) [na:1.6.0_45]
2013-11-07 21:45:26,269 [7965181344620-(robot-android_dev1)] [play-thread-13] c.ApiPostDataPoints logNoMdc INFO: Processing request number=900(double posted=0) numPointsPosted=52905 in total time=530442.0 seconds. ptPerSec=0.09973757734116077 postsPerSec=0.0016966982252536564 failureDouble=900

deanhiller commented 10 years ago

Ah, so that is an older version that does not have the log statement I mentioned. Upgrade and it will log the bad JSON that it did receive. Let me know what you get on the new version.

thanks, Dean

deanhiller commented 10 years ago

When I say upgrade, you only need to upgrade the webserver, not Cassandra, by the way.

kbrems commented 10 years ago

OK, we have updated to the latest version. The bug is hard to reproduce, though; it happened twice in something like 1000 uploads.

Karen

kbrems commented 10 years ago

OK, I finally reproduced the error. It seems the data file is getting truncated somewhere on upload: the JSON in the log is not the complete file. I am attaching the log as well as the data file I am trying to upload. The logged JSON stops at the event with "time" field 1385238868357, but there are more events in the file, and the file is closed correctly.

If I upload the bad file again, it works fine.

Karen

2013-11-06 23:26:48,723 [7965181344402-(robot-android_dev1)] [play-thread-16] c.Parsing parseJson WARN: There was a problem parsing the JSON request.
org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.StringReader@898e150; line: 1, column: 1])
 at [Source: java.io.StringReader@898e150; line: 1, column: 24607]
 at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.JsonParserBase._handleEOF(JsonParserBase.java:473) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.ReaderBasedParser._skipWSOrEnd(ReaderBasedParser.java:1496) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:368) ~[jackson-core-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:172) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:319) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:249) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863) ~[jackson-mapper-asl-1.9.8.jar:1.9.8]
 at controllers.Parsing.parseJson(Parsing.java:40) ~[webapp:na]
 at controllers.ApiPostDataPoints.postData(ApiPostDataPoints.java:109) [webapp:na]
 at sun.reflect.GeneratedMethodAccessor236.invoke(Unknown Source) ~[na:na]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_45]
 at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_45]
 at play.mvc.ActionInvoker.invokeWithContinuation(ActionInvoker.java:587) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invoke(ActionInvoker.java:538) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:514) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invokeControllerMethod(ActionInvoker.java:509) [play-1.2.x-1.1.0-2208.jar:na]
 at play.mvc.ActionInvoker.invoke(ActionInvoker.java:187) [play-1.2.x-1.1.0-2208.jar:na]
 at play.server.PlayHandler$NettyInvocation.execute(PlayHandler.java:334) [play-1.2.x-1.1.0-2208.jar:na]
 at play.Invoker$Invocation.run(Invoker.java:284) [play-1.2.x-1.1.0-2208.jar:na]
 at play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:307) [play-1.2.x-1.1.0-2208.jar:na]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [na:1.6.0_45]
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_45]
 at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_45]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) [na:1.6.0_45]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) [na:1.6.0_45]
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_45]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_45]
 at java.lang.Thread.run(Thread.java:662) [na:1.6.0_45]
2013-11-06 23:43:47,590 [7965181344419-(robot-android_dev1)] [play-thread-11] c.ApiPostDataPoints logNoMdc INFO: Processing request number=700(double posted=0) numPointsPosted=41291 in total time=451144.0 seconds. ptPerSec=0.0915251006330573 postsPerSec=0.0015516110155515755 failureDouble=700
2013-11-07 01:23:46,863 [7965181344519-(robot-android_dev1)] [play-thread-11] c.ApiPostDataPoints logNoMdc INFO: Processing request number=800(double posted=0) numPointsPosted=47222 in total time=457143.0 seconds. ptPerSec=0.10329809271934602 postsPerSec=0.001749999453125171 failureDouble=800

deanhiller commented 10 years ago

At this point, you will need to use tshark on the server to capture all the packets; you can then load the capture into the Wireshark GUI and inspect it. Basically, if you find the JSON payload from our logs and know exactly which request it was, you can locate that packet in the Wireshark GUI (it decodes all your HTTP packets for you) and see whether it carries the full payload. This will help you determine which side of the fence the issue occurs on.

i.e. previously I had seen another bad client, and with tshark I was able to prove that their client was not sending the full payload. Perhaps you will prove yours is sending the full payload. At the very least, we will know which side of the fence the issue occurs on. If it is the server side, please send us the packet trace and the full message from the logs as well. (You can do that by email to databus@nrel.gov if you don't want those results on the web.)

thanks, Dean

deanhiller commented 10 years ago

Or maybe use tcpdump, whose captures I think Wireshark can load as well (and in Wireshark you can tell it to decode the packets as HTTP by right-clicking on a packet and choosing Decode As):

http://www.wireshark.org/docs/wsug_html_chunked/AppToolstcpdump.html

There is a similar way with tshark, which I have done in the past, but you have to get the options right, and finding them on the web is sometimes a pain.

deanhiller commented 10 years ago

OH, one more thing: send me one HTTP packet trace (text based) from this as well, either a good one or a bad one from your runs. In some situations, people have been known to upload data with a boundary marker that happens to match bytes in the data. This can be very, very bad. In the HTTP headers, the client must tell us the end marker, and we look for that marker; if the marker happens to occur inside the data, we think the payload is finished (because the client told us the marker occurred and we found an instance of it).

This marker should be fairly long so as not to match any data in your payload.
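
As an illustration only (this is not databus or client code from this thread), a multipart client can make such a collision essentially impossible by generating a long random boundary and double-checking that it never occurs in the payload before using it:

    import java.security.SecureRandom;

    public class BoundaryPicker {
        private static final char[] ALPHABET =
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

        // Returns a 42-character boundary guaranteed not to appear in the payload;
        // with 30 random characters, needing a retry is astronomically unlikely.
        static String pickBoundary(String payload) {
            SecureRandom rnd = new SecureRandom();
            while (true) {
                StringBuilder sb = new StringBuilder("----Boundary");
                for (int i = 0; i < 30; i++) {
                    sb.append(ALPHABET[rnd.nextInt(ALPHABET.length)]);
                }
                String boundary = sb.toString();
                if (!payload.contains(boundary)) {
                    return boundary;
                }
            }
        }
    }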

thanks, Dean

deanhiller commented 10 years ago

Oops, I forgot one thing: we don't know whether you are doing chunked uploads or not when it comes to JSON posts. IF you are doing chunked uploads (this happens for larger payloads), here is an example of a boundary header that Chrome has sent:

multipart/form-data; boundary=----WebKitFormBoundaryuni3DI3VMQ0kPDKT

Note that for JSON uploads we rely on the Play framework, which relies on a very stable Apache library for parsing chunked uploads, so I am pretty sure the issue is not server side (but I have been wrong before, which is why I like to dig and be 100% sure first).

If the issue is during a non-chunked upload (no multipart/form header), then the Play framework is doing most of the work (and I can debug the Play framework).

Anyway, let's get some packet traces going. If you didn't understand everything I am saying, I can be available by phone as well if you need; Matthew has my cell. (I don't like posting my phone number on web pages ;) )

Dean

kbrems commented 10 years ago

Dean,

Yes, I am doing chunked uploads. I am just using

org.apache.http.entity.FileEntity.setChunked(true);

I am not doing anything myself to set the chunking marker. The files I am uploading are all around 26K.
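
For reference, the upload path is presumably something like the following minimal sketch (the endpoint URL and file name are placeholders; this uses the Apache HttpClient 4.x API that Android bundles):

    import java.io.File;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.FileEntity;
    import org.apache.http.impl.client.DefaultHttpClient;

    public class ChunkedUpload {
        public static void main(String[] args) throws Exception {
            HttpClient client = new DefaultHttpClient();
            // Placeholder URL; the real post goes to the databus server.
            HttpPost post = new HttpPost("http://localhost:9000/api/postdataV1");

            FileEntity entity = new FileEntity(new File("datapoints.json"), "application/json");
            entity.setChunked(true); // send Transfer-Encoding: chunked instead of Content-Length
            post.setEntity(entity);

            HttpResponse response = client.execute(post);
            System.out.println(response.getStatusLine());
        }
    }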

I am attaching a tcpdump capture of a successful upload. Capturing a failure is going to be hard, as it only happens once every 500-1000 uploads...

Karen

deanhiller commented 10 years ago

I think GitHub strips the attachments. Can you send the file to databus@nrel.gov and dean@buffalosw.com? I want to see the boundaries that are being used.

thanks, Dean

deanhiller commented 10 years ago

OK, so you are not doing content-type multipart; you are just doing application/json with chunked transfer encoding, which means the first part of each chunk must be the number of bytes in that chunk (in hex). There will be no boundary delimiter conflicts at all, since the format is just {size}\r\n{chunkdata}\r\n{size}\r\n{chunkdata}

If any of those sizes are wrong, the server could stop reading early. We need the exact packet from a failure, though, to see whether a size is wrong. (A hand-rolled sketch of this wire format is below.)
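
To make the wire format concrete, here is a minimal hand-rolled sketch of a chunked POST over a raw socket (host, port, path, and payload are all placeholders, and a real client should use an HTTP library). It shows exactly where a wrong size would truncate the body:

    import java.io.OutputStream;
    import java.net.Socket;

    public class ManualChunkedPost {
        public static void main(String[] args) throws Exception {
            byte[][] chunks = {
                    "{\"data\":[".getBytes("UTF-8"),
                    "{\"time\":0,\"value\":1.0}]}".getBytes("UTF-8"),
            };
            Socket socket = new Socket("localhost", 9000); // placeholder host/port
            try {
                OutputStream out = socket.getOutputStream();
                out.write(("POST /api/postdataV1 HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Content-Type: application/json\r\n"
                        + "Transfer-Encoding: chunked\r\n\r\n").getBytes("US-ASCII"));
                for (byte[] chunk : chunks) {
                    // Each chunk is: size in HEX, CRLF, the bytes, CRLF. If this
                    // size were smaller than chunk.length, the server would stop
                    // reading early and the JSON parser would see truncated input.
                    out.write((Integer.toHexString(chunk.length) + "\r\n").getBytes("US-ASCII"));
                    out.write(chunk);
                    out.write("\r\n".getBytes("US-ASCII"));
                }
                out.write("0\r\n\r\n".getBytes("US-ASCII")); // zero-size chunk ends the body
                out.flush();
            } finally {
                socket.close();
            }
        }
    }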

NOTE: I found this issue on Android when using Netty (are you using a library that uses Netty, or maybe using Netty directly?); in that case it appeared the request being sent was truncated:

http://stackoverflow.com/questions/8826529/serving-a-file-with-netty-response-is-truncated-by-one-byte

We are using Netty as well, so perhaps the issue is in Netty on the server side or the client side. A packet trace capture is the only thing that can really shed more light on this.

thanks, Dean