adamkewley / jobson

A platform for transforming command-line applications into a job service.
Apache License 2.0

Users submitting large requests get an error #61

Closed Russel88 closed 2 years ago

Russel88 commented 4 years ago

Hey Adam,

Thanks for a great framework! I hope you can help with a problem I get sometimes.

I get an error when I submit a large request (a ~100 MB string array). Jobson gives the following error:

Error loading submitting job

There was an error loading submitting job from the Jobson API. The APIs error message was: 
There was an error processing your request. It has been logged (ID 2f80fd3330f5c955)..

There is no error in my nginx log, only a POST request. I can see that Jobson creates the Job, but it just hangs at the "Job persisted" stage.

Do you know what the problem could be, and how to solve it?

Also, where are the logs saved by default? I can't seem to find the jobson log.

Cheers, Jakob

adamkewley commented 4 years ago

Hi @Russel88 ,

Where the logs go depends on how you're running Jobson (there's a logging configuration argument inherited from Dropwizard, which Jobson uses). By default, the logs are written to Jobson's standard output, so they'll either appear in your console window (if you're running it interactively, for short sessions) or end up wherever you sink the standard output.
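
If you'd rather write logs to a dedicated file, Dropwizard's standard logging block should be usable in Jobson's config file. A minimal sketch, assuming Jobson passes the Dropwizard logging configuration through unchanged (the paths here are illustrative):

logging:
  level: INFO
  appenders:
  - type: file
    currentLogFilename: /var/log/jobson/jobson.log
    archivedLogFilenamePattern: /var/log/jobson/jobson-%d.log.gz
    archivedFileCount: 5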

For docker-based deployments, stdout is captured by the docker daemon, so you'll need to run:

docker logs <container_name>

For systemd-based deployments (e.g. you wrote a jobson.service file), it'll be slurped by journalctl:

journalctl -u jobson

Or you might've redirected the output somewhere when you booted the daemon, e.g.:

jobson serve config.yml > logfile 2>&1

I'm sorry that finding the error message is a bit difficult. Jobson, in general, is missing some prod-grade features like having a basic log viewer built into the UI / commandline, etc. (I only had a few months full-time to develop it, and I now can only work on it occasionally :().

100 MB doesn't seem so large, though, so I'm surprised it's crashing on that - I'll try with some bigger inputs and report back.

adamkewley commented 4 years ago

I can confirm that particularly large inputs can pop the java heap.

If I pass a 1 GiB input into the UI, javascript will crash when trying to parse through the file (kind of understandable, because the javascript is trying to split a massive input into separate string objects - there are ways around this, but it'll take time to develop).

If I pass a 100 MiB input into the UI as a string[], the javascript will successfully read + parse the file into a 53 M element array (my file just has "y" followed by a newline for each "entry"), but submitting this to the backend makes it throw an exception:

20/07/05 08:16:56 ERROR io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 3adf6cbb948e3bc7
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
    at java.base/java.util.ArrayList.grow(ArrayList.java:237)
    at java.base/java.util.ArrayList.grow(ArrayList.java:242)
    at java.base/java.util.ArrayList.add(ArrayList.java:485)
    at java.base/java.util.ArrayList.add(ArrayList.java:498)
    at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:194)
    at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:173)
    at com.fasterxml.jackson.databind.deser.std.StringCollectionDeserializer.deserialize(StringCollectionDeserializer.java:21)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromArray(BeanDeserializerBase.java:1436)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:185)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161)
    at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3985)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2277)
    at com.github.jobson.Helpers.readJSON(Helpers.java:289)

There might be ways around this, such as slightly redefining how the backend should parse the JSON, but the backend is currently pretty stupid about how it handles JSON payloads (doesn't stream them, just reads them into memory, etc.), because this stupidity makes the backend code simpler (e.g. JobRequest jr = readJSON(payload);).
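
For contrast, a streaming parse would keep only one element in memory at a time. Here's a minimal sketch using Jackson's streaming API - hypothetical code, not what Jobson actually does, and the class/method names are made up for illustration:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingInputSink {
    // Reads a JSON array of strings from `body` and writes one entry per line
    // to `dest`, so heap usage stays constant regardless of array length.
    public static void sinkStringArray(InputStream body, Path dest) throws IOException {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(body);
             BufferedWriter out = Files.newBufferedWriter(dest)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("expected a JSON array");
            }
            JsonToken tok;
            while ((tok = parser.nextToken()) != JsonToken.END_ARRAY) {
                if (tok == null) {
                    throw new IOException("unexpected end of input");
                }
                out.write(parser.getText());
                out.newLine();
            }
        }
    }
}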

A "correct" approach would permit form-multipart requests from clients and would sink the data from the TCP connection as a binary blob to the persistence layer, rather than reading the whole request into memory once (as the body binary), then parsing it into a JSON object (as a java object), then mapping it into a Jobson object, which allocates millions of string objects, which is probably what's popping the heap.
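
Sketched as a Jersey resource (Jobson already runs on Jersey, per the stack traces), that approach might look roughly like this - the route, part name, and destination path are all hypothetical, and it assumes jersey-media-multipart is registered with the server:

import org.glassfish.jersey.media.multipart.FormDataParam;

import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

@Path("/v1/jobs/large-input")
public class LargeInputResource {
    @POST
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    public Response submitLargeInput(@FormDataParam("finput") InputStream part) throws IOException {
        // Sink the multipart body straight to the persistence layer as a
        // binary blob - nothing is base64-decoded or parsed in memory.
        Files.copy(part, Paths.get("jobs", "incoming.blob"), StandardCopyOption.REPLACE_EXISTING);
        return Response.accepted().build();
    }
}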

One bandaid you can use is to increase the Java heap size (I still need to try this myself - will post results here). By default, Java's maximum heap size is ~1/4 of your system's memory. On my machine, that's ~4 GiB, which is a lot of memory for a server daemon. The reason it needs to be large is that 53 M string objects (in my case) are likely to use ~32 bytes per object * 53 M = ~1.7 GB of memory. Then there's the request body binary hanging around, and all the temporary objects, so there's probably enough stuff on the heap to pop that exception.
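
If you're launching the jar directly, the standard JVM -Xmx flag raises the heap ceiling - the jar filename below is illustrative:

java -Xmx4g -jar jobson.jar serve config.yml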

A better solution is to use the new (undocumented) file and file[] inputs that I just added here. They've been merged into master and released as undocumented features (I need to clean them up). If you use Jobson v1.0.10+ they should be available. To test that feature with your input sizes (~100 MiB), I wrote this config:

name: Generated job spec
description: >
  A job spec generated by jobson. The admin should've changed this.
expectedInputs:
- id: finput
  name: Input file
  description: Some strarray input
  type: file
execution:
  application: hexdump
  arguments:
  - "-n"
  - "1024"
  - "-C"
  - "${inputs.finput}"

And the UI seems to be completely fine with a 100 MiB file, probably because the UI treats the file as a blob that it just needs to base64-encode into the JSON request, rather than trying to parse the file into millions of string elements.

However, the UI seems to behave a little funky with a 1 GiB file, which is a bug that I need to investigate (it should at least show some kind of error). Very large inputs like that are likely to be annoying to handle with Jobson's current architecture, because it currently requires that javascript reads the file into memory and converts it into a base64 string so that it can be added to a JSON job request. Again, a "good" architecture would leverage a browser's capability to stream a File input from an <input> to the server as a multipart form request. However, implementing this will take a lot of time in Jobson, because I'd need to implement it in a way that's backwards-compatible with the simpler JSON style too (some users use Jobson's JSON API to integrate their own frontends, etc.).

adamkewley commented 4 years ago

I forgot to publish 1.0.10: will get on this asap so that you also have access to the file[] type.

adamkewley commented 4 years ago

1.0.10 is now published with the file and file[] support, if that'll work for you.

Russel88 commented 4 years ago

Thanks a lot for the detailed answer!

I will try the file type; it sounds like exactly what I need. I convert the large string array to a file anyway, as the input for my cli script is a file.

Even though Jobson is lacking a few features, I think it's working very well. I use it for a public bioinformatic web server and our users are happy with the interface.

Cheers and thanks for your help, Jakob

adamkewley commented 4 years ago

No problem, I'm just glad to hear that it's useful, even if only as a stepping stone towards something more specialized to your needs (e.g. a custom webserver). The main goal of Jobson is to try and make the initial step of getting a pipeline on the internet as simple as possible, so I hope it at least achieved that much.

Again, I'm sorry it's lacking prod-grade features that you might've expected. I only get a very limited amount of time to maintain this :(. The limited featureset was good enough for Gaia's internal requirements. The OSSing and feature additions are something I've mostly done in my spare time, which usually equates to a few hours here and there. Loads of that time has been spent on packaging, docs, etc. rather than making the core server (e.g. user logins etc. are terrible still).

Several researchers (usually bioinformatics-related, for some reason :S) have emailed me about Jobson, and there's a couple of companies using it back-of-house. I might be able to make a case with a software engineering grant agency that Jobson is a project that could cut down the amount of time it takes researchers to get their work online. Something like a grant would give me a bit of time to actually kick these things into shape - it's just a matter of finding time to organize it.

adamkewley commented 4 years ago

@Russel88 did you manage to get what you need out of the new API additions or did you guys drop this (use something else, find another way around it)? It took a bit of time to half-add this (undocumented) feature so it would be handy to know if it's used before I flesh it out (e.g. with documentation, examples, etc.)

Russel88 commented 4 years ago

@adamkewley I'm still working on it, but as I'm on paternity leave things go slow. I think it's exactly what we need, and I will report back when I have tested it.

Russel88 commented 4 years ago

Hi @adamkewley. It seems to work fine with a 105 MB file, although it fails with a 111 MB file. My Java max heap size is 2 GB, so I'm not sure why the cutoff is around 110 MB...

Here's the error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:426)
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3419)
    at com.github.jobson.Helpers.toJSON(Helpers.java:256)
    at com.github.jobson.Helpers.writeJSON(Helpers.java:293)
    at com.github.jobson.dao.jobs.FilesystemJobsDAO.createNewJobDirectory(FilesystemJobsDAO.java:241)
    at com.github.jobson.dao.jobs.FilesystemJobsDAO.persist(FilesystemJobsDAO.java:206)
    at com.github.jobson.jobs.JobManager.submit(JobManager.java:124)
    at com.github.jobson.jobs.JobManager.submit(JobManager.java:120)
    at com.github.jobson.resources.v1.JobResource$1.whenLeft(JobResource.java:306)
    at com.github.jobson.resources.v1.JobResource$1.whenLeft(JobResource.java:303)
    at com.github.jobson.utils.Either.visit(Either.java:48)
    at com.github.jobson.resources.v1.JobResource.submitJob(JobResource.java:302)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$281/26059997.invoke(Unknown Source)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)

adamkewley commented 4 years ago

@Russel88 oh god, I'm so sorry for having you guinea pig on this.

The memory usage of the File abstraction in Jobson is pretty huge, because it requires reading the entire request into memory (>110 MB of base64-encoded JSON), deserializing it (~110 MB of binary), followed by re-serializing it into the jobs/ directory (however much JSON + intermediate objects that takes). It looks like it just uses a crazy amount of memory doing all of that (admittedly, this could be done in a streaming manner directly from the TCP connection if it weren't required that the job request is a single atomic JSON payload, but I digress).
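
For a rough sense of why the cutoff lands near 110 MB on a 2 GB heap, here's a back-of-envelope estimate (assuming Java 8's UTF-16 String layout, which the sun.reflect frames in your trace suggest):

111 MB file
  -> base64 text in the JSON body:   111 * 4/3 ~= 148 MB of characters
  -> held as a UTF-16 Java String:   148 * 2   ~= 296 MB
  -> decoded back into a byte[]:               ~= 111 MB
  -> re-serialized by writeValueAsString: another ~296 MB String, plus a
     transient TextBuffer copy of roughly the same size during getAndClear

That's well over 1 GB of live and transient data before the job is even persisted, and the Arrays.copyOfRange at the top of your trace is the final String copy that tips it over.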

It looks like the feature is going to need to be kicked around a little internally - at least to make sure that it doesn't pointlessly convert to and from binary server-side a few times (which is what's blowing up the heap). I'm sorry if this is frustrating for you - I don't get much development time, so I tend to implement these features in a way that's forward-compatible (for the user-facing API) but sub-optimal server-side.

adamkewley commented 2 years ago

Closed because there was no follow-up and there hasn't been sufficient development time to re-code the file-upload process to handle extremely large inputs :<