You get the streams from manager.getMultiPartOutputStreams(); you can't pass your own stream.
How can I pass my own file or stream? With the AWS SDK I used the following (the request contains a File in a PutObjectRequest):
TransferManager tm = getTransferManager();
Upload upload = tm.upload(request);
UploadResult ur = upload.waitForUploadResult();
In your case, how do I use StreamTransferManager to upload a big file or stream to AWS?
List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();
streams.get(0).write("stuff".getBytes());
You cannot pass your own stream, you must write to the stream from the manager.
If your data is already in a file, there is no point in using this library. This is for avoiding files and doing everything in memory.
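For reference, here's a minimal end-to-end sketch (bucketName, putKey and s3Client stand in for your own setup; this assumes a recent version where the constructor takes just those three arguments, and uses a single stream):
StreamTransferManager manager = new StreamTransferManager(bucketName, putKey, s3Client);
MultiPartOutputStream outputStream = manager.getMultiPartOutputStreams().get(0);
outputStream.write("stuff".getBytes());  // write your data in as many chunks as you like
outputStream.close();                    // signal that this stream has no more data
manager.complete();                      // finish the multipart upload on S3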
Hi,
I have a 5GB stream and am not able to use your library. Where can I send the stream?
Provide a lot more code and context. The sample you gave is not very informative. You mentioned a file, but apparently you don't want to use one? Why are you unable to use the streams from manager.getMultiPartOutputStreams()? How was your stream created?
I am working on it. Currently only one part is uploaded; I will share the code ASAP.
I am facing the same problem.
Requirement: upload multiple large files (up to 10GB) to AWS S3 without loading them into memory or saving them to disk.
Current setup: a Spring Boot API that accepts the file as multipart. The application uses Apache commons-fileupload to extract the request content as a stream along with the form fields. Now, how can I write this stream to MultiPartOutputStream.write()? Converting it to a byte[] would load the whole stream into memory.
@RequestMapping(value = "/api/upload", method = RequestMethod.POST)
public String handleUploadWithoutSize(HttpServletRequest request) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();
    FileItemIterator iterStream = upload.getItemIterator(request);
    while (iterStream.hasNext()) {
        FileItemStream item = iterStream.next();
        if (!item.isFormField()) {
            InputStream stream = item.openStream();
            // Reads the entire stream into a String/byte[] before handing it over
            StreamTransferManagerService.write(Streams.asString(stream).getBytes());
        } else {
            // process form fields
        }
    }
    return "uploaded";
}
StreamTransferManagerService
//StreamTransferManager configuration for s3 and others
final List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();
ExecutorService pool = Executors.newFixedThreadPool(numStreams);
for (int i = 0; i < numStreams; i++) {
    final int streamIndex = i;
    Runnable task = new Runnable() {
        @Override
        public void run() {
            MultiPartOutputStream outputStream = streams.get(streamIndex);
            for (int lineNum = 0; lineNum < 1000000; lineNum++) {
                try {
                    // streamBytes holds the request content read in the controller
                    outputStream.write(streamBytes);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            outputStream.close();
        }
    };
    pool.submit(task);
}
pool.shutdown();
pool.awaitTermination(5, TimeUnit.SECONDS);
manager.complete();
}
When I tried with a 1GB file, memory utilization jumped to ~1GB and the process did not complete. I killed the server after waiting for 10 minutes.
First result of googling "java copy from inputstream to outputstream": https://stackoverflow.com/a/39440936/2482744
In Java 9:
input.transferTo(output);
We are using JDK 8. However, in this case write() only accepts a byte[].
I really want to use this library but need a way to pass an input stream to an output stream without loading it all into memory.
It would be great if you could share sample code for this.
Here's the source of transferTo, you can use the idea: https://github.com/netroby/jdk9-dev/blob/master/jdk/src/java.base/share/classes/java/io/InputStream.java#L518
Basically you read a few bytes from the input stream and write them to the output stream, and repeat.
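Roughly, on JDK 8 (a sketch, assuming `in` is the InputStream from item.openStream() and `out` is one of the streams from manager.getMultiPartOutputStreams()):
byte[] buffer = new byte[8192];
int bytesRead;
// Copy in small chunks so only one buffer's worth of data is in memory at a time
while ((bytesRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, bytesRead);
}
out.close();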
Thanks for the quick reply!
It looks like it's working now. Here are my observations:
Configuration: numStreams = 1, numUploadThreads = 10, queueCapacity = 2, partSize = 20
Any way to optimize this?
I suspect that the 500MB memory usage is just the JVM preallocating that much memory for the heap, see #2.
If your data is already in a file on disk then there is no point in using this library, use the AWS SDK. This library is for avoiding the file system and keeping everything in memory, which is only sometimes useful.
If the thread that writes to the stream is getting blocked, that means it's producing data faster than it's being uploaded. Try increasing the number of upload threads.
The upload threads are bound to get blocked at the beginning while they wait for initial data. With your current configuration uploading a 200MB (10 * 20MB) file, the only way all the threads will be used is if they all upload exactly one part, which is likely not to happen. If you upload something bigger and the writing thread writes data fast enough, you're more likely to see all the threads get used.
Unless you're uploading something bigger than 50GB, I think you can leave the part size at the default of 5MB. Then the threads can more quickly pick up parts and start uploading them so they'll block less. It'll also reduce memory usage if that's actually a problem.
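For example, something like this if you're on 2.x where the tuning methods chain off the constructor (a sketch; drop partSize to keep the 5MB default):
StreamTransferManager manager = new StreamTransferManager(bucketName, putKey, s3Client)
        .numStreams(1)
        .numUploadThreads(10)
        .queueCapacity(2);
// partSize left at the default of 5MB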
I'm trying to use this nice library. From time to time the file I'm uploading to S3 is empty. Is there a plan to support that case?
I've opened #21 for empty files. There's no reason to have more discussion in this issue.
@nditur empty files are now supported in version 2.1.0. If you were overriding any of the customise*Request methods, you may want to override customisePutEmptyObjectRequest.
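Roughly like this (a sketch only; I'm assuming the hook receives the SDK's PutObjectRequest like the other customise*Request hooks receive their request objects, so check the signature in the version you're using):
StreamTransferManager manager = new StreamTransferManager(bucketName, putKey, s3Client) {
    @Override
    public void customisePutEmptyObjectRequest(PutObjectRequest request) {
        // Apply the same customisation here that you apply in your other customise*Request overrides,
        // e.g. storage class or encryption settings.
        request.setStorageClass("STANDARD_IA");
    }
};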
Hi,
I tried your example but I'm not able to pass any file or stream. Could you help with this? Or if you have a good example, could you share it?