Firedrops opened this issue 5 years ago.
Which file is this? Maybe we can split the fastq upfront?
On Mon, 18 Feb 2019 at 11:48, Firedrops notifications@github.com wrote:
I have tried increasing the MACHINE_TYPE to n1-standard-8, which has 8 vCPUs and 30 GB RAM; that should be more than enough for any of the reference files.
Large files (>~100 KB?) still get stuck with these error logs:
2019-02-18 (11:33:28) Processing stuck in step Alignment for at least 05m00s without outputting or completing in state pro...
Processing stuck in step Alignment for at least 05m00s without outputting or completing in state process
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)
2019-02-18 (11:38:28) Processing stuck in step Alignment for at least 10m00s without outputting or completing in state pro...
Processing stuck in step Alignment for at least 10m00s without outputting or completing in state process
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)
2019-02-18 (11:38:38) org.apache.http.client.ClientProtocolException: Unexpected response status: 502
org.apache.http.client.ClientProtocolException: Unexpected response status: 502
com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:39)
com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:17)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:223)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/allenday/nanostream-dataflow/issues/72, or mute the thread https://github.com/notifications/unsubscribe-auth/AD01ZIsAmzEfSnfYLZUadq7pw2OmZyMQks5vOgZYgaJpZM4a_8Xt .
-- Group leader, Institute for Molecular Bioscience, University of Queensland Senior Lecturer, Imperial College http://academickarma.org/0000-0002-4300-455X http://orcid.org/0000-0002-4300-455X
The last stack trace indicates HTTP 502. You may have flooded the alignment cluster. How many reads are you submitting per batch?
Just 1.
On further testing, it seems file size is not the main issue. 20170731_GP01_MNP_nohuman.fastq (866 KB) always causes that error.
A cassava file, test_Cassava_KE.barcode1_KE.barcode1.fasta (1,111 KB), did not cause the error.
Another cassava file, test_Cassava_UG.Barcode1_UG.Barcode1.fastq (43,940 KB), caused the 5-minute error.
I'm still testing further; it's a bit slow, since it takes 5 minutes for this error to pop up. For now it looks like big .fasta files are OK, but .fastq files are not.
UPDATE: Testing with another large fastq file also causes the 502 errors, as well as multiple instances of this:
2019-02-18 (15:16:55) java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
java.net.SocketOutputStream.write(SocketOutputStream.java:153)
org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:174)
org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160)
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
As @lachlancoin suggested, this might be a batching issue, possibly implemented in a way that works well with .fasta but not with .fastq?
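A fasta/fastq difference is at least plausible for a batching bug, because the two formats lay records out differently: a standard FASTQ record spans four lines (header, sequence, "+" separator, qualities), while a FASTA record is a header plus sequence lines. A batcher that counted lines instead of records, or split a FASTQ record across two requests, would behave differently for the two formats. The following is a minimal Java sketch, assuming unwrapped four-line FASTQ records, of batching by whole record; the class and method names are hypothetical and this is not the pipeline's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups FASTQ text into batches of whole records. */
final class FastqBatcher {
    // Assumes standard, unwrapped 4-line FASTQ records.
    private static final int LINES_PER_RECORD = 4;

    /** Splits FASTQ text into batches of at most batchSize records each. */
    static List<String> batch(String fastq, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Trailing empty strings are dropped by split, so a final newline is fine.
        String[] lines = fastq.split("\n");
        if (lines.length % LINES_PER_RECORD != 0) {
            throw new IllegalArgumentException("input is not a whole number of 4-line records");
        }
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int recordsInBatch = 0;
        for (int i = 0; i < lines.length; i += LINES_PER_RECORD) {
            // Only the first line of each 4-line block is checked as a header.
            if (!lines[i].startsWith("@")) {
                throw new IllegalArgumentException("expected '@' header at line " + (i + 1));
            }
            for (int j = 0; j < LINES_PER_RECORD; j++) {
                current.append(lines[i + j]).append('\n');
            }
            recordsInBatch++;
            if (recordsInBatch == batchSize) { // batch full: emit it and start a new one
                batches.add(current.toString());
                current.setLength(0);
                recordsInBatch = 0;
            }
        }
        if (recordsInBatch > 0) { // emit the final partial batch
            batches.add(current.toString());
        }
        return batches;
    }
}
```

Counting fixed four-line blocks, rather than treating every line that starts with "@" as a new record, matters because a FASTQ quality string can itself begin with "@".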
For specifics, I am using the current provisioning script (provision_species.sh), directly calling Allen's bwa-http-docker, and the Dataflow command in the README.
The following modifications were made:
nano-stream1
dataflow_species
asia-northeast-1c
n1-standard-8: this did not appear to help with the issues, so I should change it back to n1-standard-4, but not yet.
I wonder if this issue might have been solved previously but not yet committed to the main branch? Most of the commits there are about a week old or more, and these issues were mentioned in #23, so @obsh and @Pseverin would have known about them for a while.
I think we'll make the batch size configurable so we can try smaller fastq batches with the aligner. Meanwhile, you can try decreasing it in the code and recompiling the jar file: https://github.com/allenday/nanostream-dataflow/blob/master/NanostreamDataflowMain/src/main/java/com/google/allenday/nanostream/NanostreamApp.java#L54
Also, a new build of the allenday/bwa-http-docker:http container is available. It's not a performance improvement, just more correct error handling.
I agree; we just ran into the problem again with the EDTA sample. We'll try 100 and maybe 50 tomorrow. It'd be a good idea to pull the batch size out into an argument, since our builds seemed imperfect the last few times.
I have tried batch sizes down to 25. This seems to slow down the entire pipeline: no Firestore results were generated after ~30 minutes of run time on the alignment step. We got the 5-minute error in the end, and the whole job had to be cancelled.
Also, it seems that once the 5-minute error occurs, the whole provisioning cluster needs to be restarted. If we only restart the Dataflow job, we immediately get broken pipe errors.
UPDATE: Never mind, it seems restarting the provisioning cluster doesn't help either. It seems very random: sometimes it works and sometimes it doesn't, even with exactly the same builds and fastq files. Occasionally we also get 404 errors.
it'd be a good idea to pull the batch size out into an argument
Done now; see the optional --alignmentBatchSize parameter.
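In the pipeline itself this option is presumably declared through Beam's PipelineOptions. As a dependency-free illustration only, here is a minimal sketch of reading an optional --alignmentBatchSize=N argument and falling back to a default when it is absent. The default of 2000 is the value given in the thread; the class name and the hand-rolled parsing are hypothetical, not the project's code.

```java
/** Hypothetical sketch: reads an optional --alignmentBatchSize=N command-line argument. */
final class BatchSizeOption {
    // Default value mentioned in the thread.
    static final int DEFAULT_ALIGNMENT_BATCH_SIZE = 2000;

    static int alignmentBatchSize(String[] args) {
        final String prefix = "--alignmentBatchSize=";
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                // Throws NumberFormatException for non-numeric values.
                int value = Integer.parseInt(arg.substring(prefix.length()));
                if (value <= 0) {
                    throw new IllegalArgumentException("alignmentBatchSize must be positive: " + value);
                }
                return value;
            }
        }
        return DEFAULT_ALIGNMENT_BATCH_SIZE; // flag absent: use the default
    }
}
```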
Have tried down to batch size 25, seems to slow down the entire pipeline, no firestore results generated after ~30 mins run time on alignment step.
I've experimented with the batch size, and it looks like a bigger batch size actually improves performance, since bwa start-up time then adds less overhead. The default value is 2000, as it worked well on the "dogbite" dataset in my tests.
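The overhead argument can be made concrete with a simple cost model: each request pays a fixed bwa start-up cost once, plus a per-read alignment cost, so the fixed part is shared by more reads as the batch grows. A minimal Java sketch of that arithmetic; the 30 s start-up and 0.01 s per-read figures are made-up placeholders, not measurements from this thread, and the class name is hypothetical.

```java
/** Illustrative cost model: a fixed start-up cost per request plus a per-read cost. */
final class BatchCost {
    /** Number of aligner requests needed for totalReads at the given batch size. */
    static long requests(long totalReads, long batchSize) {
        return (totalReads + batchSize - 1) / batchSize; // ceiling division
    }

    /** Total time in seconds under the (hypothetical) start-up and per-read costs. */
    static double totalSeconds(long totalReads, long batchSize,
                               double startupSeconds, double perReadSeconds) {
        return requests(totalReads, batchSize) * startupSeconds + totalReads * perReadSeconds;
    }

    public static void main(String[] args) {
        long reads = 4000;     // e.g. a 4000-record fastq
        double startup = 30.0; // placeholder: seconds of bwa start-up per request
        double perRead = 0.01; // placeholder: seconds to align one read
        for (long batch : new long[] {25, 500, 2000}) {
            System.out.printf("batch=%d requests=%d total=%.1fs%n",
                    batch, requests(reads, batch), totalSeconds(reads, batch, startup, perRead));
        }
    }
}
```

Under these placeholder numbers, 4000 reads need 160 requests at batch size 25 but only 2 at batch size 2000, which points the same way as the slowdown reported at batch size 25. The model ignores that a bigger batch also makes each individual request take longer.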
I assume that at least an n1-highmem-8 machine is required for the aligner when using the species reference database. With less memory, the OS buffer cache does not seem to be effective, whereas with n1-highmem-8, bwa loading time improves significantly on subsequent calls.
Also, in #95 we introduced the optional --bwaArguments parameter. With the default value '-t 4', bwa now uses 4 threads. For n1-highmem-8 you can even try --bwaArguments='-t 8' for better aligner performance.
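To illustrate how a free-form argument string such as '-t 8' might reach bwa, here is a minimal sketch assuming, hypothetically, that the aligner service splits --bwaArguments on whitespace and inserts the tokens into a `bwa mem` command line. The class name and file paths are placeholders, and the real bwa-http-docker service may build its command differently; `-t` is bwa mem's thread-count option, so '-t 8' lines up with the 8 vCPUs of an n1-highmem-8 machine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: builds a bwa mem command from a user-supplied argument string. */
final class BwaCommand {
    static List<String> build(String bwaArguments, String referencePath, String readsPath) {
        List<String> cmd = new ArrayList<>(Arrays.asList("bwa", "mem"));
        String trimmed = bwaArguments.trim();
        if (!trimmed.isEmpty()) {
            // Naive whitespace split; quoted arguments are not supported.
            cmd.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        cmd.add(referencePath); // indexed reference (placeholder path)
        cmd.add(readsPath);     // batch of reads to align (placeholder path)
        return cmd;
    }
}
```

The resulting list could be handed to `new ProcessBuilder(...)`; keeping the arguments as separate tokens avoids passing them through a shell.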
I am still having a problem with large fastq files; see #98 (connection refused during the alignment step). Basically, the Dataflow job stalls at the alignment step and nothing comes out of it. This fastq had 4000 records, and I set a batch size of 500 (using the standard bwa docker). The scripts I use are here:
https://github.com/lachlancoin/gcloud/blob/master/init.sh
I set the target-cpu-utilization to 0.5 (to manage costs!) and use the default '-t 4'.
I was wondering if it's possible to avoid the problematic CGI step by using Pub/Sub instead. I have some thoughts, which I will put in a new issue.
I have tried increasing the provisioning MACHINE_TYPE to n1-standard-8, which has 8 vCPUs and 30 GB RAM; that should be more than enough for any of the reference files. Large files (>~100 KB?) still get stuck with these error logs. If these appear, the pipeline seems unsalvageable and needs to be cancelled and restarted.
2019-02-18 (11:33:28) Processing stuck in step Alignment for at least 05m00s without outputting or completing in state pro...
2019-02-18 (11:38:28) Processing stuck in step Alignment for at least 10m00s without outputting or completing in state pro...
2019-02-18 (11:38:38) org.apache.http.client.ClientProtocolException: Unexpected response status: 502
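The "stuck" warnings carry stack traces ending in SocketInputStream.socketRead0, meaning the worker thread is blocked waiting for the aligner's response headers. With no read deadline, such a call can wait until the server or an intermediate proxy answers or closes the connection (possibly with a 502 like the one above). A minimal sketch of attaching explicit timeouts to a request, using the JDK 11+ java.net.http client rather than the Apache HttpClient the pipeline uses (where the analogous setting is the socket timeout in RequestConfig). The URL, durations, and plain-text body are hypothetical placeholders; the real pipeline sends a multipart form, which this sketch does not reproduce.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical sketch: an aligner request with explicit connect and response timeouts. */
final class AlignerRequests {
    // Placeholder endpoint; not the real aligner URL.
    static final URI ALIGNER_URI = URI.create("http://aligner.example.invalid/align");

    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10)) // fail fast if the aligner is unreachable
                .build();
    }

    static HttpRequest alignRequest(String fastqBatch, Duration responseTimeout) {
        return HttpRequest.newBuilder(ALIGNER_URI)
                // send() throws HttpTimeoutException if no response arrives in time
                .timeout(responseTimeout)
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(fastqBatch))
                .build();
    }
}
```

A timeout turns an indefinite hang into an error that the pipeline can retry or report, instead of leaving the step stuck.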