How does pdrone ensure that the consumer process (Tycho build) receives a consistent artifact? I have other concurrently running processes (on other Jenkins slave machines) which deploy artifacts into the channels at the same time. If the channel aspect recalculates the P2 index files or cleans up old artifacts, how is the transaction concept ensured?
Well, the EofException indicates that the sender had trouble sending its data. Looking at the root cause (broken pipe), it seems that the remote (receiving) side has closed the connection, which could of course be due to a timeout situation.
Package Drone has a read/write lock for each channel.
That is clear, but what is the reason for the read timeout (the Tycho build aborts after the timeout, so pdrone cannot send any data to the client)? Why does pdrone sometimes need such a long time to provide artifacts, while other artifacts are provided quickly? How can I analyze the timing problems in more depth? It seems that pdrone sometimes needs a large amount of time to provide an artifact, so that the Tycho build aborts with a read timeout.
The Linux machine (CentOS 7, 64 bit) runs with a load average of 2.25. Is this problem a tuning task for the PostgreSQL database, or a memory bottleneck of pdrone itself (it currently runs with -Xmx5120m; the machine has 10 GB of RAM, of which 2.5 GB are free)?
This is indeed strange, and should not be the case!
What could be the case is that, due to the read/write lock, one build tries to "read" and the other to "write". If the write lock is granted first, all reads have to wait. This could delay the read and force the reader (Tycho) into a timeout situation.
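For illustration, here is a minimal sketch of how such a per-channel read/write lock behaves, using plain java.util.concurrent (the class and method names below are hypothetical, not Package Drone's actual code). With a fair lock, a queued writer such as a deploy or aspect run makes every later reader wait until the write finishes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical per-channel locking, for illustration only.
public class ChannelLocks {

    private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String channelId) {
        // One fair read/write lock per channel: a queued writer blocks later readers.
        return locks.computeIfAbsent(channelId, id -> new ReentrantReadWriteLock(true));
    }

    public <T> T read(String channelId, Supplier<T> reader) {
        ReadWriteLock lock = lockFor(channelId);
        lock.readLock().lock();   // blocks while a write (deploy, aspect run, cleanup) is in progress
        try {
            return reader.get();  // e.g. stream an artifact to Tycho
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(String channelId, Runnable modification) {
        ReadWriteLock lock = lockFor(channelId);
        lock.writeLock().lock();  // exclusive: all readers of this channel wait until done
        try {
            modification.run();   // e.g. store artifacts and recalculate the P2 metadata
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

If the write side (channel modification plus aspect processing) takes minutes, a reader that entered the queue just after the writer will wait for that whole period, which can exceed the client's read timeout.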
I will check how this can be traced. But I cannot do it right now.
At the moment I have many running build jobs.
In the console log of pdrone I see many lines like:
12:41:17.814 [qtp447351078-129] WARN d.d.pm.p2.internal.servlet.P2Servlet [P2Servlet.java:144] Download plugin: /thirdparty/plugins/org.apache.commons.codec/1.9.0/org.apache.commons.codec_1.9.0.jar
What does this warning message mean?
This is actually not a "warning". I just lowered it to "debug".
I can sometimes see the long delay when downloading artifacts from pdrone with a browser as well. So it seems to be a temporary lock problem / race condition within pdrone.
Well, it is less of a problem ;-) in the traditional sense. While the channel is being updated, it is also locked for further read calls. If the modify calls take 2 minutes, then the read calls will also be blocked for this time.
It might also be that a few sequential write operations stretch this period even longer.
I guess the only possible way around this is to provide the "old" channel state for reading while the new, modified channel state is being prepared in parallel. I am currently working on the persistence layer, so maybe this is a good use case to add.
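A minimal sketch of that idea, assuming an immutable snapshot published through an AtomicReference; the types below are hypothetical and only illustrate the copy-on-write approach, not the actual persistence layer:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical copy-on-write channel state, for illustration only.
public class ChannelStateHolder {

    /** Immutable view of a channel: artifact coordinates -> blob reference. */
    public static final class ChannelSnapshot {
        private final Map<String, String> artifacts;

        public ChannelSnapshot(Map<String, String> artifacts) {
            this.artifacts = Collections.unmodifiableMap(artifacts);
        }

        public Map<String, String> getArtifacts() {
            return artifacts;
        }
    }

    private final AtomicReference<ChannelSnapshot> current =
            new AtomicReference<>(new ChannelSnapshot(Collections.<String, String>emptyMap()));

    /** Readers always get the last fully published snapshot and never block on writers. */
    public ChannelSnapshot read() {
        return current.get();
    }

    /** Writers build the next snapshot on the side and publish it with a single atomic swap. */
    public void update(UnaryOperator<ChannelSnapshot> modification) {
        current.updateAndGet(modification);
    }
}
```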
Yes, this would be more like a transactional database (ACID concept). So I would see the old, valid and consistent state until the transaction completes. The existing (possible, but not perfect) way is to block all other transactions, which also ensures the validity of the data that is read.
Well, it still is ACID right now, just with sub-optimal locking, which on the other hand was necessary because of the deadlock situation with Postgres and MySQL.
The whole database situation is quite annoying actually. There are a few quirks with JPA, OSGi, and JDBC in general. The file blob store, possible scenarios with S3, Hadoop, etc. ... so we are currently evaluating a model which does not require a database in the first place. This could be just another use case for that.
I have the following situation: 4 build jobs which deploy to different product channels on pdrone. Now I try to download an artifact from another channel with third-party libraries, and it takes over 2 minutes just to get the file chooser for selecting the target folder for the download. So my question is: what blocks pdrone for over 2 minutes if the build jobs deploy to other channels? All channels have the cleanup aspect together with p2, tycho, etc. Are the aspects multithreaded, or does a running aspect also block other channels?
My hope with pdrone was that the use of a database would help with concurrent access to the OSGi artifacts. Simple Eclipse p2 repositories on the filesystem behind a web server work stably and quickly, but have problems with concurrent access from different jobs (especially the index files artifacts.xml and content.xml are critical).
You can see from my usage statistics that we have >10000 artifacts in pdrone, distributed across 13 channels. The most problematic channels (there seem to be 2) each contain over 2000 artifacts. Is there a possible performance problem with more than 2000 artifacts in a channel and the cleanup and p2/OSGi aspects activated?
One remark on the problem, which could be helpful: it takes over 2 minutes to download an artifact via the URL http://server:8080/p2/common/plugins/de.pcsoft.common.datasystem.base.api/3.10.3.201507271614/de.pcsoft.common.datasystem.base.api_3.10.3.201507271614.jar, while a download via the URL http://server:8080/artifact/74E43A08-CC62-43C9-82BA-626708EB3C77/get finishes quickly at the same time!
The first URL is used by Tycho for downloading an artifact, the second is used by the web GUI of PDrone.
Hm, ok ... this actually could be a different situation. I will look into this.
Actually the P2 adapter has to resolve the artifact and cannot reference it directly. A limitation of the P2 mechanism, but maybe the resolving is not as performant as it could be.
I have uploaded a new pre-release [1] which uses a different, more efficient way to list all artifacts in order to resolve the id/version combination of P2.
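As an illustration of this kind of optimization (not necessarily the actual change in the pre-release), a pre-built lookup map can resolve the P2 id/version coordinates without scanning every artifact of the channel on each request; all names below are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup index, for illustration only.
public class P2ArtifactIndex {

    /** Map key for the P2 coordinates of an artifact. */
    public static final class Key {
        final String id;
        final String version;

        Key(String id, String version) {
            this.id = id;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return id.equals(other.id) && version.equals(other.version);
        }

        @Override
        public int hashCode() {
            return 31 * id.hashCode() + version.hashCode();
        }
    }

    private final Map<Key, String> artifactIdsByCoordinates = new HashMap<>();

    /** Build the index once per channel state instead of once per download request. */
    public P2ArtifactIndex(List<String[]> artifacts /* {id, version, internalArtifactId} */) {
        for (String[] a : artifacts) {
            artifactIdsByCoordinates.put(new Key(a[0], a[1]), a[2]);
        }
    }

    /** O(1) lookup for URLs like /p2/<channel>/plugins/<id>/<version>/<id>_<version>.jar */
    public String resolve(String id, String version) {
        return artifactIdsByCoordinates.get(new Key(id, version));
    }
}
```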
You should be able to switch back and forth between your current version and this one. And this is the only change. So if you can give it a try, this would help me a lot, since I currently don't have much time.
I have installed the new version and the build jobs are running. A first impression looks good.
Do you see any improvement? I would make a new release then.
We can close the issue. Even with 4 concurrently running jobs (over last night) there were no read problems anymore. Thanks for the quick solution. Can you also integrate the logging patch (modified logging level) into the next release?
So I will make a new release.
The 0.10.2 release should have had the log issue already fixed.
I have a running Package-Drone repository (0.10.2) with >10000 artifacts stored in different channels. The database is PostgreSQL 9.x
Sometimes I get errors when loading an OSGi bundle from a channel via a Tycho build: [ERROR] An error occurred while transferring artifact canonical: osgi.bundle,, from repository /p2/ :
[ERROR] Unable to read repository at **/p2//plugins///.
[ERROR] Internal error: org.eclipse.tycho.repository.local.MirroringArtifactProvider$MirroringFailedException: Could not mirror artifact osgi.bundle,, into the local Maven repository. See log output for details. Read timed out -> [Help 1]
Why does the read access go through the path with group id and version? The bundle is located directly in the channel under p2/!?
On the pdrone console I see the following log output (the named bundle JAR file is listed and downloadable in the channel content view!): /plugins///
11:08:03.518 [qtp447351078-124] WARN org.eclipse.jetty.server.HttpChannel [HttpChannel.java:395] /p2/
javax.servlet.ServletException: javax.servlet.ServletException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:88) ~[org.ops4j.pax.web.pax-web-jetty_4.1.2.jar:na] at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.Server.handle(Server.java:497) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) [org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [org.eclipse.jetty.util_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [org.eclipse.jetty.util_9.2.9.v20150224.jar:9.2.9.v20150224] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: javax.servlet.ServletException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at de.dentrassi.pm.p2.internal.servlet.P2Servlet.process(P2Servlet.java:201) ~[na:na] at de.dentrassi.pm.p2.internal.servlet.P2Servlet.doGet(P2Servlet.java:148) ~[na:na] at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) ~[javax.servlet-api_3.1.0.jar:3.1.0] at de.dentrassi.pm.p2.internal.servlet.P2Servlet.service(P2Servlet.java:73) ~[na:na] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api_3.1.0.jar:3.1.0] at de.dentrassi.osgi.web.servlet.ServletWrapper.service(ServletWrapper.java:64) ~[na:na] at de.dentrassi.osgi.web.servlet.JspServletInitializer$BundleServletWrapper.service(JspServletInitializer.java:53) ~[na:na] at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808) ~[org.eclipse.jetty.servlet_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587) ~[org.eclipse.jetty.servlet_9.2.9.v20150224.jar:9.2.9.v20150224] at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70) ~[org.ops4j.pax.web.pax-web-jetty_4.1.2.jar:na] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) ~[org.eclipse.jetty.security_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:267) ~[org.ops4j.pax.web.pax-web-jetty_4.1.2.jar:na] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) ~[org.eclipse.jetty.servlet_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[org.eclipse.jetty.server_9.2.9.v20150224.jar:9.2.9.v20150224] at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:80) ~[org.ops4j.pax.web.pax-web-jetty_4.1.2.jar:na] ... 8 common frames omitted
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at de.dentrassi.pm.common.service.AbstractJpaServiceImpl.doWithTransaction(AbstractJpaServiceImpl.java:124) ~[na:na] at de.dentrassi.pm.storage.service.jpa.StorageServiceImpl.streamArtifact(StorageServiceImpl.java:358) ~[na:na] at de.dentrassi.pm.storage.service.jpa.ArtifactImpl.streamData(ArtifactImpl.java:86) ~[na:na] at de.dentrassi.pm.storage.service.util.DownloadHelper.streamArtifact(DownloadHelper.java:64) ~[na:na] at de.dentrassi.pm.p2.internal.servlet.DownloadHandler.process(DownloadHandler.java:97) ~[na:na] at de.dentrassi.pm.p2.internal.servlet.P2Servlet.process(P2Servlet.java:197) ~[na:na] ... 27 common frames omitted
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at de.dentrassi.pm.common.service.AbstractJpaServiceImpl.lambda$1(AbstractJpaServiceImpl.java:113) ~[na:na] at de.dentrassi.pm.common.service.AbstractJpaServiceImpl$$Lambda$19/884281141.process(Unknown Source) ~[na:na] at de.dentrassi.pm.common.service.AbstractJpaServiceImpl.doWithManager(AbstractJpaServiceImpl.java:48) ~[na:na] at de.dentrassi.pm.common.service.AbstractJpaServiceImpl.doWithTransaction(AbstractJpaServiceImpl.java:68) ~[na:na] ... 32 common frames omitted
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at de.dentrassi.pm.storage.service.jpa.blob.FilesystemBlobStoreProcessor.streamArtifact(FilesystemBlobStoreProcessor.java:207) ~[na:na] at de.dentrassi.pm.storage.service.jpa.blob.BlobStore.streamArtifact(BlobStore.java:193) ~[na:na] at de.dentrassi.pm.storage.service.jpa.blob.BlobStore.streamArtifact(BlobStore.java:183) ~[na:na] at de.dentrassi.pm.storage.service.jpa.StorageServiceImpl.lambda$14(StorageServiceImpl.java:367) ~[na:na] at de.dentrassi.pm.storage.service.jpa.StorageServiceImpl$$Lambda$112/878977731.process(Unknown Source) ~[na:na] at de.dentrassi.pm.common.service.AbstractJpaServiceImpl.lambda$1(AbstractJpaServiceImpl.java:80) ~[na:na] ... 35 common frames omitted
Caused by: java.lang.RuntimeException: org.eclipse.jetty.io.EofException at de.dentrassi.pm.storage.service.util.DownloadHelper.lambda$0(DownloadHelper.java:88) ~[na:na] at de.dentrassi.pm.storage.service.util.DownloadHelper$$Lambda$111/2059938070.receive(Unknown Source) ~[na:na] at de.dentrassi.pm.storage.service.jpa.blob.BlobStore.lambda$1(BlobStore.java:183) ~[na:na] at de.dentrassi.pm.storage.service.jpa.blob.BlobStore$$Lambda$113/1554537986.accept(Unknown Source) ~[na:na] at de.dentrassi.pm.storage.service.jpa.blob.FilesystemBlobStoreProcessor.streamArtifact(FilesystemBlobStoreProcessor.java:199) ~[na:na] ... 40 common frames omitted
Caused by: org.eclipse.jetty.io.EofException: null at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:408) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:364) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(SelectChannelEndPoint.java:111) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:636) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:607) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:545) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52) ~[org.eclipse.jetty.util_9.2.9.v20150224.jar:9.2.9.v20150224] ... 3 common frames omitted
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe) at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_45] at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_45] at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_45] at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_45] at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_45] at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:170) ~[org.eclipse.jetty.io_9.2.9.v20150224.jar:9.2.9.v20150224] ... 10 common frames omitted
Other bundles can be loaded without problems. I have cleaned and refilled the channels before, assuming some inconsistencies with the local storage or similar. This did not help.
Are these race conditions or problems with concurrent access to channels via p2/Tycho?