Sitecore / docker-images

Build docker images for Sitecore
MIT License

Cannot rebuild index after docker-compose down when using process isolation #182

Closed: Barsonax closed this issue 4 years ago.

Barsonax commented 4 years ago

Since I switched to these images, I am getting errors when I try to rebuild the index (more precisely, at the end of the rebuild). This didn't happen before, when I used https://github.com/avivasolutionsnl/sitecore-docker.

Restarting and clearing the data folder seems to help.

Info:

Steps:

  1. Use this compose file:

         version: '2.4'

         services:

           sql:
             image: ${REGISTRY}sitecore-xm-sqldev:${SITECORE_VERSION}-windowsservercore-${WINDOWSSERVERCORE_VERSION}
             isolation: process
             volumes:

         networks:
           default:
             external:
               name: nat

  2. Run docker-compose up.
  3. Go to http://sitecore/sitecore.
  4. Run docker-compose down.
  5. Run docker-compose up.
  6. Try populating and then rebuilding the indexes, and you will get the error.

Error:

Job started: Index_Update_IndexName=sitecore_master_index|#Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> SolrNet.Exceptions.SolrConnectionException: <?xml version="1.0" encoding="UTF-8"?>

500 9 C:\data\sitecore_master_index\data\index\pending_segments_n -> C:\data\sitecore_master_index\data\index\segments_n
java.nio.file.NoSuchFileException: C:\data\sitecore_master_index\data\index\pending_segments_n -> C:\data\sitecore_master_index\data\index\segments_n
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303)
    at org.apache.lucene.store.NRTCachingDirectory.rename(NRTCachingDirectory.java:168)
    at org.apache.lucene.store.LockValidatingDirectoryWrapper.rename(LockValidatingDirectoryWrapper.java:56)
    at org.apache.lucene.index.SegmentInfos.finishCommit(SegmentInfos.java:805)
    at org.apache.lucene.index.IndexWriter.finishCommit(IndexWriter.java:3497)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3464)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3421)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:676)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:93)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1959)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1935)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:281)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:188)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
    at java.lang.Thread.run(Thread.java:748)
500

---> System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
    at System.Net.HttpWebRequest.GetResponse()
    at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
    at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
    --- End of inner exception stack trace ---
    at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
    at SolrNet.Impl.SolrConnection.Post(String relativeUrl, String s)
    at SolrNet.Impl.LowLevelSolrServer.SendAndParseHeader(ISolrCommand cmd)
    at Sitecore.ContentSearch.SolrProvider.SolrSearchIndex.PerformRebuild(Boolean resetIndex, Boolean optimizeOnComplete, IndexingOptions indexingOptions, CancellationToken cancellationToken)
    at Sitecore.ContentSearch.SolrProvider.SolrSearchIndex.Rebuild(Boolean resetIndex, Boolean optimizeOnComplete)
    --- End of inner exception stack trace ---
    at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
    at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
    at Sitecore.Reflection.ReflectionUtil.InvokeMethod(MethodInfo method, Object[] parameters, Object obj)
    at Sitecore.Jobs.JobRunner.RunMethod(JobArgs args)
    at (Object , Object )
    at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
    at Sitecore.Pipelines.DefaultCorePipelineManager.Run(String pipelineName, PipelineArgs args, String pipelineDomain, Boolean failIfNotExists)
    at Sitecore.Pipelines.DefaultCorePipelineManager.Run(String pipelineName, PipelineArgs args, String pipelineDomain)
    at Sitecore.Jobs.DefaultJob.DoExecute()
    at Sitecore.Abstractions.BaseJob.ThreadEntry(Object state)
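
The failing frame is Lucene's commit step: the trace shows IndexWriter.commit reaching SegmentInfos.finishCommit, which renames pending_segments_n to segments_n, and that rename fails because the pending file is no longer there. A minimal Python analogue of this write-then-rename pattern, assuming a plain local directory (this is not Lucene's code, and the generation numbering is simplified), shows that a vanished pending file surfaces as FileNotFoundError, the Python counterpart of java.nio.file.NoSuchFileException. The thread itself does not establish what removes the file; it only correlates the failure with process isolation.

```python
import os
import tempfile


def two_phase_commit(index_dir: str, gen: int, payload: bytes, lose_pending: bool = False) -> str:
    """Write pending_segments_<gen>, fsync it, then rename it to segments_<gen>.

    Simplified analogue of the rename in the trace; lose_pending simulates the
    pending file disappearing between the two phases.
    """
    pending = os.path.join(index_dir, f"pending_segments_{gen}")
    final = os.path.join(index_dir, f"segments_{gen}")

    # Phase 1: write the pending commit point and flush it to disk.
    with open(pending, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    if lose_pending:
        os.remove(pending)  # simulate the file vanishing before the rename

    # Phase 2: publish the commit; raises FileNotFoundError if pending is gone.
    os.replace(pending, final)
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(os.path.basename(two_phase_commit(d, 1, b"ok")))
        try:
            two_phase_commit(d, 2, b"ok", lose_pending=True)
        except FileNotFoundError as exc:
            print("commit failed:", type(exc).__name__)
```

The rename is what makes the commit atomic: readers only ever see a complete segments_N file, so losing the pending file aborts the commit rather than corrupting it.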

pbering commented 4 years ago

Quick question: did you clear your data folder for Solr so you get fresh, empty cores at startup?

Barsonax commented 4 years ago

Yes, that does fix it, but I would rather not have to do that every time.

pbering commented 4 years ago

It should not be needed every time, only once. So you are saying that after you do that, populate, and reindex, it continues to happen?

joostmeijles commented 4 years ago

The seed condition differs between the Aviva repo and the Sitecore repo:

https://github.com/Sitecore/docker-images/blob/561a6c6619e50b6d58d6e0317b991242ed5dee50/windows/9.x.x/sitecore-xp-solr/Boot.cmd#L13

vs

https://github.com/avivasolutionsnl/sitecore-docker/blob/224c4a5926a606a83e7c08109cad73db18feb5ec/xp/solr/Boot.cmd#L13

@Barsonax what is the prefix of your cores?

Barsonax commented 4 years ago

> It should not be needed every time, only once. So you are saying that after you do that, populate, and reindex, it continues to happen?

No, I haven't yet figured out how to reproduce this consistently, but it does happen frequently.

> @Barsonax what is the prefix of your cores?

sitecore_, so I think this condition is wrong.
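
To illustrate why a prefix mismatch matters, here is a hypothetical Python rendering of a prefix-based seed check. The function name and directory layout are illustrative only; the real logic lives in the linked Boot.cmd files. If the check looks for sc_* directories while the cores are actually named sitecore_*, the "already seeded" test never passes, so, as the thread suspected at this point, every start would be treated as a first start.

```python
import os
import tempfile


def seed_needed_by_prefix(data_dir: str, core_prefix: str) -> bool:
    """Hypothetical check: seed unless a core directory with the prefix exists."""
    if not os.path.isdir(data_dir):
        return True  # no data folder yet: first start
    with os.scandir(data_dir) as entries:
        return not any(
            entry.is_dir() and entry.name.startswith(core_prefix) for entry in entries
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data:
        # Simulate a data folder that was already seeded with sitecore_* cores.
        os.mkdir(os.path.join(data, "sitecore_master_index"))
        print(seed_needed_by_prefix(data, "sc_"))        # check misses the existing cores
        print(seed_needed_by_prefix(data, "sitecore_"))  # existing cores are detected
```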

Barsonax commented 4 years ago

I think this is reproducible with the following steps:

  1. run docker-compose up
  2. make sure the cores are seeded
  3. run docker-compose down
  4. run docker-compose up
  5. try rebuilding the indexes and you will get the error
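
The reproduction loop above can be partly scripted. This sketch only automates the docker-compose up/down cycle; checking that the cores are seeded (step 2) and triggering the index rebuild (step 5) are still done by hand. It assumes docker-compose is on PATH and that the script runs from the folder containing the compose file; the -d (detached) flag is an addition so the script can continue after up.

```python
import subprocess
import sys

# Step number -> command for the automated parts of the repro (steps 1, 3 and 4).
REPRO_STEPS = [
    (1, ["docker-compose", "up", "-d"]),
    (3, ["docker-compose", "down"]),
    (4, ["docker-compose", "up", "-d"]),
]


def run_repro(pause=input) -> None:
    for step, cmd in REPRO_STEPS:
        print(f"Step {step}: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)  # raise CalledProcessError on failure
        if step == 1:
            pause("Step 2: make sure the cores are seeded, then press Enter...")
    print("Step 5: rebuild the indexes in Sitecore and watch for NoSuchFileException.")


if __name__ == "__main__":
    try:
        run_repro()
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        sys.exit(f"docker-compose failed: {exc}")
```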
joostmeijles commented 4 years ago

So we can fix it by renaming sc_ to sitecore_: https://github.com/Sitecore/docker-images/pull/183

I noticed that the Linux images use sc_ as the prefix but do not check for the prefix in the boot script: https://github.com/Sitecore/docker-images/blob/561a6c6619e50b6d58d6e0317b991242ed5dee50/linux/9.2.0/sitecore-xm-solr/boot.sh#L7

Maybe we could remove the directory check completely and only check for solr.xml, as is done for the Linux images.
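
The proposed simplification could look like the following hypothetical Python rendering (not the actual boot script): seeding is decided only by whether solr.xml already exists in the data folder, so core naming no longer matters. The file name solr.xml comes from the comment above; the function name and folder layout are assumptions.

```python
import os
import tempfile


def seed_needed_by_solr_xml(data_dir: str) -> bool:
    """Seed only when the data folder has no solr.xml yet (an empty volume)."""
    return not os.path.isfile(os.path.join(data_dir, "solr.xml"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data:
        print(seed_needed_by_solr_xml(data))  # empty folder: seed it
        open(os.path.join(data, "solr.xml"), "w").close()
        os.mkdir(os.path.join(data, "sitecore_master_index"))
        print(seed_needed_by_solr_xml(data))  # existing data: leave it alone
```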

@pbering @Barsonax what do you think?

Barsonax commented 4 years ago

This would make the behavior more consistent across the different images, so I agree with this change.

pbering commented 4 years ago

I agree!

Barsonax commented 4 years ago

Hmm, the error still seems to occur on commit ec563b00354ec1d789487ad459607c8c2042d351

@joostmeijles

EDIT: never mind, I checked the container and the check is still there, which probably means my build cache needs to be cleared.

EDIT: still getting the error...

Barsonax commented 4 years ago

Even with the new images, I can still reproduce this issue after a docker-compose down, so this issue is not fixed @joostmeijles @pbering

Barsonax commented 4 years ago

The bug seems to be related to process isolation. I'm not yet sure why, but I cannot reproduce it with Hyper-V isolation.
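
When comparing the two modes, it helps to confirm which isolation a running container actually got. docker inspect reports it under HostConfig.Isolation; this sketch parses that JSON. The container name solr is an assumption, and the test inputs are trimmed to the one field read here.

```python
import json
import subprocess


def parse_isolation(inspect_json: str) -> str:
    """Return HostConfig.Isolation from `docker inspect` output (a JSON array)."""
    containers = json.loads(inspect_json)
    if not containers:
        raise ValueError("docker inspect returned no containers")
    # An empty value means Docker applied the platform default.
    return containers[0].get("HostConfig", {}).get("Isolation") or "default"


def container_isolation(name: str) -> str:
    out = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    ).stdout
    return parse_isolation(out)


if __name__ == "__main__":
    print(container_isolation("solr"))  # hypothetical container name
```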

Barsonax commented 4 years ago

I am unable to reproduce this anymore. My guess is that Microsoft already found the bug and patched it recently.

perosb commented 3 years ago

Still happening for us.

Windows 10 2004 and 2009, Docker 20.x. Not sure what Microsoft would have updated.

We also tried an updated Java 15, with no success.

cassidydotdk commented 3 years ago

Change the solr service's isolation from process to hyperv.

https://sitecore.stackexchange.com/questions/28782/solr-intermittently-failing-with-java-nio-file-nosuchfileexception/28786#28786
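
The workaround is a one-line change to the solr service's isolation setting in the compose file. As a sketch, assuming the compose file has already been loaded into a Python dict (for example with a YAML library, not shown) and that the service is named solr, the change amounts to the following; other services keep their existing isolation.

```python
import copy


def use_hyperv_for(compose: dict, service: str = "solr") -> dict:
    """Return a copy of a parsed compose file with `service` switched to Hyper-V isolation."""
    updated = copy.deepcopy(compose)  # leave the caller's dict untouched
    services = updated.get("services", {})
    if service not in services:
        raise KeyError(f"service {service!r} not found in compose file")
    services[service]["isolation"] = "hyperv"  # previously "process"
    return updated


if __name__ == "__main__":
    compose = {
        "version": "2.4",
        "services": {
            "solr": {"isolation": "process"},
            "sql": {"isolation": "process"},
        },
    }
    print(use_hyperv_for(compose)["services"])
```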

perosb commented 3 years ago

Yes, that's the workaround suggested above, but it's not a proper solution, though fixing it is probably not on Sitecore's plate. My comment was about the idea that MS had updated/fixed this somewhere.

cassidydotdk commented 3 years ago

It may not be Sitecore's problem to fix, but when 10.1 ships with a recommended setup that includes process isolation for the solr service, at the very least the default recommendation should be reconsidered.