Closed LucasRoesler closed 2 years ago
Hi @LucasRoesler. If you've had your DataHub instance for some time, you may need to update the DB character encoding - Can you see if following this FAQ works for you? https://datahubproject.io/docs/debugging/#im-seeing-exceptions-in-datahub-gms-container-like-caused-by-javalangillegalstateexception-duplicate-key-comlinkedinmetadataentityebeanebeanaspectv2dd26e011-what-do-i-do
Thanks for reporting this with a reproducible setup. The issue has been fixed in acryl-datahub==0.8.26.8
. Please re-open if it persists.
Describe the bug When I try to ingest dbt metadata, I get an error that the ingestion failed with a backoff. Looking into the stack trace I see
DuplicateKeyException
Oddly, I can get the ingestion to work in one of two ways
disable_dbt_node_creation: True
and then reingest the same dbt data withdisable_dbt_node_creation: False
To Reproduce
Because the reproduction is a bit complex, I have created a complete reproduction repo here https://github.com/LucasRoesler/dbt-datahub-ingestion-bug
I have included the exception text here
``` 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: ' 'com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries\n' '\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n' '\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n' '\tat com.linkedin.metadata.resources.entity.EntityResource.ingest(EntityResource.java:178)\n' '\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n' '\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n' '\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n' '\tat java.lang.reflect.Method.invoke(Method.java:498)\n' '\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:172)\n' '\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:326)\n' '\tat ' 'com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)\n' '\tat ' 'com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)\n' '\tat ' 'com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)\n' '\tat java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)\n' '\tat java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683)\n' '\tat java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010)\n' '\tat ' 'com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)\n' '\tat com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)\n' '\tat com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:218)\n' '\tat ' 'com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:242)\n' '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:211)\n' '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:181)\n' '\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n' '\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n' '\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n' '\tat ' 'com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n' '\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n' '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n' '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n' '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n' '\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n' '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n' '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n' '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n' '\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n' '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n' '\tat ' 'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n' '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n' '\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n' '\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n' '\tat ' 'com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n' '\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n' '\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n' '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n' '\tat ' 'com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n' '\tat ' 'org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n' '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n' '\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n' '\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)\n' '\tat com.datahub.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:77)\n' '\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)\n' '\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:542)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n' '\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n' '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n' '\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n' '\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n' '\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n' '\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n' '\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n' '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n' '\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n' '\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n' '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n' '\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n' '\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n' '\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n' '\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n' '\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n' '\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n' '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n' '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n' '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n' '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n' '\tat ' 'org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n' '\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n' '\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n' '\tat java.lang.Thread.run(Thread.java:748)\n' 'Caused by: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries\n' '\tat com.linkedin.metadata.entity.ebean.EbeanAspectDao.runInTransactionWithRetry(EbeanAspectDao.java:458)\n' '\tat ' 'com.linkedin.metadata.entity.ebean.EbeanEntityService.ingestAspectsToLocalDB(EbeanEntityService.java:326)\n' '\tat com.linkedin.metadata.entity.EntityService.wrappedIngestAspectsToLocalDB(EntityService.java:239)\n' '\tat com.linkedin.metadata.entity.EntityService.ingestAspects(EntityService.java:291)\n' '\tat com.linkedin.metadata.entity.EntityService.ingestSnapshotUnion(EntityService.java:703)\n' '\tat com.linkedin.metadata.entity.EntityService.ingestEntity(EntityService.java:606)\n' '\tat com.linkedin.metadata.resources.entity.EntityResource.lambda$ingest$4(EntityResource.java:179)\n' '\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)\n' '\t... 85 more\n' 'Caused by: io.ebean.DuplicateKeyException: Error when batch flush on sql: insert into metadata_aspect_v2 ' '(urn, aspect, version, metadata, createdOn, createdBy, createdFor, systemmetadata) values (?,?,?,?,?,?,?,?)\n' '\tat io.ebean.config.dbplatform.SqlCodeTranslator.translate(SqlCodeTranslator.java:46)\n' '\tat io.ebean.config.dbplatform.DatabasePlatform.translate(DatabasePlatform.java:219)\n' '\tat io.ebeaninternal.server.transaction.TransactionManager.translate(TransactionManager.java:246)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.translate(JdbcTransaction.java:698)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.batchFlush(JdbcTransaction.java:680)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.internalBatchFlush(JdbcTransaction.java:796)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.flushCommitAndNotify(JdbcTransaction.java:1005)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.commit(JdbcTransaction.java:1057)\n' '\tat com.linkedin.metadata.entity.ebean.EbeanAspectDao.runInTransactionWithRetry(EbeanAspectDao.java:449)\n' '\t... 92 more\n' 'Caused by: java.sql.BatchUpdateException: Duplicate entry ' "'urn:li:dataset:(urn:li:dataPlatform:snowflake,citibike_tripdata.' for key 'PRIMARY'\n" '\tat sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)\n' '\tat sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)\n' '\tat sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)\n' '\tat java.lang.reflect.Constructor.newInstance(Constructor.java:423)\n' '\tat com.mysql.cj.util.Util.handleNewInstance(Util.java:192)\n' '\tat com.mysql.cj.util.Util.getInstance(Util.java:167)\n' '\tat com.mysql.cj.util.Util.getInstance(Util.java:174)\n' '\tat com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)\n' '\tat com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:853)\n' '\tat com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)\n' '\tat com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:796)\n' '\tat ' 'io.ebean.datasource.delegate.PreparedStatementDelegator.executeBatch(PreparedStatementDelegator.java:357)\n' '\tat io.ebeaninternal.server.persist.BatchedPstmt.executeAndCheckRowCounts(BatchedPstmt.java:130)\n' '\tat io.ebeaninternal.server.persist.BatchedPstmt.executeBatch(BatchedPstmt.java:97)\n' '\tat io.ebeaninternal.server.persist.BatchedPstmtHolder.flush(BatchedPstmtHolder.java:124)\n' '\tat io.ebeaninternal.server.persist.BatchControl.flushPstmtHolder(BatchControl.java:206)\n' '\tat io.ebeaninternal.server.persist.BatchControl.executeNow(BatchControl.java:220)\n' '\tat io.ebeaninternal.server.persist.BatchedBeanHolder.executeNow(BatchedBeanHolder.java:95)\n' '\tat io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:271)\n' '\tat io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:227)\n' '\tat io.ebeaninternal.server.transaction.JdbcTransaction.batchFlush(JdbcTransaction.java:678)\n' '\t... 96 more\n' 'Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry ' "'urn:li:dataset:(urn:li:dataPlatform:snowflake,citibike_tripdata.' for key 'PRIMARY'\n" '\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:117)\n' '\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)\n' '\tat com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)\n' '\tat com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)\n' '\tat com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1092)\n' '\tat com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchSerially(ClientPreparedStatement.java:832)\n' '\t... 108 more\n' ```Expected behavior The ingestion should correctly create both the postgres and the dbt nodes without any prerequisite steps
Desktop (please complete the following information):
Additional context Add any other context about the problem here.