IQSS / dataverse

Open source research data repository software
http://dataverse.org

After updating Dataverse from 6.2 to 6.4, the ingestion of tabular data is no longer functioning correctly on our installation #11021

Closed faborg closed 1 day ago

faborg commented 1 week ago

After updating Dataverse from 6.2 to 6.4, the ingestion of tabular data is no longer functioning correctly. In version 6.2, it still worked with the same files.

I would be very grateful for any help!

Error message on the web interface:

Ingest was unsuccessful. Ingest succeeded, but failed to save the ingested tabular data in the database: Index 1 out of bounds for length

Error messages in the log:


[2024-11-13T12:48:51.666+0100] [Payara 6.2024.6] [INFORMATION] [] [edu.harvard.iq.dataverse.util.FileUtil] [tid: _ThreadID=76 _ThreadName=http-thread-pool::http-listener-1(3)] [timeMillis: 1731498531666] [levelValue: 800] [[ daten.csv is a filename/extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileName.properties file.]]

[2024-11-13T12:51:22.349+0100] [Payara 6.2024.6] [WARNUNG] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=17540 _ThreadName=orb-thread-pool-1 (pool #1): worker-5] [timeMillis: 1731498682349] [levelValue: 900] [[ Ingest failure (IO Exception): Could not parse Excel/XLSX spreadsheet. Cannot invoke "String.indexOf(int)" because "spansAttribute" is null.]]


[2024-11-13T12:57:27.165+0100] [Payara 6.2024.6] [WARNUNG] [] [jakarta.enterprise.ejb.container] [tid: _ThreadID=4077 _ThreadName=orb-thread-pool-1 (pool #1): worker-2] [timeMillis: 1731499047165] [levelValue: 900] [[
jakarta.ejb.EJBException: Index 1 out of bounds for length 1
    at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
    at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
    at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
    at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
    at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
    at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
    at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
    at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
    at jdk.proxy74/jdk.proxy74.$Proxy311.ingestAsTabular(Unknown Source)
    at edu.harvard.iq.dataverse.ingest.__EJB31_Generated__IngestServiceBean__Intf____Bean__.ingestAsTabular(Unknown Source)
    at edu.harvard.iq.dataverse.ingest.IngestMessageBean.onMessage(IngestMessageBean.java:107)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
    at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
    at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
    at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:653)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
    at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
    at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:81)
    at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
    at jdk.internal.reflect.GeneratedMethodAccessor204.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
    at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
    at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
    at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
    at org.glassfish.ejb.mdb.MessageBeanContainer.deliverMessage(MessageBeanContainer.java:1216)
    at org.glassfish.ejb.mdb.MessageBeanListenerImpl.deliverMessage(MessageBeanListenerImpl.java:131)
    at com.sun.enterprise.connectors.inbound.MessageEndpointInvocationHandler.invoke(MessageEndpointInvocationHandler.java:171)
    at jdk.proxy74/jdk.proxy74.$Proxy559.onMessage(Unknown Source)
    at com.sun.messaging.jms.ra.OnMessageRunner.run(OnMessageRunner.java:242)
    at com.sun.enterprise.connectors.work.OneWork.doWork(OneWork.java:108)
    at com.sun.corba.ee.impl.threadpool.ThreadPoolImpl$TaskRunner.run(ThreadPoolImpl.java:193)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at org.dataverse.unf.RoundRoutines.calcMantissa(RoundRoutines.java:399)
    at org.dataverse.unf.RoundRoutines.Genround(RoundRoutines.java:322)
    at org.dataverse.unf.UnfNumber.UNF5(UnfNumber.java:272)
    at org.dataverse.unf.UnfNumber.RUNF5(UnfNumber.java:215)
    at org.dataverse.unf.UnfDigest.unfV(UnfDigest.java:376)
    at org.dataverse.unf.UnfDigest.unf(UnfDigest.java:236)
    at org.dataverse.unf.UNFUtil.calculateUNF(UNFUtil.java:426)
    at edu.harvard.iq.dataverse.ingest.IngestServiceBean.calculateUNF(IngestServiceBean.java:1942)
    at edu.harvard.iq.dataverse.ingest.IngestServiceBean.produceContinuousSummaryStatistics(IngestServiceBean.java:757)
    at edu.harvard.iq.dataverse.ingest.IngestServiceBean.produceSummaryStatistics(IngestServiceBean.java:722)
    at edu.harvard.iq.dataverse.ingest.IngestServiceBean.ingestAsTabular(IngestServiceBean.java:1096)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
    at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
    at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
    at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:653)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
    at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
    at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
    at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
    at jdk.internal.reflect.GeneratedMethodAccessor215.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
    at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:603)
    at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
    at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
    at jdk.internal.reflect.GeneratedMethodAccessor204.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
    at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
    at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
    at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
    at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
    at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
    ... 36 more

qqmyers commented 6 days ago

I don’t think there’s been any relevant change in the code. From the log it looks like this file has been given the Excel MIME type rather than text/csv. Dataverse will trust the source (browser or whatever calls the API) if it sends a MIME type, so I’d suggest checking there for the initial problem.

To repair this one, you could try using the File Redetect API (https://guides.dataverse.org/en/latest/api/native-api.html#redetect-file-type). Or try re-uploading from a different browser, etc. The log also notes that we (strangely) don’t have the .csv extension in our /dataverse/src/main/java/propertyFiles/MimeTypeDetectionByFileExtension.properties file to map it to text/csv. That could be added if you find that the Redetect API doesn’t work for your case.

Hope that helps, -- Jim
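
For anyone who wants to script the redetect call suggested above, here is a minimal sketch using the JDK's built-in HTTP client. It assumes the endpoint shape described in the linked guide (POST to /api/files/{id}/redetect with an optional dryRun query parameter and the API token in the X-Dataverse-key header). The class name, method name, and command-line arguments are made up for illustration, so please check the guide for your Dataverse version before relying on it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative helper, not part of Dataverse: builds and sends a redetect request.
class RedetectFileType {

    // Builds the POST request. With dryRun=true the server is asked to report
    // the detected type without saving it (per the linked API guide).
    static HttpRequest buildRequest(String serverUrl, long fileId, String apiToken, boolean dryRun) {
        URI uri = URI.create(serverUrl + "/api/files/" + fileId + "/redetect?dryRun=" + dryRun);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: RedetectFileType <serverUrl> <fileId> <apiToken>");
            return;
        }
        // Start with a dry run so nothing is changed until the result looks right.
        HttpRequest request = buildRequest(args[0], Long.parseLong(args[1]), args[2], true);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Running it with dryRun=true first seems the safer order: if the response reports text/csv, the same request with dryRun=false should persist the change.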

faborg commented 1 day ago

After extended debugging, I have now found a solution: I changed the system locale from de_DE.UTF-8 to en_US.UTF-8. This allowed the ingest to work, as the CSV file delimiters are likely interpreted differently. The issue can therefore be closed from my side.
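
The stack trace above fails in org.dataverse.unf.RoundRoutines.calcMantissa, during UNF calculation, rather than in CSV parsing. One plausible mechanism, and this is only an assumption since the UNF source is not quoted in this thread, is locale-dependent number formatting: under de_DE, Java prints the decimal separator as a comma, so code that splits a formatted number on "." gets a one-element array. The sketch below reproduces that pattern with hypothetical names; it is not the actual RoundRoutines code.

```java
import java.util.Locale;

// Hypothetical reproduction of a locale-sensitive formatting bug.
// None of these names come from the org.dataverse.unf library.
class LocaleDecimalDemo {

    // Formats a value in scientific notation using the given locale's
    // decimal separator.
    static String scientific(Locale locale, double value) {
        return String.format(locale, "%.3e", value);
    }

    // Locale-naive parsing: assumes "." separates the integer digit from the
    // fractional digits and reads element 1 of the split result.
    static String fractionDigits(String formatted) {
        String[] parts = formatted.split("\\.");
        return parts[1]; // throws ArrayIndexOutOfBoundsException when there is no "."
    }

    public static void main(String[] args) {
        String us = scientific(Locale.US, 12345.678);
        String de = scientific(Locale.GERMANY, 12345.678);
        System.out.println("en_US: " + us + " -> " + fractionDigits(us));
        try {
            System.out.println("de_DE: " + de + " -> " + fractionDigits(de));
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the log above.
            System.out.println("de_DE: " + de + " -> " + e.getMessage());
        }
    }
}
```

If this is indeed the cause, passing an explicit locale such as Locale.ROOT to the formatting call would make the result independent of the system locale, which would match the observation that switching to en_US.UTF-8 makes the ingest succeed.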

pdurbin commented 22 hours ago

Ah, ok, sounds like it's related to these issues, then: