geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

Too many open files #731

Closed fxprunayre closed 9 years ago

fxprunayre commented 9 years ago

I quite often get "too many open files" exceptions.

e.g. while indexing a catalog with 700 records:


     Marking the metadata as _indexingError=1 in index
2015-01-25 12:30:13,138 ERROR [geonetwork.index] - Failed to properly index geometry of metadata 400. Error: /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/index/spatialindex.dbf (Too many open files)
java.io.FileNotFoundException: /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/index/spatialindex.dbf (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
    at org.geotools.data.shapefile.ShpFiles.getReadChannel(ShpFiles.java:831)
    at org.geotools.data.shapefile.dbf.DbaseFileReader.<init>(DbaseFileReader.java:139)
    at org.geotools.data.shapefile.dbf.IndexedDbaseFileReader.<init>(IndexedDbaseFileReader.java:149)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.openDbfReader(IndexedShapefileDataStore.java:859)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.getAttributesReader(IndexedShapefileDataStore.java:561)
    at org.geotools.data.shapefile.indexed.IndexedShapefileDataStore.createFeatureWriter(IndexedShapefileDataStore.java:965)
    at org.geotools.data.AbstractDataStore.getFeatureWriter(AbstractDataStore.java:470)
    at org.geotools.data.AbstractFeatureStore.removeFeatures(AbstractFeatureStore.java:361)
    at org.fao.geonet.ker

It also happens when using the catalog, e.g. when viewing records.

jesseeichar commented 9 years ago

How do you get the metadata? Do you harvest it from a server?

fxprunayre commented 9 years ago

No, it was on an existing db. But some of the records may trigger exceptions on indexing; could that be related? I will do more testing to identify the issue. I have observed this error 7 times this week.

jesseeichar commented 9 years ago

I index 6500 metadata records on the geocat test and integration servers, so your system is somehow different. We need to isolate what that difference is.

One thing we need to do is see which files are open. I suspect they are the schema files, but I would like verification. Can you run the following:

jps
lsof -p <pid> > openfiles.txt

And quickly check which files are open multiple times?
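The same check can also be done programmatically on Linux by reading the JVM's `/proc/<pid>/fd` directory. A minimal sketch, assuming a Linux `/proc` filesystem (the `FdCounter` helper is hypothetical, not part of GeoNetwork):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Groups a process's open file descriptors by the file they point at,
// the programmatic equivalent of `ls -l /proc/<pid>/fd | sort | uniq -c`.
// Works only on Linux, where /proc/<pid>/fd holds one symlink per fd.
class FdCounter {
    static Map<String, Long> countOpenFds(String pid) throws IOException {
        try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
            return fds.map(fd -> {
                try {
                    return Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    return "<unreadable>";  // fd closed mid-scan
                }
            }).collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        }
    }
}
```

Calling `countOpenFds("self")` from inside the affected JVM returns a map from file path to the number of descriptors open on it, so the files leaked hundreds of times stand out immediately.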

Then I will instrument the places I suspect might be responsible and commit the instrumented code. Once that is done you can reindex your metadata, and hopefully the instrumentation will lead us to the problem code.

Jesse

fxprunayre commented 9 years ago

Not sure if this is related, but I probably have something wrong with the Requests table, which triggers this exception.


2015-01-25 16:39:16,362 DEBUG [jeeves.config.springutil.JeevesDispatcherServlet] - Successfully completed request
org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [insert into Requests (autogenerated, hits, ip, lang, query, type, requestdate, service, simple, sortby, spatialfilter, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [null]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch
    at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:643)
    at org.springframework.orm.jpa.vendor.HibernateJpaDialect.translateExceptionIfPossible(HibernateJpaDialect.java:106)
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:517)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:755)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:724)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:475)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:270)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:155)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.data.jpa.repository.support.LockModeRepositoryPostProcessor$LockModePopulatingMethodIntercceptor.invoke(LockModeRepositoryPostProcessor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy99.save(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor313.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.zeroturnaround.javarebel.integration.util.ReloadingProxyFactory$ReloadingMethodHandler.invoke(JRebel:74)
    at com.sun.proxy.$Proxy99.save(Unknown Source)
    at org.fao.geonet.kernel.search.log.QueryRequest.storeToDb(QueryRequest.java:268)
    at org.fao.geonet.kernel.search.log.SearcherLogger.logSearch(SearcherLogger.java:98)
    at org.fao.geonet.kernel.search.SearchLoggerTask.run(SearchLoggerTask.java:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.hibernate.exception.ConstraintViolationException: could not execute batch
    at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:129)
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)
    at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution(BatchingBatch.java:132)
    at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.doExecuteBatch(BatchingBatch.java:111)
    at org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.execute(AbstractBatchImpl.java:163)
    at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.getBatch(JdbcCoordinatorImpl.java:215)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3102)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:3581)
    at org.hibernate.action.internal.EntityInsertAction.execute(EntityInsertAction.java:104)
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:463)
    at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:349)
    at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:350)
    at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:56)
    at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1222)
    at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:425)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.beforeTransactionCommit(JdbcTransaction.java:101)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:177)
    at org.hibernate.jpa.internal.TransactionImpl.commit(TransactionImpl.java:77)
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:513)
    ... 25 more
Caused by: java.sql.BatchUpdateException: Batch entry 0 insert into Requests (autogenerated, hits, ip, lang, query, type, requestdate, service, simple, sortby, spatialfilter, id) values ('0', '974', '127.0.0.1', 'fre', '+(_op0:2 _op2:2 _op0:1 _op2:1 _op0:0 _op2:0 _op0:-1 _op2:-1 _owner:22 _dummy:0) +(_isTemplate:y _isTemplate:n _isTemplate:s)', 'TERM', '2015-01-25T16:39:16', 'qi', '0', '_docLocale ASC,null ASC', NULL, '9232') was aborted.  Call getNextException to see the cause.

It looks like the indexing XSLTs are opened a lot:

 ls -l /proc/24823/fd | cut -c55-200 | sort | uniq -c

This returns:

    688 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/codelist/external/thesauri/theme/inspire-theme.rdf
   1391 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/convert/functions.xsl
      2 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/index-fields/index-subtemplate-fields.xsl
    696 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/index-fields/index-subtemplate-fields.xsl
     43 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-fn.xsl
     41 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-fn.xsl
     42 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl-multilingual.xsl
     42 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl-multilingual.xsl
     42 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl.xsl
     42 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/WEB-INF/data/config/schema_plugins/iso19139/layout/utility-tpl.xsl
     49 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/base-variables.xsl
     48 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/base-variables.xsl
     44 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/functions-core.xsl
     40 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/functions-core.xsl
     39 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-relations.xsl
     45 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-relations.xsl
     44 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-tpl-brief.xsl
     40 -> /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/common/profiles-loader-tpl-brief.xsl
      2 > /home/francois/dev/eea-geonetwork/web/src/main/webapp/xslt/eea-layout.xsl

So maybe it is related to the OASIS catalog resolver?

jesseeichar commented 9 years ago

I have added 2 methods to IO (newInputStream and newBufferedReader) and changed all calls to Files.newInputStream or Files.newBufferedReader to use those methods instead. In those methods I wrap the input stream/reader with the debugging equivalents (DebuggingInputStream and DebuggingReader).

These classes register an exception with the OpenResourceTracker class on creation and remove it on close. Thus OpenResourceTracker maintains a list of all open resources (along with their file names), and when there are 1000 open files it writes to the log all of the open files and the stack trace captured at the point each one was opened.

Now that I think about it, this will kill the system at that point, because each time a resource is opened beyond the 1000th, all open files will be logged again, essentially crashing the system. Maybe it should be changed to print only once.
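The core idea can be sketched in a few lines (hypothetical names; the actual classes are IO, DebuggingInputStream/DebuggingReader and OpenResourceTracker in the GeoNetwork source):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the tracker: every wrapped stream registers a stack trace
// captured at open time and deregisters itself on close, so the map
// always holds exactly the currently-open resources and where each
// one was opened.
class ResourceTrackerSketch {
    static final Map<InputStream, Exception> OPEN = new ConcurrentHashMap<>();

    static InputStream track(String name, InputStream in) {
        final Exception openedAt = new Exception("opened: " + name);
        return new FilterInputStream(in) {
            { OPEN.put(this, openedAt); }

            @Override
            public void close() throws IOException {
                OPEN.remove(this);
                super.close();
            }
        };
    }
}
```

When the map grows past a threshold, dumping each stored exception's stack trace points straight at the call sites that opened the leaked streams.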

If you get a Linux error you can still:

jesseeichar commented 9 years ago

Just committed a fix for this issue; please verify that it resolves it. The fix is commit 7257438dae9e41026784cc65919312a0f67029ac

It turned out that the parsing framework was calling both getInputStream and getReader on PathStreamSource, which resulted in two streams being opened while only one was used, and thus only one was closed. I added a test for this and verified that it now passes.
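The failure mode is easy to reproduce: when a consumer calls both accessors but reads from only one, the unused stream stays open forever. One way to avoid it is for the source to close the previously opened resource whenever the other accessor is invoked. A hedged sketch of that pattern (hypothetical class, not the actual committed fix):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Remembers the last resource it handed out and closes it before
// opening a new one, so calling both accessors leaks nothing.
class PathSourceSketch {
    private final Path path;
    private Closeable lastOpened;

    PathSourceSketch(Path path) { this.path = path; }

    synchronized InputStream getInputStream() throws IOException {
        closeLast();
        InputStream in = Files.newInputStream(path);
        lastOpened = in;
        return in;
    }

    synchronized Reader getReader() throws IOException {
        closeLast();
        Reader r = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        lastOpened = r;
        return r;
    }

    private void closeLast() throws IOException {
        if (lastOpened != null) {
            lastOpened.close();
            lastOpened = null;
        }
    }
}
```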

I also made changes to the open resource tracking. It will now track resources in development mode but not in production mode. In addition, it will print an error when 1000, 2000, 3000, and 4000 resources are open; the error consists of the first 100 open files and their stack traces.

jesseeichar commented 9 years ago

Reopening so that @fxprunayre can verify.

fxprunayre commented 9 years ago

Tested and it does not happen anymore. Thanks Jesse. Closing.