COSIMA / master_index

Scripts to generate COSIMA master index
0 stars 0 forks source link

Slow indexing causing access issues #22

Closed aidanheerdegen closed 1 year ago

aidanheerdegen commented 2 years ago

With new experiments with large numbers of files (140K+) the indexing is taking so long that it impacts users who can't access it for their analysis, typically they get messages like:

OperationalError: (sqlite3.OperationalError) attempt to write a readonly database
[SQL: PRAGMA user_version]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

when creating a session with

cc.database.create_session()
angus-g commented 2 years ago

Would it make sense to index to some temporary location, and then only move the database into place when the indexing is done? At least as a for-the-moment measure...

aidanheerdegen commented 2 years ago

I'm putting a time limit on the indexing to begin with.

Indexing to a temporary location is a good idea, but there are some mechanics to make sure it will work correctly without corrupting the DB. The docs say this is only an issue if there is a write operation in progress, and users aren't writing to it, but the OperationalError error exception is saying attempt to write a readonly database, which is strange.

I've just started looking into merging SQLite DBs as solution to https://github.com/COSIMA/master_index/issues/16 but that maybe has some applicability to this as well. Not only faster, but the offline time could be dramatically reduced.

I seem to recall you talking about something like a merge at one stage, so happy to have any input on that front.

micaeljtoliveira commented 1 year ago

@aidanheerdegen Shall we close this issue? I think this is not a problem anymore with the offline indexing.

aidanheerdegen commented 1 year ago

Sure.