ajslater / codex

Codex is a web based comic archive browser and reader
GNU General Public License v3.0

Full Scan Every Night? #217

Closed potter-jason closed 1 year ago

potter-jason commented 1 year ago

For the last few nights, I've been hearing my hard drive array get super loud from the fan throttling. I checked and it's Codex at 100%, reindexing 30,000 comic books. Great, but it's doing the same thing every night around the same time. Is this on purpose? Can I schedule the scan to happen less often? There were no changes to the library either.

Environment: Docker version; OPDS http://0.0.0.0:9810/opds/v1.2/r/0/1; repository codex v1.0.3

ajslater commented 1 year ago

Do the logs say it's "rebuilding" the index or "updating" the index? What it should be doing is "updating", where it checks to see if any comics are newer than the index and then adds only those to the index. Usually this happens just after the comics are imported, so this is a very quick operation that does nothing.

potter-jason commented 1 year ago

01/07/2023 12:15:53 AM 2023-01-07 08:15:53 UTC INFO Read tags from 9662/33130 comics
01/07/2023 12:15:59 AM 2023-01-07 08:15:59 UTC INFO Read tags from 9674/33130 comics
01/07/2023 12:16:04 AM 2023-01-07 08:16:04 UTC INFO Read tags from 9711/33130 comics
01/07/2023 12:16:09 AM 2023-01-07 08:16:09 UTC INFO Read tags from 9745/33130 comics
01/07/2023 12:16:14 AM 2023-01-07 08:16:14 UTC INFO Read tags from 9787/33130 comics
01/07/2023 12:16:19 AM 2023-01-07 08:16:19 UTC INFO Read tags from 9819/33130 comics

potter-jason commented 1 year ago

The above is from tonight's logs. The previous night's logs have the same "Read tags" lines.

ajslater commented 1 year ago

Oh, I see. Not the search index, but the poller and updater. Yeah, it's definitely not supposed to be doing that. What it is doing is polling every X hours, as controlled by the fields in the admin/libraries page. Normally this polling looks at the comics, sees that nothing has changed, and does nothing. What's happening here is that, for some reason, it's seeing the comics as new every night.

So the first thing we can do is a simple workaround to prevent the nightly full scan. Go to /admin/library and click "edit" for the offending library. Uncheck "Poll Filesystem" and save. That will stop all scheduled polling.

However, this doesn't really solve the problem, and it requires either the filesystem watcher to work or you manually clicking the poll button to add comics. And I suspect that, with whatever's going wrong, clicking the poll button will do a full update again rather than the intended partial diff update.

How is the filesystem with the comics mounted so that Codex sees it? Is Codex in a Docker container? Is this a network filesystem? Could another process be touching the comics and updating their last-modified time on the disk?
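One way to answer the last question is to record each file's inode number and mtime on two different nights and compare. The following is only a rough diagnostic sketch using the Python standard library; it is not part of Codex, and the function names `collect` and `count_changes` and the state-file argument are made up for illustration. To be meaningful it would need to run against the same mount that Codex sees (for example, inside the container).

```python
import json
import os
import sys


def collect(root):
    """Map each file's path (relative to root) to its inode number and mtime in ns."""
    stats = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            stats[os.path.relpath(full, root)] = {
                "ino": st.st_ino,
                "mtime_ns": st.st_mtime_ns,
            }
    return stats


def count_changes(old, new):
    """Count files present in both runs whose inode or mtime differs."""
    common = set(old) & set(new)
    return {
        "inode_changed": sum(1 for p in common if old[p]["ino"] != new[p]["ino"]),
        "mtime_changed": sum(
            1 for p in common if old[p]["mtime_ns"] != new[p]["mtime_ns"]
        ),
    }


# Hypothetical usage: python statcheck.py /comics /tmp/stat_state.json
if __name__ == "__main__" and len(sys.argv) == 3:
    root, state_file = sys.argv[1], sys.argv[2]
    current = collect(root)
    if os.path.exists(state_file):
        with open(state_file) as f:
            previous = json.load(f)
        print(count_changes(previous, current))
    # Save this run so the next run has something to compare against.
    with open(state_file, "w") as f:
        json.dump(current, f)
```

If `mtime_changed` is large, another process is probably touching the files. If only `inode_changed` is large, the filesystem layer itself is reporting unstable inode numbers.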

potter-jason commented 1 year ago

Codex is in a Docker container. The comics are stored across 3 drives connected via a USB drive array. Based on what you said, I'm thinking the following might be the issue: I'm using MergerFS, a union file system that basically takes the 3 drives and makes them look like a single volume. A

ajslater commented 1 year ago

Yeah, this is it. The issue is that the directory-watching library I use wants to compare inodes, and your merging filesystem is probably coming up with new ones all the time, so they differ from the ones stored in the database.

I'm going to take a look and see if this is strictly necessary. Perhaps I can rely solely on name, mode, size & mtime.
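To illustrate the idea of relying on name, mode, size and mtime, here is a minimal standard-library sketch of a snapshot diff that ignores inodes entirely. It is an assumption-laden illustration, not the code Codex or its watching library actually uses; `Signature`, `snapshot`, and `diff` are hypothetical names.

```python
import os
from typing import Dict, NamedTuple, Set, Tuple


class Signature(NamedTuple):
    """Inode-free file fingerprint: mode, size, and modification time."""

    mode: int
    size: int
    mtime_ns: int


def snapshot(root: str) -> Dict[str, Signature]:
    """Walk root and map each relative file path (the 'name') to its Signature."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            result[rel] = Signature(st.st_mode, st.st_size, st.st_mtime_ns)
    return result


def diff(
    old: Dict[str, Signature], new: Dict[str, Signature]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    # A file counts as modified only if its mode, size, or mtime changed.
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, modified
```

Because the inode number is never stored or compared, a filesystem that hands out different inodes for unchanged files would produce an empty diff here. The trade-off is that a rename shows up as one removal plus one addition, since there is no inode to tie the two paths together.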

ajslater commented 1 year ago

Codex v1.1.0 fixed a bug that may solve this issue.

potter-jason commented 1 year ago

Confirmed, I'm not experiencing this anymore. Thanks for your hard work on this. Codex is my favorite comic book server.