CoatiSoftware / Sourcetrail

Sourcetrail - free and open-source interactive source explorer
https://www.sourcetrail.com/
GNU General Public License v3.0
14.94k stars 1.41k forks source link

Indexing never ending #726

Open pwillard2 opened 5 years ago

pwillard2 commented 5 years ago

log_2019-08-28_18-34-15.txt.gz

Yesterday, around 18:30 I started a sourcetrail indexing: sourcetrail index --full r9.2.0-bryan.srctrlprj > r9.2.0-bryan.sourcetrail.log And this morning (around 8:20) it was still running. I had to kill sourcetrail processes to make it stop. Usually it does not take more than 1 or 2 hours.

I attached the log file. Files grew quite large: -rw-r--r-- 1 pwillard domain users 7154 Aug 21 16:13 r9.2.0-bryan.srctrlprj -rw-r--r-- 1 pwillard domain users 1619966976 Aug 23 20:43 r9.2.0-bryan.srctrldb -rw-r--r-- 1 pwillard domain users 9216 Aug 23 20:43 r9.2.0-bryan.srctrlbm -rw-r--r-- 1 pwillard domain users 37855430 Aug 28 18:34 r9.2.0-bryan.cdb.fixed.json -rw-r--r-- 1 pwillard domain users 2592408576 Aug 29 08:14 r9.2.0-bryan.srctrldb_tmp -rw-r--r-- 1 pwillard domain users 8768 Aug 29 08:18 r9.2.0-bryan.srctrldb_tmp-journal -rw-r--r-- 1 pwillard domain users 48177336 Aug 29 08:20 r9.2.0-bryan.sourcetrail.log

sourcetrail --version Sourcetrail Version 2019.3.46

pwillard2 commented 5 years ago

Forgot to mention I am running Centos 7.

egraether commented 5 years ago

The log file shows that most of the time is spent starting indexer processes, that don't do anything then shut themselves down.

Since the new release didn't have any major changes in the indexer, I suppose this is an issue with your shared memory state, and might be gone after a system reboot.

Otherwise: Is this reproducible with other projects? Is it reproducible using the GUI? How many files were actually indexed, and how many were supposed to?

pwillard2 commented 5 years ago

I will try again, and let you know. so far, this happened only once. I don't know 'How many files were actually indexed' how can you tell? the log file should tell you? regards Pierre


From: Eberhard Gräther notifications@github.com Sent: Friday, August 30, 2019 1:29 AM To: CoatiSoftware/SourcetrailBugTracker SourcetrailBugTracker@noreply.github.com Cc: Pierre Willard pwillard@roku.com; Author author@noreply.github.com Subject: Re: [CoatiSoftware/SourcetrailBugTracker] Indexing never ending (#726)

The log file shows that most of the time is spent starting indexer processes, that don't do anything then shut themselves down.

Since the new release didn't have any major changes in the indexer, I suppose this is an issue with your shared memory state, and might be gone after a system reboot.

Otherwise: Is this reproducible with other projects? Is it reproducible using the GUI? How many files were actually indexed, and how many were supposed to?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/CoatiSoftware/SourcetrailBugTracker/issues/726?email_source=notifications&email_token=AJHQJHWCWTNDKFPN7X5DGD3QHDK7HA5CNFSM4ISEENU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5Q65GQ#issuecomment-526511770, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJHQJHT3FHLASCKRWTETX4LQHDK7HANCNFSM4ISEENUQ.

pwillard2 commented 5 years ago

log_2019-09-06_05-32-45.txt.gz

The issue reproduced again. yesterday I started a script to index 6 projects. At the 5th project, sourcetrail got somehow stuck. see attached log. regards Pierre

This morning sourcetrail processes:

ps -auxww | grep -i sourcetrail pwillard 52859 0.0 0.0 113172 1208 pts/3 S+ 05:32 0:00 /bin/sh -c sourcetrail index --full main-bryan.srctrlprj > main-bryan.sourcetrail.log pwillard 52860 122 6.7 3749936 2203740 pts/3 SLl+ 05:32 203:46 /opt/sourcetrail/Sourcetrail index --full main-bryan.srctrlprj pwillard 76129 61.0 0.0 381424 28084 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 8 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76130 41.0 0.1 389880 35840 pts/3 Rl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 10 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76133 26.0 0.0 376452 14792 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 9 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76136 25.0 0.0 376456 14796 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 11 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76138 34.0 0.0 310916 14764 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 2 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76141 0.0 0.0 312036 13596 pts/3 Rl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 3 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76142 0.0 0.0 308396 12012 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 5 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76144 0.0 0.0 308396 12016 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 4 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76149 0.0 0.0 308396 12016 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 1 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76150 0.0 0.0 308396 12012 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 7 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76152 0.0 0.0 308396 12012 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 6 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt pwillard 76154 0.0 0.0 308396 12012 pts/3 Sl+ 08:19 0:00 /opt/sourcetrail/sourcetrail_indexer 12 5672b892-0bca-40bd-ad43-699031b299b0 /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-06_05-32-45.txt

pwillard2 commented 5 years ago

Note here that there is the possibility that sourcetrail was still indexing the project and that somehow the compilations of the 5 projects and their indexing would have taken a very long time. But looking at those, it seems that the indexing should have been completed before 7 am.

relevant log files: -rw-r--r-- 1 pwillard domain users 56083503 Sep 5 21:32 log_2019-09-05_20-03-15.txt -rw-r--r-- 1 pwillard domain users 95575280 Sep 6 00:57 log_2019-09-05_22-14-01.txt -rw-r--r-- 1 pwillard domain users 34711057 Sep 6 02:42 log_2019-09-06_01-39-18.txt -rw-r--r-- 1 pwillard domain users 44221090 Sep 6 04:46 log_2019-09-06_03-28-21.txt -rw-r--r-- 1 pwillard domain users 101363050 Sep 6 08:20 log_2019-09-06_05-32-45.txt

So about 2 hours between each log.

mlangkabel commented 5 years ago

Hi @pwillard2, to me this is looking fine. Sourcetrail is making continuous progress on those projects during the entire night and the time it took to index each project approximately scales with the size of the generated log file. So it looks like Sourcetrail really was busy and not idle.

Is it the timestamp of the last modification that is shown before the log file names you posted? If yes, then what seems odd to me is that there is kind of a break between Sourcetrail is done with one project and starts with the next (e.g. done at Sep 5 21:32 and starting with the next at Sep 5 22:14). Is your script doing something else inbetween?

Can you try to open the 5th project within Sourcetrail? While indexing Sourcetrail stores the data into a temporary database (which allows to continue browsing the old state while Sourcetrail is working). If you canceled the indexing, the temporary database may not have been taken over and Sourcetrail may ask whether or not you want to take over the state of the canceled indexing. Make sure that you take it, otherwise your progress from the night will be gone.

If you opened it, you may try to do a refresh. In the indexing dialog you can switch between a full index and indexing just the updated/incomplete files. If you choose the latter option it will only index the files that it missed when you canceled it and you can estimate how much more time it would have taken.

egraether commented 5 years ago

We talked about your issues internally today. We think they are likely caused by low RAM. Do you run multiple Sourcetrail instances in parallel? Do you have any other processes running that consume a lot of RAM?

pwillard2 commented 5 years ago

I don't run multiple sourcetrail in parallel. Basically, for a few projects, I do a full clean, then do a full build generating the json cdb file, and then run sourcetrail full indexing. Just one after another.

And I don't think it's a memory issue. I have 32 GB RAM. And I have Swap if needed. If you look at the next comment, here it is kind of 'stuck', and I have plenty of free RAM (20 GB).

pwillard2 commented 5 years ago

So on Friday I started Sourcetrail on the SAME project as the original one I reported at the beginning of this issue. Again, full clean, full recompile and then sourcetrail. No other Sourcetrail command. Just one project. Sourcetrail indexing again got stuck. Note that I did NOT reboot my linux PC ever.

But the interesting thing is that this morning I open the project with the GUI, and asked to Start Indexing 'Incomplete & Updated files', and it is EXTREMELY SLOW. I thought it was STUCK, but it went for 33 files to 55 files (out of 6132) in 25 minutes. And it's still at 0%. It does maybe 1 file per few minutes.... But sometimes the files count does not change...

log_2019-09-09_08-29-32.txt.gz

I see Sourcetrail processes, but they don't do much...

pwillard 37668 98.8 13.6 6260300 4412780 ? SLl 08:29 42:33 /opt/sourcetrail/Sourcetrail pwillard 75904 0.0 0.0 306796 12088 ? Sl 09:12 0:00 /opt/sourcetrail/sourcetrail_indexer 5 5a764b39-484c-423c-98f6-ed6cae968b5b /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-09_08-29-32.txt pwillard 75906 0.0 0.0 306796 12088 ? Sl 09:12 0:00 /opt/sourcetrail/sourcetrail_indexer 12 5a764b39-484c-423c-98f6-ed6cae968b5b /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-09_08-29-32.txt pwillard 75908 0.0 0.0 306796 12084 ? Sl 09:12 0:00 /opt/sourcetrail/sourcetrail_indexer 2 5a764b39-484c-423c-98f6-ed6cae968b5b /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-09_08-29-32.txt pwillard 75916 0.0 0.0 306796 12084 ? Sl 09:12 0:00 /opt/sourcetrail/sourcetrail_indexer 10 5a764b39-484c-423c-98f6-ed6cae968b5b /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-09_08-29-32.txt pwillard 75917 0.0 0.0 125248 388 ? R 09:12 0:00 /opt/sourcetrail/sourcetrail_indexer 6 5a764b39-484c-423c-98f6-ed6cae968b5b /opt/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/ /home/corp.roku/pwillard/.config/sourcetrail/log/log_2019-09-09_08-29-32.txt

Is there anything I need to clean / delete ?

pwillard2 commented 5 years ago

I think it's really stuck now. It has been 58 files (58/6132 files) for almost 1 hour.

pwillard2 commented 5 years ago

There are no more indexing processes, the log file shows this:

09:32:03 | 140332821903104 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 10 returned with 9 09:32:03 | 140333853681408 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 8 returned with 9 09:32:03 | 140334969386752 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 1 returned with 9 09:32:03 | 140334950475520 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 2 returned with 9 09:32:03 | 140333845288704 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 9 returned with 9 09:32:03 | 140334942082816 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 3 returned with 9 09:32:03 | 140333878859520 | TaskBuildIndex.cpp:205 runIndexerProcess() | INFO: Indexer process 5 returned with 9 09:32:03 | 140334933690112 | SqliteStorage.cpp:199 executeStatementScalar() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 09:32:03 | 140334933690112 | SqliteStorage.cpp:168 executeStatement() | ERROR: 1: SQLITE_ERROR[1]: cannot start a transaction within a transaction 09:32:03 | 140334916904704 | TaskBuildIndex.cpp:107 doUpdate() | INFO: interrupted indexing. 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 19: SQLITE_CONSTRAINT[19]: FOREIGN KEY constraint failed 09:32:03 | 140334933690112 | SqliteStorage.cpp:239 executeQuery() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 09:32:03 | 140334933690112 | TaskRunner.cpp:53 update() | ERROR: Unknown exception thrown during task running 09:43:14 | 140335221036800 | INFO: send MessageWindowFocus 09:59:36 | 140335221036800 | INFO: send MessageWindowFocus 09:59:41 | 140335221036800 | INFO: send MessageWindowFocus

And it seems there was corruption at the beginning: grep -n SQLITE_CORRUPT log_2019-09-09_08-29-32.txt 153:08:42:09 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 156:08:42:18 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 161:08:42:20 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 162:08:42:20 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 171:08:42:30 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 174:08:42:30 | 140335237822208 | SqliteStorage.cpp:168 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 183:08:42:35 | 140335237822208 | SqliteDatabaseIndex.cpp:42 removeFromDatabase() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 185:08:42:35 | 140335237822208 | SqliteDatabaseIndex.cpp:42 removeFromDatabase() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 189:08:42:35 | 140335237822208 | SqliteDatabaseIndex.cpp:42 removeFromDatabase() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 191:08:42:35 | 140335237822208 | SqliteDatabaseIndex.cpp:42 removeFromDatabase() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 282:08:42:41 | 140334933690112 | SqliteStorage.cpp:199 executeStatementScalar() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 144721:09:32:03 | 140334933690112 | SqliteStorage.cpp:239 executeQuery() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 144736:09:32:03 | 140334933690112 | SqliteStorage.cpp:199 executeStatementScalar() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 144739:09:32:03 | 140334933690112 | SqliteStorage.cpp:182 executeStatement() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed 144746:09:32:03 | 140334933690112 | SqliteStorage.cpp:239 executeQuery() | ERROR: 11: SQLITE_CORRUPT[11]: database disk image is malformed

How to I clean/delete the database disk image ?? what is that?

egraether commented 5 years ago

Thanks for the info! I'm out of office today, but I will look into it tomorrow.

pwillard2 commented 5 years ago

What do I have to do to resolve the 'database disk image is malformed' without rebooting? What to delete? Btw, since I am using --full to reindex everything, sourcetrail should start from an empty database ? Thanks.

egraether commented 5 years ago

Delete the '.srctrldb' file. Yes, with '--full' a new one is created that replaces the old at the end of indexing.

pwillard2 commented 5 years ago

I restarted the full indexing of this project last night. This time I did not get any 'SQLITE_CORRUPT[11]: database disk image is malformed' but the indexing took almost 14 hours ! The overview is saying: 13642 files (13341 complete) 6320850 lines of code 3549310 symbols 15549290 references 4208 errors (0 fatal)

My other usual projects (that index in 1 or 2 hours) have this: 10309 files (10219 complete) 4706293 lines of code 704539 symbols 3121529 references I will have to review my project and maybe I am indexing too much stuff compare to the usual projects.

pwillard2 commented 5 years ago

I reduced the number of files to be indexed. But it still took more than 4 hours to index. last indexed: 2019-09-13 22:00:20 8588 files (8515 complete) 2690064 lines of code 506436 symbols 2309177 references 668 errors (0 fatal) log_2019-09-13_17-42-08.txt.gz

egraether commented 5 years ago

The log file shows the same pattern of indexer processes constantly quitting right after getting started as above.

With the new 2019.3 release we support more compilation targets in our c++ indexer. For that reason more LLVM/Clang libraries are loaded in each indexer process. That increases memory demand.

Can you track system wide memory usage during indexing somehow?