Hello!
I am running the indexing process inside a Docker container on a server with 30 GB RAM + 8 GB swap and 8 CPUs.
Within the first 15 minutes of indexing, RAM usage grew to nearly all available RAM (29.3 of 29.5 GB) plus 4 GB of swap.
I think it would be worth mentioning this in the prerequisites.
Also, the process used only 1 of the 8 CPUs. Is it possible to use more?
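In case it helps anyone hitting the same memory pressure: Docker itself can cap a container's memory so the run fails fast instead of thrashing swap. A minimal sketch, where the image name `hla-indexer` and the command `./index.sh` are placeholders for whatever you actually run:

```shell
# Assumptions: "hla-indexer" and "./index.sh" stand in for your real image
# and indexing command. --memory caps RAM; --memory-swap caps RAM + swap.
docker run --memory=28g --memory-swap=36g hla-indexer ./index.sh
```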
As I understand it, the indexing process generates two files (9.2 GB in total):
-rw-r--r-- 1 root 502 5491768320 Jun 19 17:41 serializedGRAPH
-rw-r--r-- 1 root 502 4197127944 Jun 19 17:41 serializedGRAPH_preGapPathIndex
Does the program need all files from the source PRG_MHC_GRCh38_withIMGT.tar.gz, or only the two new files?
The whole process took 2 hours and 20 minutes.
I made a second run and got slightly different files:
-rw-r--r-- 1 root 502 5491768268 Jun 20 07:17 serializedGRAPH
-rw-r--r-- 1 root 502 4197127912 Jun 20 07:17 serializedGRAPH_preGapPathIndex
How can I tell that the indexing completed correctly?
> Also, the process used only 1 of the 8 CPUs. Is it possible to use more?

No, indexing is single-threaded -- but you only have to do it once.

> Does the program need all files from the source PRG_MHC_GRCh38_withIMGT.tar.gz, or only the two new files?

All files.

> How can I tell that the indexing completed correctly?

I don't think this process is necessarily deterministic in terms of file size. The test run on the NA12878 CRAM will tell you whether everything worked.
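Since file sizes aren't a reliable signal, a rough pre-flight sanity check is just "both index files exist and are nonempty" before launching the NA12878 test run. A minimal sketch; the file names come from this thread, but the directory path is an assumption you should adjust:

```python
import os

# The two index outputs reported in this thread; GRAPH_DIR is an
# assumption -- point it at your extracted graph package.
GRAPH_DIR = "PRG_MHC_GRCh38_withIMGT"
INDEX_FILES = ["serializedGRAPH", "serializedGRAPH_preGapPathIndex"]

def check_index(graph_dir=GRAPH_DIR, files=INDEX_FILES):
    """Return {filename: size_in_bytes} for files that exist and are
    nonempty; raise FileNotFoundError listing anything missing or empty."""
    sizes = {}
    missing = []
    for name in files:
        path = os.path.join(graph_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            sizes[name] = os.path.getsize(path)
        else:
            missing.append(name)
    if missing:
        raise FileNotFoundError(f"missing or empty index files: {missing}")
    return sizes
```

Two successful runs can legitimately report slightly different sizes here, so treat this only as a cheap "did it finish writing" check; the NA12878 test run remains the real verification.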