PatrickKinnear / skein-dimensions

A project to calculate the dimensions of skein modules of mapping tori of T^2
MIT License

Handle OOM errors #4

Closed PatrickKinnear closed 2 years ago

PatrickKinnear commented 2 years ago

Investigate and fix the memory errors the program is experiencing in generating random SL_2(Z) matrices.

PatrickKinnear commented 2 years ago

The random SL_2(Z) matrices are generated in the function generate_raw_data when generate mode is invoked. The program runs fine for a while but is eventually killed by the OOM (out of memory) killer.

Have tried modularising as much as possible (turning the subroutine inside a for loop into a function that is called on each iteration) so that function cleanup routines manage memory. I noticed one for loop that had not been modularised: the loop in get_dim_estimate_empty. This has now been modularised into compute_corank. This has increased the length of time the program runs for, but it is still killed by the OOM killer.
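To illustrate what modularising is meant to achieve, here is a minimal sketch in plain Python (not Sage). The names `Relations`, `process_one` and `refs` are hypothetical and not taken from the repository. On CPython, which uses reference counting, a large temporary created inside a function is released as soon as the function returns, provided nothing else holds a reference to it.

```python
import weakref


class Relations(list):
    """Hypothetical stand-in for a large intermediate result.

    A list subclass is used because plain lists cannot be weak-referenced.
    """


def process_one(n, refs):
    relations = Relations(range(n))      # large temporary, local to this call
    refs.append(weakref.ref(relations))  # lets us observe when it is freed
    return len(relations)                # only the small summary escapes


refs = []
summaries = [process_one(n, refs) for n in (10, 20, 30)]

# A dead weak reference returns None when called.
still_alive = [ref() is not None for ref in refs]
print(summaries, still_alive)
```

This only covers objects managed by the Python interpreter. It says nothing about memory held inside compiled libraries or internal caches, which may be relevant given that modularising did not stop the kills.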

Tracing memory usage with the tracemalloc module, it seems that the program is always killed at the line where the relations returned by get_relations_empty are turned into a Sage matrix:

A = matrix(QQ['q'].fraction_field(), relations)

It is not clear why this line causes an OOM issue. There is no clear link to the number of relations: sometimes the process is killed when there are ~50 relations even though it has previously completed computations with ~100 relations. Moreover, at the point when the kill happens, tracemalloc indicates that the current memory usage is not problematically high: it is normally below peak usage and below usages which have been acceptable before.

This suggests that the problem is not a build-up of redundant data in the program, but that the line of code displayed above itself demands too much memory in some circumstances. It is not clear why.
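One thing worth checking, offered as a possibility rather than a diagnosis: tracemalloc only reports memory allocated through Python's own allocators, so memory requested directly by compiled libraries would not appear in its figures. The sketch below records the tracemalloc numbers alongside the process's peak resident set size from the standard `resource` module (Unix only; `ru_maxrss` is in kilobytes on Linux). The helper name `memory_snapshot` and the dummy `payload` are assumptions for illustration.

```python
import resource
import tracemalloc


def memory_snapshot():
    """Return (traced_current_bytes, traced_peak_bytes, max_rss)."""
    current, peak = tracemalloc.get_traced_memory()
    # Peak resident set size of this process as seen by the OS.
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return current, peak, max_rss


tracemalloc.start()
before = memory_snapshot()

# Dummy workload: roughly 10 MB of Python-level allocations.
payload = [bytes(1024) for _ in range(10_000)]

after = memory_snapshot()
tracemalloc.stop()

print("traced growth (bytes):", after[0] - before[0])
print("peak RSS reported by the OS:", after[2])
```

Calling such a helper immediately before and after the matrix construction would show whether the OS-level figure grows much faster than the tracemalloc figure, which is the number the OOM killer actually acts on.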

One option may be the sparse matrix implementation (Issue #3).

PatrickKinnear commented 2 years ago

Have played around and can't think of anything. Have posted to AskSageMath.

Have produced some examples of where the code breaks and where it doesn't. In each resultsx file, the last set of relations causes the program to be killed and the others do not. The breaking_matrix file also contains an example of a matrix causing an OOM kill, from the sequence [4, 3, 6, 9].

breaking_matrix.txt relations.txt relations2.txt relations3.txt relations4.txt relations5.txt

PatrickKinnear commented 2 years ago

Typical OOM killer logs:

Jan 13 17:34:05 maths-pgr-220 kernel: [20093.866914] python3 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jan 13 17:34:05 maths-pgr-220 kernel: [20093.866947] oom_kill_process.cold+0xb/0x10
Jan 13 17:34:05 maths-pgr-220 kernel: [20093.867189] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jan 13 17:34:05 maths-pgr-220 kernel: [20093.867799] oom-kill:constraint=CONSTRAINT_NONE,nodemask(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=python3,pid=37385,uid=1000
Jan 13 17:34:05 maths-pgr-220 kernel: [20093.867812] Out of memory: Killed process 37385 (python3) total-vm:15446996kB, anon-rss:10986532kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21936kB oom_score_adj:0
Jan 13 17:34:05 maths-pgr-220 kernel: [20094.356796] oom_reaper: reaped process 37385 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
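For reference, the "Out of memory: Killed process" entry in the log above can be converted into more readable units. The following is a small sketch using only the standard library; the function name `parse_oom_kill` and the returned keys are hypothetical, and the regular expression is written against the log format shown here rather than any guaranteed kernel format.

```python
import re

# The kill line from the log above, without the syslog prefix.
OOM_LINE = (
    "Out of memory: Killed process 37385 (python3) total-vm:15446996kB, "
    "anon-rss:10986532kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
    "pgtables:21936kB oom_score_adj:0"
)

PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Extract pid, process name and memory figures (in GiB) from a kill line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    kib_per_gib = 1024 ** 2
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_gib": int(match["total_vm"]) / kib_per_gib,
        "anon_rss_gib": int(match["anon_rss"]) / kib_per_gib,
    }


info = parse_oom_kill(OOM_LINE)
print(info)
```

For this log the resident memory at the time of the kill works out to about 10.48 GiB, with about 14.73 GiB of virtual memory.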

PatrickKinnear commented 2 years ago

Implementing sparse matrices and polynomials seems to be the best improvement we can get so far, and at shell 6 we can handle matrices associated to sequences of length 5 with entries up to 10.
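The idea behind a sparse representation can be sketched in plain Python: store only the nonzero entries, keyed by position, instead of every entry of every row. This is an illustration of the general technique, not the project's implementation; `to_sparse`, `density` and the example rows are hypothetical, and `Fraction` stands in for the rational-function entries. In Sage itself the `matrix` constructor accepts a `sparse` keyword for this purpose.

```python
from fractions import Fraction


def to_sparse(rows):
    """Convert dense rows to a {(row, col): value} dict of nonzero entries."""
    return {
        (i, j): value
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value != 0
    }


def density(sparse, nrows, ncols):
    """Fraction of entries that are nonzero."""
    return len(sparse) / (nrows * ncols)


# Hypothetical relations: most coefficients are zero.
dense = [
    [Fraction(1), 0, 0, 0],
    [0, 0, Fraction(-1, 2), 0],
    [0, 0, 0, 0],
]

sparse = to_sparse(dense)
print(sparse)
print(density(sparse, 3, 4))
```

When most entries are zero, as in this toy example, the dictionary holds only a small fraction of the entries a dense matrix would allocate.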

It has also been suggested to use immutable data types. This might be as good as it gets, but it probably allows us to get going for now.