ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

Attempts to allocate full memmap to memory #89

Closed IosiaLectus closed 5 months ago

IosiaLectus commented 7 months ago

When I pass a memmap as the similarity matrix, it appears the full memmap gets pulled into memory.

Traceback (most recent call last):
  File "/home/jcouch/code/moore_toolkit.py", line 1650, in <module>
    main()
  File "/home/jcouch/code/moore_toolkit.py", line 1639, in main
    Do_Madani( do_subset_selection=False, do_diversity=True, do_accs=True, do_tree=False)
  File "/home/jcouch/code/moore_toolkit.py", line 1527, in Do_Madani
    Do_stuff(train_images, train_labels, test_images, test_labels, val_images, val_labels, subsets_df, DATA_PATH, do_diversity=do_diversity, do_accs=do_accs, do_tree=do_tree, start_time=start_time)
  File "/home/jcouch/code/moore_toolkit.py", line 1024, in Do_stuff
    div_df = compute_diversities_sim_on_fly(
  File "/home/jcouch/code/moore_toolkit.py", line 499, in compute_diversities_sim_on_fly
    row.update(get_subset_divs(sim, sub_comms, indices_by_class, qlist, sim_name))
  File "/home/jcouch/code/moore_toolkit.py", line 410, in get_subset_divs
    metacommunity = Metacommunity(counts, sim_matrix)
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/metacommunity.py", line 83, in __init__
    self.components = make_components(
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/components.py", line 110, in make_components
    return SimilaritySensitiveComponents(abundance=abundance, similarity=similarity)
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/components.py", line 65, in __init__
    all_similarity = self.similarity.weighted_similarities(
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/similarity.py", line 87, in weighted_similarities
    return self.similarity @ relative_abundance
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 230. GiB for an array with shape (175587, 175587) and data type float64

chhotii-alex commented 7 months ago

Also, it was noted that the original datatype was float32, whereas the failed allocation in the traceback is for a float64 array.
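
For reference, a minimal sketch of why the dtype may be relevant: NumPy promotes a float32 operand combined with a float64 operand to float64, and a float64 array of the shape in the traceback would need roughly 230 GiB. The arrays below are small in-memory stand-ins, and the names `sim32` and `abundance64` are hypothetical. Whether this promotion is what actually triggered the reported error was not confirmed in this thread.

```python
import numpy as np

# Small stand-ins (hypothetical names): a float32 "similarity matrix"
# and a float64 relative-abundance column vector.
sim32 = np.ones((4, 4), dtype=np.float32)
abundance64 = np.full((4, 1), 0.25, dtype=np.float64)

# Mixed float32/float64 operands give a float64 result.
mixed = sim32 @ abundance64

# Matching dtypes keep the whole computation in float32.
matched = sim32 @ abundance64.astype(np.float32)

promoted_dtype = np.result_type(sim32.dtype, abundance64.dtype)

# Size of a float64 array with the shape reported in the traceback.
n = 175587
needed_gib = n * n * 8 / 2**30

print(mixed.dtype, matched.dtype, promoted_dtype)
print(f"{needed_gib:.1f} GiB")
```

If promotion is the cause, casting the abundance vector to the memmap's dtype before multiplying would be one thing to try, at the cost of float32 precision in the result.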

The numpy.memmap documentation does have the caveat "This subclass of ndarray has some unpleasant interactions with some operations..." I suppose we need to re-implement matrix multiplication for pre-multiplication by a memmap.
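
A rough sketch of what such a re-implementation could look like: multiply one block of rows at a time, so that only a slice of the memmap is read (and, if necessary, upcast) per step. The function name `blockwise_matmul` and the `block_rows` parameter are hypothetical and are not part of greylock; this is an illustration under those assumptions, not the library's method.

```python
import os
import tempfile

import numpy as np


def blockwise_matmul(matrix, vector, block_rows=1024):
    """Compute matrix @ vector one block of rows at a time.

    Hypothetical helper: `matrix` may be a np.memmap; `vector` may have
    shape (n,) or (n, k).
    """
    out_dtype = np.result_type(matrix.dtype, vector.dtype)
    out = np.empty((matrix.shape[0],) + vector.shape[1:], dtype=out_dtype)
    for start in range(0, matrix.shape[0], block_rows):
        stop = min(start + block_rows, matrix.shape[0])
        # Only rows start..stop of the memmap take part in this product.
        out[start:stop] = matrix[start:stop] @ vector
    return out


# Small demonstration with a temporary float32 memmap.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sim.dat")
    sim = np.memmap(path, dtype=np.float32, mode="w+", shape=(50, 50))
    sim[:] = np.random.default_rng(0).random((50, 50), dtype=np.float32)
    sim.flush()

    weights = np.full((50, 1), 1 / 50)  # float64 column vector
    blockwise = blockwise_matmul(sim, weights, block_rows=16)
    direct = np.asarray(sim) @ weights
    del sim  # release the mapping before the directory is removed
```

Row blocks suit a row-major (C-order) memmap, since each block is then a contiguous region of the file; for a column-major file the access pattern would be less favorable.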

chhotii-alex commented 5 months ago

@IosiaLectus what is the memory layout of your memmap? Is it row-major (C-style) or column-major (Fortran-style)?
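
In case it helps, the layout can be read off the array's `flags` and `strides`. The sketch below creates two tiny throwaway memmaps using the `order` argument of `np.memmap`; the file names are placeholders.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder files; order="C" is row-major, order="F" is column-major.
    c_map = np.memmap(os.path.join(tmp, "c.dat"), dtype=np.float32,
                      mode="w+", shape=(3, 5), order="C")
    f_map = np.memmap(os.path.join(tmp, "f.dat"), dtype=np.float32,
                      mode="w+", shape=(3, 5), order="F")

    c_is_row_major = bool(c_map.flags["C_CONTIGUOUS"])
    f_is_col_major = bool(f_map.flags["F_CONTIGUOUS"])

    # Strides are in bytes: the smallest stride marks the fastest-varying axis.
    c_strides = c_map.strides
    f_strides = f_map.strides

    del c_map, f_map  # release the mappings before cleanup

print(c_is_row_major, c_strides)
print(f_is_col_major, f_strides)
```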

chhotii-alex commented 5 months ago

I have tried mightily to replicate this issue and have not.

I used the same machine where this error was observed, but not the same venv.

Both row-major and column-major memmaps of dimension (180000, 180000) work (probably with the expected performance difference).

I tried various versions of numpy, no difference.

I observed that as the memmap'd similarity matrix is being filled in, if one does not flush() occasionally, one will get a bus error, but that's not what's going on here.
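
For anyone filling a large memmap, the pattern I mean looks roughly like the sketch below: write a block of rows, then call `flush()` before moving on. The function name `fill_similarity`, the `block_rows` parameter, and the placeholder similarity `exp(-|i - j|)` are all hypothetical, chosen only to make the example runnable.

```python
import os
import tempfile

import numpy as np


def fill_similarity(path, n, block_rows=256):
    """Fill an (n, n) float32 memmap block by block, flushing as we go.

    Hypothetical helper; the similarity used is a stand-in, exp(-|i - j|).
    """
    sim = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    cols = np.arange(n)
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        rows = np.arange(start, stop)[:, None]
        sim[start:stop] = np.exp(-np.abs(rows - cols))
        # Write the dirty pages back to disk before the next block.
        sim.flush()
    return sim


with tempfile.TemporaryDirectory() as tmp:
    sim = fill_similarity(os.path.join(tmp, "sim.dat"), n=100, block_rows=32)
    diagonal_ok = bool(np.allclose(np.diag(sim), 1.0))
    symmetric_ok = bool(np.allclose(sim, sim.T))
    del sim  # release the mapping before the directory is removed
```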

I'm going to close this as "not reproducible". I don't know what exotic set of circumstances went into this stack trace; if we can find that, we can re-open.

chhotii-alex commented 5 months ago

Can't replicate