Closed IosiaLectus closed 5 months ago
Also, it was noted that the original datatype was float32.
The numpy.memmap documentation does have the caveat "This subclass of ndarray has some unpleasant interactions with some operations..." I suppose we need to re-implement matrix multiplication for pre-multiplication by a memmap.
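For illustration, here is a minimal sketch of what a memmap-friendly pre-multiplication could look like. It is not greylock's implementation: `chunked_matmul` and `block_rows` are hypothetical names, and the sketch assumes the memory blow-up comes from the float32 memmap being promoted to float64 as a whole when multiplied by a float64 array. Working on one block of rows at a time keeps the promoted copy small.

```python
import os
import tempfile

import numpy as np


def chunked_matmul(sim, rhs, block_rows=1024):
    """Compute sim @ rhs one block of rows at a time (hypothetical helper).

    Only `block_rows` rows of `sim` are cast and held in memory at once,
    so a large float32 memmap is never promoted to float64 in one piece.
    """
    rhs = np.asarray(rhs)
    out_dtype = np.result_type(sim.dtype, rhs.dtype)
    out = np.empty((sim.shape[0],) + rhs.shape[1:], dtype=out_dtype)
    for start in range(0, sim.shape[0], block_rows):
        stop = min(start + block_rows, sim.shape[0])
        # Slicing the memmap touches only these rows on disk.
        block = np.asarray(sim[start:stop], dtype=out_dtype)
        out[start:stop] = block @ rhs
    return out


# Small demonstration with a temporary float32 memmap.
n = 500
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sim.dat")
    sim = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    sim[:] = rng.random((n, n), dtype=np.float32)
    sim.flush()
    abundance = rng.random((n, 3))  # float64 stand-in for relative abundances
    result = chunked_matmul(sim, abundance, block_rows=128)
    # Reference result; this full cast is only affordable because n is small.
    expected = np.asarray(sim, dtype=np.float64) @ abundance
    del sim  # release the mapping before the directory is removed
```

The peak extra memory is roughly `block_rows * n` elements of the promoted dtype plus the output, instead of a full `n * n` copy.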
@IosiaLectus what is the memory layout of your memmap? Is it row-major, C-style or column-major, Fortran-style?
I have tried mightily to replicate this issue and have not been able to.
I used the same machine where this error was observed, but not the same venv.
Both row-major and column-major memmaps of dimension (180000, 180000) work (probably with the expected performance difference).
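For anyone wanting to check which layout their own memmap has, a small sketch follows. It uses the `order` argument of `numpy.memmap` and the array's `flags` and `strides`; the file names and the shape are placeholders chosen for the example, not values from this issue.

```python
import os
import tempfile

import numpy as np

shape = (200, 300)  # placeholder; the real matrices here were (180000, 180000)
with tempfile.TemporaryDirectory() as tmp:
    # Row-major (C-style): elements of a row are adjacent on disk.
    c_map = np.memmap(os.path.join(tmp, "c.dat"), dtype=np.float32,
                      mode="w+", shape=shape, order="C")
    # Column-major (Fortran-style): elements of a column are adjacent on disk.
    f_map = np.memmap(os.path.join(tmp, "f.dat"), dtype=np.float32,
                      mode="w+", shape=shape, order="F")

    c_is_row_major = bool(c_map.flags["C_CONTIGUOUS"])
    f_is_col_major = bool(f_map.flags["F_CONTIGUOUS"])
    c_strides, f_strides = c_map.strides, f_map.strides

    del c_map, f_map  # release the mappings before the directory is removed
```

The strides show the difference directly: in the row-major map, stepping along a row moves 4 bytes, while in the column-major map stepping along a column does.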
I tried various versions of numpy; no difference.
I observed that as the memmap'd similarity matrix is being filled in, if one does not flush() occasionally, one will get a bus error, but that's not what's going on here.
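The fill-and-flush pattern described above might look like the sketch below. The flush interval, the matrix size, and the similarity formula are all placeholders for illustration; only `numpy.memmap` and its `flush()` method are real API.

```python
import os
import tempfile

import numpy as np

n = 1000
flush_every = 100  # hypothetical interval; tune to the machine and row size

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sim.dat")
    sim = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    positions = np.arange(n)
    for i in range(n):
        # Placeholder similarity: decays with distance from the diagonal.
        sim[i, :] = 1.0 / (1.0 + np.abs(positions - i))
        if (i + 1) % flush_every == 0:
            sim.flush()  # write pending changes to disk periodically
    sim.flush()  # final flush for any remaining rows

    diag_ok = bool(np.all(np.diag(sim) == 1.0))
    del sim  # release the mapping before the directory is removed
```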
I'm going to close this as "not reproducible". I don't know what exotic set of circumstances went into this stack trace; if we can find that, we can re-open.
Can't replicate
When I pass a memmap as the similarity matrix, it appears the full memmap gets pulled into memory.
Traceback (most recent call last):
  File "/home/jcouch/code/moore_toolkit.py", line 1650, in <module>
    main()
  File "/home/jcouch/code/moore_toolkit.py", line 1639, in main
    Do_Madani( do_subset_selection=False, do_diversity=True, do_accs=True, do_tree=False)
  File "/home/jcouch/code/moore_toolkit.py", line 1527, in Do_Madani
    Do_stuff(train_images, train_labels, test_images, test_labels, val_images, val_labels, subsets_df, DATA_PATH, do_diversity=do_diversity, do_accs=do_accs, do_tree=do_tree, start_time=start_time)
  File "/home/jcouch/code/moore_toolkit.py", line 1024, in Do_stuff
    div_df = compute_diversities_sim_on_fly(
  File "/home/jcouch/code/moore_toolkit.py", line 499, in compute_diversities_sim_on_fly
    row.update(get_subset_divs(sim, sub_comms, indices_by_class, qlist, sim_name))
  File "/home/jcouch/code/moore_toolkit.py", line 410, in get_subset_divs
    metacommunity = Metacommunity(counts, sim_matrix)
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/metacommunity.py", line 83, in __init__
    self.components = make_components(
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/components.py", line 110, in make_components
    return SimilaritySensitiveComponents(abundance=abundance, similarity=similarity)
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/components.py", line 65, in __init__
    all_similarity = self.similarity.weighted_similarities(
  File "/home/jcouch/code/environments/moore_env/lib/python3.10/site-packages/greylock/similarity.py", line 87, in weighted_similarities
    return self.similarity @ relative_abundance
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 230. GiB for an array with shape (175587, 175587) and data type float64
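As a sanity check on the numbers in the error message, the arithmetic below shows that the requested allocation matches a full float64 copy of the similarity matrix, which is twice the size of the float32 data on disk. That fits a whole-array dtype promotion during the `@`, though the traceback alone does not prove it.

```python
n = 175587          # matrix dimension from the error message
gib = 2**30         # bytes per GiB

bytes_f64 = n * n * 8   # full matrix as float64 (what numpy tried to allocate)
bytes_f32 = n * n * 4   # full matrix as float32 (the memmap's original dtype)

print(f"float64 copy: {bytes_f64 / gib:.1f} GiB")
print(f"float32 data: {bytes_f32 / gib:.1f} GiB")
```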