giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
849 stars 174 forks source link

[BUG] read-only error when computing persistence image or heat kernel with input > 1MB and n_jobs > 1 #427

Closed NickSale closed 4 years ago

NickSale commented 4 years ago

Describe the bug

"ValueError: assignment destination is read-only" error thrown when calling transform (or fit_transform) on PersistenceImage or HeatKernel with n_jobs > 1 and diagrams sufficiently large

To reproduce

Gist HeatKernel: https://gist.github.com/NickSale/30d9ae30b2d99b3d62700a4a92907422 Gist PersistenceImage: https://gist.github.com/NickSale/403ba2218de93fc9dbefb77e1ab9d0f3

Expected behavior

No error should be thrown.

Actual behaviour

HeatKernel Gist: https://gist.github.com/NickSale/9abdc2919e2c0cb941dec1af5228d5a6 PersistenceImage Gist: https://gist.github.com/NickSale/3d00e0e765b849ffb84592815158fe5a

Versions

Windows-10-10.0.18362-SP0 Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] NumPy 1.18.5 SciPy 1.4.1 Joblib 0.13.2 Scikit-learn 0.22.2.post1 Giotto-tda 0.2.1

Additional context

ulupo commented 4 years ago

Thanks @NickSale! I'm investigating whether this is a bug of giotto-tda or of joblib. Are you able to reproduce the issue with "nontrivial" input data? In your examples, the input persistence diagrams are all trivial (birth=death). When I make a non-trivial example (e.g. adding 1 in the second column) I do not see the issue.

ulupo commented 4 years ago

@NickSale actually, let me correct that: when I change the line

X = np.linspace(0, 100, 300000)

for

X = np.linspace(0, 100)

or

X = np.random.random(100)

then things work out (regardless of triviality).

So indeed data size seems to be the issue here.

NickSale commented 4 years ago

Hi @ulupo - I originally ran across this bug when working with a large collection of non-trivial diagrams coming from a project I'm working on. I managed to figure out what was happening and you might have seen that I've just submitted a pull request which should fix it, but it seems to be that joblib is working as expected, just that its automatic memory mapping functionality hadn't been accounted for in the giotto code.

ulupo commented 4 years ago

Hi, yes I'm looking at #428 now, thank you very much for it! At first glance it looks like your diagnosis and fix are on point.