dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

Multithreading issue with palantir.utils.run_magic_imputation() #133

Open gighuarhguggg45 opened 9 months ago

gighuarhguggg45 commented 9 months ago

Hi,

Thank you for your work on Palantir. I have been running into issues with

imputed_X = palantir.utils.run_magic_imputation(ad, n_jobs=16)

(or any n_jobs > 1).

I get the following warning shortly after the run starts, and the Python kernel always dies after a while.

RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.

I do not get any errors with: imputed_X = palantir.utils.run_magic_imputation(ad,n_jobs=1)

But n_jobs=1 just runs forever and never produces a result with my large scATAC anndata (101,966 cells and 228,892 features, with 30,000 variable features).
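For a sense of scale, the arithmetic below estimates how much memory a fully dense matrix of these shapes would occupy. This is only a back-of-the-envelope sketch: `dense_bytes` is a hypothetical helper, and both the 8-byte (float64) element size and the idea that the data or the imputed result is ever held densely are assumptions, not something confirmed in this thread.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense n_rows x n_cols array.

    itemsize=8 assumes float64 elements (an assumption, not a known fact
    about how Palantir stores its result).
    """
    return n_rows * n_cols * itemsize


n_cells = 101_966

# Shapes taken from the report above.
variable = dense_bytes(n_cells, 30_000)   # variable features only
full = dense_bytes(n_cells, 228_892)      # all features

print(f"30,000 features:  {variable / 1e9:.1f} GB")
print(f"228,892 features: {full / 1e9:.1f} GB")
```

Under those assumptions, any additional copies made per worker process would multiply these figures, which may be worth comparing against the 640 GB available.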

Of note, when I run the tutorial Palantir analysis notebook, I still get the warning, but I do not get the crash.

I am using 80 vCPUs, 640 GB of memory, and Python 3.9.2.

Thank you.

katosh commented 9 months ago

Hi @gighuarhguggg45! Thank you for reporting. This sounds like your ad.X might be a JAX array. Could you try

ad.X = np.asarray(ad.X)
imputed_X = palantir.utils.run_magic_imputation(ad, n_jobs=16)

and see if this fixes the problem? Otherwise, I would need some example data to reproduce this.

rishikanthc commented 9 months ago

I get the same error when I use a PyTorch DataLoader with num_workers > 0. It looks like the PyTorch DataLoader uses os.fork(), which conflicts with JAX.
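One general way to avoid os.fork() in Python is to ask the standard library's multiprocessing module for the "spawn" start method, which launches fresh interpreter processes instead of forking the (possibly multithreaded) parent. The sketch below shows only that mechanism. `make_spawn_context`, `square`, and `run_in_spawned_pool` are hypothetical names, and whether this avoids the crash reported in this thread is untested.

```python
import multiprocessing as mp


def make_spawn_context():
    # "spawn" starts each worker as a new interpreter rather than
    # calling os.fork() on the current process.
    return mp.get_context("spawn")


def square(x):
    # Worker functions must be defined at module top level so that
    # spawned processes can import them.
    return x * x


def run_in_spawned_pool(values, n_workers=2):
    # Map `square` over `values` in worker processes created via "spawn".
    ctx = make_spawn_context()
    with ctx.Pool(n_workers) as pool:
        return pool.map(square, values)
```

When run as a script, call run_in_spawned_pool([1, 2, 3]) from inside an `if __name__ == "__main__":` guard, since spawned workers re-import the main module. For the PyTorch case, DataLoader has a multiprocessing_context parameter that accepts a start method name or a context like this one.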

katosh commented 9 months ago

Just to clarify: we do not explicitly use JAX in the palantir.utils.run_magic_imputation function. However, we do use a dot product function that picks an implementation based on the input arrays. JAX is only used if one of the input arrays (ad.X or ad.obsp["DM_Similarity"]) is a JAX array.

Please let me know if the solution suggested above fixes the problem.
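To check which kind of array each input actually is, something like the following could be run first. It is a rough heuristic sketch, not part of Palantir: `array_backend` and `report` are hypothetical helpers, and the guess relies on the assumption that the module name of an object's type identifies the library it comes from.

```python
def array_backend(x):
    """Guess the library an array-like object comes from.

    Heuristic: inspect the module in which the object's type is defined.
    """
    module = type(x).__module__
    top_level = module.split(".")[0]
    if top_level in {"jax", "jaxlib"}:
        return "jax"
    if module.startswith("scipy.sparse"):
        return "scipy.sparse"
    if top_level == "numpy":
        return "numpy"
    # Unknown library: return the raw module name for inspection.
    return module


def report(name, x):
    # One-line summary: full type name plus the guessed backend.
    t = type(x)
    return f"{name}: {t.__module__}.{t.__qualname__} -> {array_backend(x)}"
```

With an AnnData object ad in scope, print(report("ad.X", ad.X)) and print(report("DM_Similarity", ad.obsp["DM_Similarity"])) would show whether either input is reported as "jax".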

wbrett87 commented 9 months ago

I get the same problem, and the solution you suggested does not work.

image

katosh commented 9 months ago

@wbrett87 can you please inspect adata.X and see if the shape, format, and content are what we expect from a gene expression matrix?

wbrett87 commented 9 months ago

image

katosh commented 9 months ago

@wbrett87, it appears that your ad.X is stored in a sparse format. Consequently, converting ad.X directly to a NumPy array is unnecessary and likely the cause of the error you experienced above. That said, it's unexpected that you're encountering a JAX-related error upon executing palantir.utils.run_magic_imputation(ad, n_jobs=16) directly. This issue isn't something I've been able to replicate on my end. Another possibility is that ad.obsp["DM_Similarity"] is a JAX array, although this would be unusual. For a more targeted investigation, could you provide a simplified code snippet or dataset that reproduces the error? This would greatly help in diagnosing and resolving the problem.

wbrett87 commented 9 months ago

ad.obsp["DM_Similarity"] is not a JAX array. I ran with one job as a temporary workaround. I will provide a simplified code snippet at some point soon when I have a bit more time. Thanks for your attention to this!