bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

Data load error #53

Closed: abhiachoudhary closed this issue 8 months ago

abhiachoudhary commented 8 months ago

I just started using BPCells and was following along with the tutorial. But in the RNA matrix conversion section, the command with open_matrix_10x_hdf5 gets stuck (it doesn't finish for hours): mat_raw <- open_matrix_10x_hdf5("pbmc_3k_10x.h5", feature_type="Gene Expression") %>% write_matrix_dir("pbmc_3k_rna_raw")

It shows no error and no progress, and it never stops running. Is there a bug in open_matrix_10x_hdf5, or am I doing something wrong?

bnprks commented 8 months ago

This looks likely to be a bug in BPCells, potentially related to issue #52. I'm seeing a similar issue when I test this out right now too -- continuous 100% CPU usage on one core, whereas the import should finish within seconds if things were working properly. So probably my fault, not yours -- maybe I changed something recently that inadvertently broke the 10x matrix import. I'll take a closer look and hopefully post back here soon with a fix.

bnprks commented 8 months ago

I think I likely found + fixed the cause with a one-line bugfix 🤦. This should be fixed in 21f8dcf641fd, so if you re-install from the latest GitHub version things should be good.

It looks like I introduced this bug on October 18th in 3711a401, and the effect is to re-read the entire list of cell IDs / gene IDs each time we want to read one ID (in this case resulting in about 100,000x too much work).
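To make the cost of that pattern concrete, here is a minimal R sketch of re-reading a whole ID list for every single lookup versus reading it once and indexing a cached copy. It is not the BPCells code (the actual fix is the one-line change in the commit named above). `make_reader` and its `read_all` / `reads` helpers are hypothetical names invented for this illustration, and the list of 1,000 IDs is an arbitrary size chosen to keep the example fast.

```r
# Hypothetical stand-in for reading the full ID list from storage.
# A counter records how many IDs have been read in total.
make_reader <- function(ids) {
  reads <- 0
  list(
    read_all = function() {
      reads <<- reads + length(ids)
      ids
    },
    reads = function() reads
  )
}

ids <- paste0("cell_", seq_len(1000))

# Buggy pattern: re-read the whole list for every single lookup.
slow <- make_reader(ids)
slow_out <- vapply(seq_along(ids), function(i) slow$read_all()[i], character(1))

# Fixed pattern: read the list once, then index into the cached copy.
fast <- make_reader(ids)
cached <- fast$read_all()
fast_out <- vapply(seq_along(ids), function(i) cached[i], character(1))

identical(slow_out, fast_out)  # both patterns return the same IDs
slow$reads() / fast$reads()    # overhead factor equals length(ids)
```

In this sketch the extra work grows with the length of the list: looking up every ID in a list of n IDs reads n * n entries instead of n. Under that assumption, an overhead of roughly 100,000x would correspond to a list on the order of 100,000 IDs, which is enough to turn an import that should take seconds into one that runs for hours.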

Thanks so much for reporting this bug, and sorry for the inconvenience to you and the other ~400 people who installed a broken version of BPCells in those two weeks. Let me know if you continue to have issues after this attempted fix.

abhiachoudhary commented 8 months ago

That fixed it. Thank you!