broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

TypeError: cannot pickle 'weakref.ReferenceType' object #296

Open lili03080317 opened 8 months ago

lili03080317 commented 8 months ago

Hi, when I run the example dataset, I get the error below. The command was:

cellbender remove-background --cuda --input heart10k_raw_feature_bc_matrix.h5 --output cellbender_test_outfile.h5

cellbender:remove-background: Command: cellbender remove-background --cuda --input /kaggle/input/cellbender-test/heart10k_raw_feature_bc_matrix.h5 --output /kaggle/working/cellbender_test_outfile.h5
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 58efcc97e5)
cellbender:remove-background: 2023-10-16 12:53:55
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from /kaggle/input/cellbender-test/heart10k_raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Features in dataset: 31053 Gene Expression
cellbender:remove-background: Trimming features for inference.
cellbender:remove-background: 22826 features have nonzero counts.
cellbender:remove-background: Prior on counts for cells is 7470
cellbender:remove-background: Prior on counts for empty droplets is 89
cellbender:remove-background: Excluding 6459 features that are estimated to have <= 0.1 background counts in cells.
cellbender:remove-background: Including 16367 features in the analysis.
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 44
cellbender:remove-background: Using 3560 probable cell barcodes, plus an additional 15959 barcodes, and 64121 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 126 UMI counts.
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /tmp/tmptiofsoe4
cellbender:remove-background: No saved checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...
cellbender:remove-background: [epoch 001] average training loss: 8870.0828
cellbender:remove-background: [epoch 002] average training loss: 6921.8352 (9.5 seconds per epoch)
cellbender:remove-background: Will checkpoint every 45 epochs
cellbender:remove-background: [epoch 003] average training loss: 5195.8968
cellbender:remove-background: [epoch 004] average training loss: 4018.3106
cellbender:remove-background: [epoch 005] average training loss: 3702.8906
cellbender:remove-background: [epoch 005] average test loss: 3634.8763
cellbender:remove-background: [epoch 006] average training loss: 3611.8797
cellbender:remove-background: [epoch 007] average training loss: 3549.3487
cellbender:remove-background: [epoch 008] average training loss: 3474.6104
cellbender:remove-background: [epoch 009] average training loss: 3415.4470
cellbender:remove-background: [epoch 010] average training loss: 3352.4311
cellbender:remove-background: [epoch 010] average test loss: 3223.6895
cellbender:remove-background: [epoch 011] average training loss: 3340.8036
cellbender:remove-background: [epoch 012] average training loss: 3276.2046
cellbender:remove-background: [epoch 013] average training loss: 3172.3478
cellbender:remove-background: [epoch 014] average training loss: 3121.9731
cellbender:remove-background: [epoch 015] average training loss: 3075.7645
cellbender:remove-background: [epoch 015] average test loss: 3036.6159
cellbender:remove-background: [epoch 016] average training loss: 3068.6666
cellbender:remove-background: [epoch 017] average training loss: 3054.3426
cellbender:remove-background: [epoch 018] average training loss: 3047.4215
cellbender:remove-background: [epoch 019] average training loss: 3018.0458
cellbender:remove-background: [epoch 020] average training loss: 2988.4565
cellbender:remove-background: [epoch 020] average test loss: 2934.7579
cellbender:remove-background: [epoch 021] average training loss: 2960.7718
cellbender:remove-background: [epoch 022] average training loss: 2939.6192
cellbender:remove-background: [epoch 023] average training loss: 2946.4738
cellbender:remove-background: [epoch 024] average training loss: 2928.8948
cellbender:remove-background: [epoch 025] average training loss: 2924.7158
cellbender:remove-background: [epoch 025] average test loss: 2887.6811
cellbender:remove-background: [epoch 026] average training loss: 2921.1435
cellbender:remove-background: [epoch 027] average training loss: 2904.9143
cellbender:remove-background: [epoch 028] average training loss: 2900.2679
cellbender:remove-background: [epoch 029] average training loss: 2901.3344
cellbender:remove-background: [epoch 030] average training loss: 2897.8041
cellbender:remove-background: [epoch 030] average test loss: 2840.5205
cellbender:remove-background: [epoch 031] average training loss: 2900.0525
cellbender:remove-background: [epoch 032] average training loss: 2887.5236
cellbender:remove-background: [epoch 033] average training loss: 2894.5167
cellbender:remove-background: [epoch 034] average training loss: 2888.5799
cellbender:remove-background: [epoch 035] average training loss: 2888.4892
cellbender:remove-background: [epoch 035] average test loss: 2862.9823
cellbender:remove-background: [epoch 036] average training loss: 2893.6630
cellbender:remove-background: [epoch 037] average training loss: 2892.6482
cellbender:remove-background: [epoch 038] average training loss: 2906.2663
cellbender:remove-background: [epoch 039] average training loss: 2893.5604
cellbender:remove-background: [epoch 040] average training loss: 2881.6000
cellbender:remove-background: [epoch 040] average test loss: 2844.5246
cellbender:remove-background: [epoch 041] average training loss: 2884.3972
cellbender:remove-background: [epoch 042] average training loss: 2886.4992
cellbender:remove-background: [epoch 043] average training loss: 2882.9892
cellbender:remove-background: [epoch 044] average training loss: 2881.1603
cellbender:remove-background: [epoch 045] average training loss: 2864.0779
cellbender:remove-background: [epoch 045] average test loss: 2860.9506
cellbender:remove-background: Saving a checkpoint...
cellbender:remove-background: Could not save checkpoint
cellbender:remove-background: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/cellbender/remove_background/checkpoint.py", line 115, in save_checkpoint
    torch.save(model_obj, filebase + '_model.torch')
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 653, in _save
    pickler.dump(obj)
TypeError: cannot pickle 'weakref.ReferenceType' object

How to fix it? Thank you!
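
For readers unsure what the traceback means: `torch.save` pickles whatever object it is given, and pickle rejects any object graph that contains a weak reference. The sketch below reproduces the same class of error with the standard library only. The names `Registry` and `model` are hypothetical stand-ins, not CellBender classes, and this thread does not establish which attribute of the real model holds the weak reference.

```python
import pickle
import weakref
from types import SimpleNamespace


class Registry:
    """Hypothetical helper object that a model might point back to."""


def try_pickle(obj):
    """Return None if pickling succeeds, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as err:
        return str(err)
    return None


registry = Registry()

# Stand-in "model": plain data plus a weak reference to another object.
model = SimpleNamespace(
    weights=[0.1, 0.2, 0.3],         # plain data, picklable
    registry=weakref.ref(registry),  # weak reference, not picklable
)

whole_object_error = try_pickle(model)         # pickling everything fails
plain_state_error = try_pickle(model.weights)  # pickling only the data works

print("whole object:", whole_object_error)
print("plain state: ", plain_state_error)
```

The point of the sketch is only that serializing plain state succeeds where serializing the whole object does not.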

bazelep commented 8 months ago

I had this issue when using Python 3.11 in a conda environment. With Python 3.7 and PyTorch 1.13.1 (conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia), it works fine.

lili03080317 commented 8 months ago

Thank you very much!

ssuthram-gilead commented 8 months ago

Thanks. I tried the above, but it still doesn't work. For some reason, it tries to save the checkpoint file under Python 3.9 even though I am using a conda environment with Python 3.7.

yyfu01 commented 8 months ago

Hey lili, does it work for you?

yyfu01 commented 8 months ago

I encountered the same issue, and it still doesn't work for me...

lili03080317 commented 8 months ago

@yyfu01 Hi, it does.

shahrozeabbas commented 6 months ago

@bazelep Hey! Currently having this issue...do you know if this might work with CUDA 11.4 or 11.8? I'm working on a server and we seem to skip 11.6 :(

bazelep commented 6 months ago

The CUDA on my server is 11.4, so it should work. Just make sure your environment isn't using another python installation.
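
A quick way to check for the mismatch described above is to ask the interpreter itself where it lives and where it would import the packages from. This is a minimal sketch using only the standard library; the function name `describe_environment` is made up for illustration, and it assumes the packages of interest are `cellbender` and `torch`.

```python
import importlib.util
import sys


def describe_environment(packages=("cellbender", "torch")):
    """Report which interpreter is running and where key packages resolve."""
    report = {
        "executable": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "packages": {},
    }
    for name in packages:
        spec = importlib.util.find_spec(name)
        # origin is the file an import would load; None when the package
        # cannot be found (or has no single file origin).
        report["packages"][name] = spec.origin if spec is not None else None
    return report


report = describe_environment()
print("interpreter:", report["executable"])
print("python:     ", report["python_version"])
for name, origin in report["packages"].items():
    print(f"{name}: {origin}")
```

Run it with the same `python` that the activated conda environment provides. If the interpreter path or a package path points outside that environment (for example into a `python3.9` directory), a different installation is being picked up. From the same interpreter, `torch.version.cuda` and `torch.cuda.is_available()` report which CUDA build PyTorch was installed with and whether a GPU is visible.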

ergonyc commented 5 months ago

Is anyone working on updating to modern python support? I might point out that python 3.7 was officially sunsetted months ago (05 Jun 2023).

JeffreyMaurer commented 1 month ago

I'm using python 3.11.7. Pytorch version '2.3.0+cu121'. Keras version '3.0.3.dev2024011803'. Nvidia GPU.

To work around this issue, I made the following changes (>>> marks the original line, <<< its replacement):

cellbender/remove_background/checkpoint.py

115>>>torch.save(model_obj, filebase + '_model.torch')
115<<<torch.save(model_obj.state_dict(), filebase + '_model.torch')
116>>>torch.save(scheduler, filebase + '_model.torch')
116<<<scheduler.save(filebase + '_optim.torch')
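
The first replacement stops pickling the whole model object and saves only its `state_dict()`, a mapping from parameter names to tensors. Below is a minimal sketch of that pattern in plain PyTorch, assuming a toy `torch.nn.Linear` model and a temporary file. The helper names `save_weights` and `load_weights` are hypothetical, and CellBender's own model and `filebase` are not reproduced here.

```python
import os
import tempfile

import torch


def save_weights(model: torch.nn.Module, path: str) -> None:
    """Save only the parameter tensors, not the pickled module object."""
    torch.save(model.state_dict(), path)


def load_weights(model: torch.nn.Module, path: str) -> None:
    """Load saved tensors into an already constructed model."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)


# Toy models standing in for the real one (assumption for illustration).
original = torch.nn.Linear(4, 2)
restored = torch.nn.Linear(4, 2)  # same architecture, different random init

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_model.torch")
    save_weights(original, path)
    load_weights(restored, path)

# Compare every tensor in the two state dicts.
weights_match = all(
    torch.equal(a, b)
    for a, b in zip(original.state_dict().values(), restored.state_dict().values())
)
print("weights restored:", weights_match)
```

The trade-off is on the loading side: a file written this way holds tensors only, so the model has to be rebuilt before `load_state_dict` is called. Any code that reads the checkpoint back and expects a full model object would presumably need a matching change, which this thread does not show.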

fm361 commented 1 month ago

This workaround is still needed