Training bug "ZeroDivisionError: division by zero"

NiklausZZZ commented 1 month ago

Hi, thank you for the useful tool! When I ran it on the test data, I encountered a bug:

python -u train.py --train_dir ./train/ --val_dir ./data/val/ --output ./output_dir/ --lr 0.0005 --max_epoch 20 --batch_size 128

0
{'domain_1': <torch.utils.data.dataloader.DataLoader object at 0x7fdf9300f4f0>}
python3.9/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Traceback (most recent call last):
  File "/train.py", line 53, in <module>
    acc = opt_utils.accuracy(algorithm,val_loaders[loader_idx])
  File "/utils/opt_utils.py", line 37, in accuracy
    return correct / total
ZeroDivisionError: division by zero

Any suggestions would be greatly appreciated!

Patchouli-M commented 1 month ago

Thank you for your attention. Would you like to predict data or train a model? Could you please let me know what format your data is in?

NiklausZZZ commented 1 month ago

I used the test data provided by the SequencingCancerFinder. When I deleted the parameters --lr 0.0005 --max_epoch 20 --batch_size 128, it worked:

python -u train.py --train_dir ./train/ --val_dir ./data/val/ --output ./output_dir/

But when I tested the model, every cell was predicted as 1. Does that mean all cells are tumor cells?

$ python -u infer.py --ckp=./output_dir/model_epoch2.pkl --matrix=./data/val/domain_val.csv --out=test.out.csv

1 21
begin 0
     sample  predict
0    cell_1      1.0
1    cell_2      1.0
2    cell_3      1.0
3    cell_4      1.0
4    cell_5      1.0
5    cell_6      1.0
6    cell_7      1.0
7    cell_8      1.0
8    cell_9      1.0
9   cell_10      1.0
10  cell_11      1.0
11  cell_12      1.0
12  cell_13      1.0
13  cell_14      1.0
14  cell_15      1.0
15  cell_16      1.0
16  cell_17      1.0
17  cell_18      1.0
18  cell_19      1.0
19  cell_20      1.0

Any suggestions would be greatly appreciated!

NiklausZZZ commented 1 month ago

By the way, I cannot find the file "sc_pretrain_article.pkl".

Patchouli-M commented 1 month ago

I understand. This happens because the "./train" and "./val" directories that ship with the project contain only a 20-cell demo. They are meant for checking that the code runs without errors and cannot serve as a real training set. In your first test, batch_size was set to 128, which exceeds the size of the training and validation sets and caused the error. In your second test, the training set was too small, so the resulting pkl file has no predictive capability.
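
To illustrate how a batch size larger than the dataset can end in "division by zero", here is a minimal sketch. It is not the project's code: plain Python lists stand in for the DataLoader, and dropping the final incomplete batch (drop_last=True) is an assumption about the loader, since its configuration is not shown in this thread. The names iter_batches, demo_cells and predict are hypothetical. With 20 cells and a batch size of 128, no batch is ever produced, so the sample count stays at 0 and correct / total cannot be computed.

```python
# Minimal sketch, NOT the repository's code. Plain lists stand in for the
# DataLoader, and drop_last=True is an assumed loader setting.

def iter_batches(samples, batch_size, drop_last=True):
    """Yield consecutive batches; optionally skip a final incomplete batch."""
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            return  # incomplete batch is discarded
        yield batch


def accuracy(predict, samples, batch_size, drop_last=True):
    """Fraction of correct predictions over the batches actually yielded."""
    correct = 0
    total = 0
    for batch in iter_batches(samples, batch_size, drop_last):
        for features, label in batch:
            correct += int(predict(features) == label)
            total += 1
    if total == 0:
        # Fail with a readable message instead of ZeroDivisionError.
        raise ValueError(
            f"no samples evaluated: batch_size={batch_size} is larger than "
            f"the dataset ({len(samples)} samples)"
        )
    return correct / total


def predict(features):
    """Hypothetical classifier used only for this demo."""
    return features % 2


# 20 hypothetical demo cells as (feature, label) pairs.
demo_cells = [(i, i % 2) for i in range(20)]

# batch_size=128 yields no batch at all for 20 cells.
n_batches_128 = len(list(iter_batches(demo_cells, 128)))

# A batch size that fits the data evaluates all 20 cells.
acc_small = accuracy(predict, demo_cells, batch_size=4)
```

Calling accuracy(predict, demo_cells, batch_size=128) raises the ValueError above. Under the same assumption, any batch size of 20 or less would let the demo data pass through the loop.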

For prediction, I recommend using the "sc_pretrain_article.pkl" file, which was trained on hundreds of thousands of cells and has predictive capability. The file is quite large, so it is not included in the project files. The README.md file contains a Google Drive link to "sc_pretrain_article.pkl" (about 89 MB), from which you can download it and use it for prediction.

The file sample_data/sample_data_matrix.tsv is composed of 10 cancer cells and 10 normal cells, and you can use it for testing.
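
A quick way to check whether a prediction run is degenerate (every cell assigned the same label, as in the output above) is to tally the values in the predict column. This is a minimal sketch: it assumes the output file is comma-separated with the header columns sample and predict, as in the printed table, and the example rows are made up for illustration. The helper name count_predictions is hypothetical.

```python
import csv
import io
from collections import Counter


def count_predictions(csv_text, column="predict"):
    """Count how often each predicted label occurs in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(float(row[column]) for row in reader)


# Hypothetical rows in the assumed "sample,predict" layout.
example = "sample,predict\ncell_1,1.0\ncell_2,1.0\ncell_3,0.0\n"

counts = count_predictions(example)

# True when every cell received the same label.
is_constant = len(counts) == 1
```

To use it on a real run, read the file written by --out and pass its text to count_predictions. On the sample data, a model with predictive capability should report two different labels rather than one.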

Patchouli-M / SequencingCancerFinder

Training bug "ZeroDivisionError: division by zero" #7