training - input, target not matching

vendig commented 6 years ago

I get the following error, which other people have already reported (for example: https://discuss.pytorch.org/t/runtimeerror-multi-target-not-supported-newbie/10216/3):

RuntimeError: input and target shapes do not match: input [24 x 1], target [24] at /pytorch/aten/src/THCUNN/generic/SmoothL1Criterion.cu:12

I am using PyTorch version 0.4.1.

Input:

%sh
export PYTHONPATH="/databricks/python/local/lib/python2.7/site-packages:$PYTHONPATH" # this is just a hack, don't use this code.
cd /local_disk0/neu2/neusomatic-master/test/example

/usr/bin/python ../../neusomatic/python/preprocess.py \
    --mode train \
    --reference Homo_sapiens.GRCh37.75.dna.chromosome.22.fa \
    --region_bed ../region.bed \
    --tumor_bam ../tumor.bam \
    --normal_bam ../normal.bam \
    --work work_train \
    --truth_vcf ../NeuSomatic_ensemble.vcf \
    --min_mapq 10 \
    --num_threads 1 \
    --scan_alignments_binary ../../neusomatic/bin/scan_alignments

%sh
export PYTHONPATH="/databricks/python/local/lib/python2.7/site-packages:$PYTHONPATH"
cd /local_disk0/neu2/neusomatic-master/test/example

/usr/bin/python ../../neusomatic/python/train.py \
    --candidates_tsv work_train/dataset//candidates.tsv \
    --out work_train \
    --num_threads 10 \
    --batch_size 100

Output:

INFO 2018-09-15 16:43:43,998 main Namespace(batch_size=100, boost_none=10, candidates_tsv=['work_train/dataset/work.0/candidates_0.tsv'], checkpoint=None, coverage_thr=100, ensemble=False, lr=0.01, lr_drop_epochs=400, lr_drop_ratio=0.1, max_epochs=1000, max_load_candidates=1000000, momentum=0.9, none_count_scale=2, num_threads=10, out='work_train', validation_candidates_tsv=[])
INFO 2018-09-15 16:43:44,018 main use_cuda: True
INFO 2018-09-15 16:43:44,018 main -----------------------------------------------------------
INFO 2018-09-15 16:43:44,018 main Train NeuSomatic Network
INFO 2018-09-15 16:43:44,018 main -----------------------------------------------------------
INFO 2018-09-15 16:43:46,327 main tag: neusomatic_18-09-15-16-43-46
INFO 2018-09-15 16:43:46,328 dataloader [211]
INFO 2018-09-15 16:43:46,478 dataloader Loaded 211 candidates for work_train/dataset/work.0/candidates_0.tsv
INFO 2018-09-15 16:43:46,484 main Non-somatic candidates: 203
INFO 2018-09-15 16:43:46,484 main Somatic candidates: 8
INFO 2018-09-15 16:43:46,484 main Non-somatic considered in each epoch: 16
INFO 2018-09-15 16:43:46,484 main #Train cadidates: 211
INFO 2018-09-15 16:43:46,484 main count type classes: [('DEL', 2), ('INS', 1), ('NONE', 16), ('SNP', 8)]
INFO 2018-09-15 16:43:46,485 main weight type classes: [('DEL', 0.23148148148148148), ('INS', 0.24074074074074076), ('NONE', 0.10185185185185186), ('SNP', 0.17592592592592593)]
INFO 2018-09-15 16:43:46,485 main weight length classes: [(0, 16), (1, 8), (2, 2), (3, 1)]
INFO 2018-09-15 16:43:46,485 main weight length classes: [(0, 0.10185185185185186), (1, 0.17592592592592593), (2, 0.23148148148148148), (3, 0.24074074074074076)]
INFO 2018-09-15 16:43:46,485 main weights_type:[0.23148148 0.24074074 1.01851852 0.17592593], weights_length:[1.01851852 0.17592593 0.23148148 0.24074074]
INFO 2018-09-15 16:43:46,487 main Number of candidater per epoch: 24
Traceback (most recent call last):
  File "../../neusomatic/python/train.py", line 420, in <module>
    args.max_load_candidates, args.coverage_thr, use_cuda)
  File "../../neusomatic/python/train.py", line 326, in train_neusomatic
    ) + 1 criterion_crossentropy2(outputs_len, var_len_labels)
  File "/databricks/python/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(input, **kwargs)
  File "/databricks/python/local/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 735, in forward
    return F.smooth_l1_loss(input, target, reduction=self.reduction)
  File "/databricks/python/local/lib/python2.7/site-packages/torch/nn/functional.py", line 1687, in smooth_l1_loss
    return torch._C._nn.smooth_l1_loss(input, target, reduction)
RuntimeError: input and target shapes do not match: input [24 x 1], target [24] at /pytorch/aten/src/THCUNN/generic/SmoothL1Criterion.cu:12

Thank you. Best, vendi

vendig commented 6 years ago

I found out that this part of the loss function causes the problem:

criterion_smoothl1(outputs_pos, var_pos_s[:, 1]) + 1

Maybe the culprit is var_pos_s[:, 1]?
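For readers hitting the same message: the error says the input has shape [24 x 1] while the target has shape [24]. Below is a minimal sketch of one possible workaround, assuming the only difference is a trailing singleton dimension. It is not taken from the NeuSomatic code; `target_pos` is a hypothetical stand-in for `var_pos_s[:, 1]`, and the tensors hold random values. Depending on the PyTorch version, a mismatch like this may raise an error (as in the traceback here) or be handled differently, so aligning the shapes explicitly is the safer choice.

```python
import torch
import torch.nn as nn

# Stand-ins for the tensors named in the traceback (24 candidates per batch).
outputs_pos = torch.randn(24, 1)  # network output, shape [24, 1]
target_pos = torch.randn(24)      # hypothetical stand-in for var_pos_s[:, 1], shape [24]

criterion_smoothl1 = nn.SmoothL1Loss()

# Drop the trailing singleton dimension so both tensors have shape [24].
aligned_outputs = outputs_pos.squeeze(1)
loss = criterion_smoothl1(aligned_outputs, target_pos)

print(aligned_outputs.shape, target_pos.shape)
```

The same effect can be had by adding a dimension to the target instead, with `target_pos.unsqueeze(1)`. Which side to change depends on what the rest of the training code expects.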

msahraeian commented 6 years ago

@vendig please try PyTorch v0.3.1, as listed in the requirements. The code is currently designed for PyTorch v0.3.1 and only works with that version. I think you will run into several other compatibility issues if you use v0.4.1. In the future, we will try to extend the code to work with later PyTorch versions.

I suggest doing a fresh Miniconda installation and then following the instructions in the README to install the Python packages with the right versions. You should be able to get all of the Python packages through conda.

BTW, please pull the latest commit on master, as we have fixed some bugs.
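The version requirement above can be checked before starting a long training run. The following is a small sketch, not part of NeuSomatic: the helper names `parse_version` and `is_supported` are hypothetical, and the required version (0.3.1) is the one stated in this thread.

```python
import re

# Version the maintainer says the code was designed for.
REQUIRED = (0, 3, 1)


def parse_version(version):
    """Return the leading numeric components, e.g. '0.4.1.post2' -> (0, 4, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_supported(version, required=REQUIRED):
    """True only when the installed version matches the required one exactly."""
    return parse_version(version) == required
```

In a real environment the installed version string would come from `torch.__version__`, for example `is_supported(torch.__version__)`. An exact match is used here because the thread states that the code only works with v0.3.1.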

vendig commented 6 years ago

I installed PyTorch v0.3.1 and training works now. Thank you.

bioinform / neusomatic

training - input, target not matching #8