bioinform / neusomatic

NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection

preprocess unable to find candidates for training #47

Closed SchKri closed 5 years ago

SchKri commented 5 years ago

I'm trying to train a network using a set of exome data from the Dream Challenge. I created a BED file as suggested in issue 16 and followed your recommendation for distributed data processing on a cluster. Unfortunately, no candidates are found during preprocessing. Are there specific requirements for the vcf_truth file? I tried adding VarType information with java -jar SnpSift.jar varType truth_initial.vcf > truth_final.vcf, but it didn't help.

I attached the output from the main job and the output of one of the 10 sub-region-jobs.

job.txt

sub-job.txt

Edit: The problem may be corrupted bam files. I'll check this and close the issue if that caused the problem.

msahraeian commented 5 years ago

@SchKri Thanks for your interest in NeuSomatic. From your sub-job.txt it seems that your candidates tsv files are empty. You can verify that with wc -l work/work_*/dataset/work.*/candidates*.tsv. Can you check whether work/work_0/work_tumor/filtered_candidates.tsv is empty? If it is, please share one of the scan.err files under work/work_0/work_tumor/work.*/scan.err.
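For anyone who prefers to script the wc -l check above, here is a minimal Python sketch. The helper names count_lines and empty_files are hypothetical and not part of NeuSomatic; the glob pattern is the one quoted in this comment and assumes the script runs from the directory that contains work/.

```python
import glob


def count_lines(pattern):
    """Return {path: line_count} for every file matching the glob pattern."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            counts[path] = sum(1 for _ in handle)
    return counts


def empty_files(pattern):
    """Return the matching paths that contain no lines at all."""
    return [path for path, n in count_lines(pattern).items() if n == 0]


if __name__ == "__main__":
    # Pattern taken from the comment above; adjust it to your own work directory.
    pattern = "work/work_*/dataset/work.*/candidates*.tsv"
    for path, n in count_lines(pattern).items():
        print(f"{n}\t{path}")
```

If every file reports 0 lines, the scan step produced no candidates, which points at the input BAM files or the region BED file rather than at the truth VCF.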

SchKri commented 5 years ago

Thanks for your reply. The candidates*.tsv files were empty. The problem was a corrupted (empty) bam file. I fixed that and now it works.
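For future readers: the thread does not say how the empty BAM file was detected, so the following is only a sketch of one cheap sanity check that could be run before preprocessing. It assumes the standard BGZF layout described in the SAM/BAM specification (gzip magic bytes at the start, a fixed 28-byte empty block at the end). The function name bam_problem is hypothetical, and passing this check does not prove that a file is fully valid.

```python
import os

# 28-byte empty BGZF block that terminates a complete BAM file
# (value taken from the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)


def bam_problem(path):
    """Return a short description of the problem, or None if basic checks pass."""
    size = os.path.getsize(path)
    if size == 0:
        return "file is empty"
    with open(path, "rb") as handle:
        # Every BGZF block starts with the gzip magic bytes.
        if handle.read(2) != b"\x1f\x8b":
            return "missing gzip/BGZF magic bytes"
        if size < len(BGZF_EOF):
            return "file too short to hold the BGZF EOF block"
        # A complete BAM file ends with the empty EOF block.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        if handle.read() != BGZF_EOF:
            return "missing BGZF EOF block (possibly truncated)"
    return None
```

Calling bam_problem("tumor.bam") (the file name is a placeholder) returns None when the file passes and a message otherwise. If samtools is installed, samtools quickcheck performs comparable header and EOF checks.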