bioinform / neusomatic

NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection

How can I train on several different NT samples and then call variants? #45

Closed ggoodstudydaydayup closed 5 years ago

ggoodstudydaydayup commented 5 years ago

Hi, I want to train several times using different NT samples. I'm not sure whether the train_work output can be saved in one file and then used to call variants. Thanks in advance. Best wishes.

msahraeian commented 5 years ago

@ggoodstudydaydayup Happy to see your interest in NeuSomatic. Yes, NeuSomatic can handle multi-sample training. Run the preprocessing step for each sample separately. Then, in the training step, pass the candidate files from all samples to --candidates_tsv. Here are the details:

1-Preprocess each sample. For example, for sample 1:

python preprocess.py \
    --mode train \
    --reference GRCh38.fa \
    --region_bed region.bed \
    --tumor_bam tumor_1.bam \
    --normal_bam normal_1.bam \
    --work work_train_1 \
    --truth_vcf truth_1.vcf \
    --min_mapq 10 \
    --number_threads 10 \
    --scan_alignments_binary ../bin/scan_alignments
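
To repeat step 1 for every sample without retyping the command, a loop along these lines might help. This is only a sketch: N_SAMPLES, the DRY_RUN switch and the tumor_i / normal_i / truth_i naming are assumptions for illustration, and the flags are copied unchanged from the command above.

```shell
# Sketch only: N_SAMPLES, DRY_RUN and the *_i file naming are assumptions.
N_SAMPLES=${N_SAMPLES:-3}
DRY_RUN=${DRY_RUN:-1}   # 1 = print each command, 0 = actually run it

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

preprocess_sample() {
    # $1: sample index, used in every per-sample file and directory name
    run python preprocess.py \
        --mode train \
        --reference GRCh38.fa \
        --region_bed region.bed \
        --tumor_bam "tumor_$1.bam" \
        --normal_bam "normal_$1.bam" \
        --work "work_train_$1" \
        --truth_vcf "truth_$1.vcf" \
        --min_mapq 10 \
        --number_threads 10 \
        --scan_alignments_binary ../bin/scan_alignments
}

# Preprocess samples 1..N_SAMPLES, one work directory per sample.
i=1
while [ "$i" -le "$N_SAMPLES" ]; do
    preprocess_sample "$i"
    i=$((i + 1))
done
```

With the default DRY_RUN=1 the loop only prints the commands, so they can be checked before setting DRY_RUN=0 to run them.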

2-Train on sample_1 to sample_n:

python train.py \
    --candidates_tsv work_train_1/dataset/*/candidates*.tsv \
        work_train_2/dataset/*/candidates*.tsv \
        ... \
        work_train_n/dataset/*/candidates*.tsv \
    --out work \
    --num_threads 10 \
    --batch_size 100
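
If there are many samples, the candidate list for step 2 could be gathered by a small helper instead of being typed out. The sketch below assumes all work_train_* directories sit under one parent directory; train_multi, run and DRY_RUN are hypothetical names, not part of NeuSomatic, and the train.py flags are the ones shown above.

```shell
# Sketch only: train_multi, run and DRY_RUN are illustrative helpers.
run() {
    # DRY_RUN=1 (the default here) prints the command instead of running it
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

train_multi() {
    # $1: directory that contains the work_train_* folders (assumed layout)
    base=$1
    set --   # reuse the positional parameters as the list of TSV files
    for f in "$base"/work_train_*/dataset/*/candidates*.tsv; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matches
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no candidates*.tsv files found under $base" >&2
        return 1
    fi
    run python train.py \
        --candidates_tsv "$@" \
        --out work \
        --num_threads 10 \
        --batch_size 100
}
```

Calling `train_multi .` from the directory that holds the work_train_* folders prints the full train.py command, and the same call with DRY_RUN=0 runs it.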

3-Use the checkpoint from this multi-sample training to call variants on other samples.
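
For step 3 you need the path of a checkpoint written by the training run. A helper like the one below could pick the most recent one. It is a sketch under two assumptions that should be checked against your own output directory: that checkpoints are written directly into the --out directory, and that they use a .pth extension. latest_checkpoint is a hypothetical name.

```shell
# Sketch only: assumes checkpoints are *.pth files directly inside the
# training output directory and that their names contain no newlines.
latest_checkpoint() {
    # $1: training output directory (the --out value given to train.py)
    # ls -t sorts by modification time, newest first
    ls -t "$1"/*.pth 2>/dev/null | head -n 1
}
```

For example, `ckpt=$(latest_checkpoint work)` stores the newest checkpoint path, which can then be handed to the calling step (as far as I know via the --checkpoint option of call.py; `python call.py --help` will confirm).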

ggoodstudydaydayup commented 5 years ago

Thanks