Closed ggoodstudydaydayup closed 5 years ago
@ggoodstudydaydayup Happy to see your interest in NeuSomatic.
Yes, NeuSomatic can handle multi-sample training. Run the preprocessing step for each sample separately. Then, in the training step, pass the candidate files from all samples to --candidates_tsv. Here are the details:
1-Preprocess each sample separately. For example, for sample 1:
python preprocess.py \
--mode train \
--reference GRCh38.fa \
--region_bed region.bed \
--tumor_bam tumor_1.bam \
--normal_bam normal_1.bam \
--work work_train_1 \
--truth_vcf truth_1.vcf \
--min_mapq 10 \
--number_threads 10 \
--scan_alignments_binary ../bin/scan_alignments
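To repeat this step for every sample, the command can be generated from a sample index. The following is a minimal sketch, not part of NeuSomatic: it uses only the flags shown above, and it assumes the other samples follow the same naming as sample 1 (tumor_N.bam, normal_N.bam, truth_N.vcf, work_train_N). The helper name build_preprocess_cmd is hypothetical.

```python
def build_preprocess_cmd(sample_id, reference="GRCh38.fa", region_bed="region.bed"):
    """Return the preprocess.py argument list for one sample.

    Assumption: per-sample files are named tumor_<id>.bam, normal_<id>.bam
    and truth_<id>.vcf, mirroring the sample 1 example.
    """
    return [
        "python", "preprocess.py",
        "--mode", "train",
        "--reference", reference,
        "--region_bed", region_bed,
        "--tumor_bam", f"tumor_{sample_id}.bam",
        "--normal_bam", f"normal_{sample_id}.bam",
        "--work", f"work_train_{sample_id}",
        "--truth_vcf", f"truth_{sample_id}.vcf",
        "--min_mapq", "10",
        "--number_threads", "10",
        "--scan_alignments_binary", "../bin/scan_alignments",
    ]


if __name__ == "__main__":
    # Print one command per sample; adjust the range to your number of samples.
    for sample_id in (1, 2, 3):
        print(" ".join(build_preprocess_cmd(sample_id)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) if you prefer to launch the runs from Python instead of pasting the printed commands into a shell.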
2-Train on sample_1 to sample_n:
python train.py \
--candidates_tsv work_train_1/dataset/*/candidates*.tsv work_train_2/dataset/*/candidates*.tsv \
... work_train_n/dataset/*/candidates*.tsv \
--out work \
--num_threads 10 \
--batch_size 100
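In a shell, the candidates*.tsv patterns are expanded before train.py starts. If you launch training from a script without a shell, you have to expand them yourself. The sketch below shows one way to do that, assuming the work_train_N directory layout from step 1; collect_candidates and build_train_cmd are hypothetical helper names, and only the train.py flags shown above are used.

```python
import glob


def collect_candidates(work_dirs):
    """Expand <work_dir>/dataset/*/candidates*.tsv for every work directory."""
    files = []
    for work_dir in work_dirs:
        pattern = f"{work_dir}/dataset/*/candidates*.tsv"
        # Sort so the file order is reproducible between runs.
        files.extend(sorted(glob.glob(pattern)))
    return files


def build_train_cmd(candidate_files, out_dir="work", num_threads=10, batch_size=100):
    """Return the train.py argument list covering all candidate files."""
    if not candidate_files:
        raise ValueError("no candidate files found; run preprocessing first")
    return [
        "python", "train.py",
        "--candidates_tsv", *candidate_files,
        "--out", out_dir,
        "--num_threads", str(num_threads),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Assumption: three samples were preprocessed into work_train_1..3.
    work_dirs = [f"work_train_{i}" for i in (1, 2, 3)]
    found = collect_candidates(work_dirs)
    if found:
        print(" ".join(build_train_cmd(found)))
    else:
        print("No candidate files found under", work_dirs)
```

The empty-list check is there because a mistyped work directory would otherwise produce a train command with no input files.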
3-Use the checkpoint from this multi-sample training to call variants on other samples.
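As a rough sketch of this last step: the code below picks the most recently written checkpoint from the training output directory and builds a call.py command for a new sample. Several things here are assumptions rather than details from this thread: that checkpoints are saved as .pth files under the --out directory, that call.py accepts --candidates_tsv, --reference, --out and --checkpoint (please confirm with python call.py --help), and that the new sample was already preprocessed in call mode. The helper names and the work_call directory are hypothetical.

```python
import glob
import os


def latest_checkpoint(train_dir):
    """Return the newest *.pth file in train_dir, or None if there is none.

    Assumption: training writes its checkpoints as .pth files in train_dir.
    """
    checkpoints = glob.glob(os.path.join(train_dir, "*.pth"))
    if not checkpoints:
        return None
    return max(checkpoints, key=os.path.getmtime)


def build_call_cmd(candidate_files, checkpoint, reference="GRCh38.fa", out_dir="work_call"):
    """Return a call.py argument list (flag names are assumed, see lead-in)."""
    return [
        "python", "call.py",
        "--candidates_tsv", *candidate_files,
        "--reference", reference,
        "--out", out_dir,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    checkpoint = latest_checkpoint("work")
    candidates = sorted(glob.glob("work_call/dataset/*/candidates*.tsv"))
    if checkpoint and candidates:
        print(" ".join(build_call_cmd(candidates, checkpoint)))
    else:
        print("Missing checkpoint or candidate files; nothing to call.")
```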
Thanks
Hi, I want to train several times using different NT samples. I'm not sure whether the training work (train_work) can be saved in one file and then used to call variants. Thanks in advance. Best wishes.