mingyi-wang closed this issue 7 years ago.
Yes. All input BAM and VCF files are assumed to be sorted in the same contig order as the input reference file. So if some files are ordered 1, 10, 11, ... and others are ordered 1, 2, 3, ..., they will not be annotated correctly. I plan to add an ordering check in the future that produces a warning when the files aren't ordered consistently.
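Until such a check exists, you can verify ordering yourself before running the pipeline. Below is a minimal sketch, not SomaticSeq code: it assumes the reference has a samtools-style `.fai` index (contig name in the first column), that the VCF is uncompressed, and all function names are hypothetical. It reports the first place where a file's contigs step backwards relative to the reference order.

```python
from typing import Iterable, Iterator, List, Optional


def read_fai_contigs(fai_path: str) -> List[str]:
    """Return contig names in the order they appear in a .fai index (assumed format)."""
    with open(fai_path) as fh:
        return [line.split("\t", 1)[0] for line in fh if line.strip()]


def vcf_chroms(vcf_path: str) -> Iterator[str]:
    """Yield the CHROM field of each data line in an uncompressed VCF."""
    with open(vcf_path) as fh:
        for line in fh:
            if not line.startswith("#"):
                yield line.split("\t", 1)[0]


def first_out_of_order(chroms: Iterable[str], ref_order: List[str]) -> Optional[str]:
    """Return a description of the first ordering violation, or None if consistent.

    Contig ranks must never decrease from one record to the next. Because ranks
    only move forward, a contig that reappears in a second block is also caught.
    """
    rank = {name: i for i, name in enumerate(ref_order)}
    prev = None
    for chrom in chroms:
        if chrom == prev:
            continue  # still inside the same contig block
        if chrom not in rank:
            return f"contig {chrom!r} is not in the reference"
        if prev is not None and rank[chrom] < rank[prev]:
            return f"contig {chrom!r} comes after {prev!r} but precedes it in the reference"
        prev = chrom
    return None
```

For a reference ordered 1, 2, 3, 10, 11, MT, X, a file ordered 1, 10, 11, 2, ... is flagged at contig `'2'`, which is exactly the mismatch described above.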
I tried to run somaticseq to train a model using truth VCF files. However, after running the SSeq_merged.vcf2tsv.py step, I found that the last column (TrueVariant_or_False) of Ensemble.sSNV.tsv is incorrect: some true variants are tagged as "0". I traced this back and found that my truth VCF orders chromosomes in the default (lexicographic) order (1, 10, 11, ..., 2, MT, X) rather than natural order (1, 2, ..., 10, 11, ..., MT, X). Is this why the last column is wrong? If so, what ordering does a truth VCF need to follow? Thanks,
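The "default" order here is plain string sorting, which is why 10 and 11 land before 2. Per the answer above, what matters is matching the reference's contig order, which is not necessarily natural order either. The sketch below shows the difference and re-sorts VCF lines by reference order. It is a hypothetical workaround, not part of SomaticSeq: it assumes an uncompressed VCF small enough to hold in memory and a reference contig list (e.g. from the `.fai` index), and every contig in the VCF must appear in that list.

```python
from typing import Dict, List, Tuple


def natural_key(chrom: str) -> Tuple[int, int, str]:
    """Sort key placing numeric contigs 1, 2, ..., 10, 11 before names like MT, X."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def sort_vcf_lines(lines: List[str], ref_order: List[str]) -> List[str]:
    """Keep header lines first, then sort data lines by (reference contig rank, POS).

    Raises KeyError if a data line uses a contig missing from ref_order.
    """
    rank: Dict[str, int] = {name: i for i, name in enumerate(ref_order)}
    header = [l for l in lines if l.startswith("#")]
    body = [l for l in lines if not l.startswith("#")]

    def key(line: str) -> Tuple[int, int]:
        fields = line.split("\t")
        return (rank[fields[0]], int(fields[1]))  # CHROM rank, then POS

    return header + sorted(body, key=key)


# String sorting gives the lexicographic order seen in the truth VCF.
print(sorted(["1", "2", "10", "11", "MT", "X"]))
# Natural order, which may or may not match a given reference.
print(sorted(["1", "10", "11", "2", "MT", "X"], key=natural_key))
```

Sorting by the reference's own contig list, rather than by `natural_key`, avoids guessing: a reference ordered 1..22, X, Y, MT would not match natural order for MT.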