NCI-CGR / IlluminaSequencingAnalysis

All Illumina sequencing-related projects from Xin will be recorded in this repo.

VCFClassification: Add a new function to sort and remove duplicate lines in VCF #17

Open lxwgcool opened 3 years ago


Two typical issues

The original VCF has two issues:

  1. it is not sorted correctly, for an unknown reason
  2. it contains duplicate lines

Solution

Add a new function that refines the original VCF:

  1. Fix the two issues described above.
  2. Overwrite the original VCF with the refined content.
  3. Rename the old VCF (the one with the issues) using the suffix .org, so that a record of it is kept.
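The three steps above could look roughly like the sketch below. This is not the code in subsetVcfMultiSplit_v2.py. The function name refine_vcf is hypothetical, and two behaviours are assumptions: "sorted" means ordered by the contig order of the ##contig header lines and then by POS, and ".org" is appended to the full file name (for example sample.vcf becomes sample.vcf.org). Only exact duplicate data lines are removed.

```python
import shutil


def _contig_order(header_lines):
    """Map contig ID -> rank, following the order of ##contig header lines (assumption)."""
    order = {}
    for line in header_lines:
        if line.startswith("##contig=<"):
            # e.g. ##contig=<ID=chr1,length=248956422>
            fields = line[len("##contig=<"):].rstrip(">\n").split(",")
            for field in fields:
                if field.startswith("ID="):
                    order.setdefault(field[3:], len(order))
    return order


def refine_vcf(path):
    """Hypothetical helper: sort records, drop exact duplicates, keep original as path + '.org'.

    Returns the number of duplicate data lines removed.
    """
    header, records = [], []
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("#"):
                header.append(line)
            else:
                # Normalise the trailing newline so the last line compares equal to its copies.
                records.append(line.rstrip("\n") + "\n")

    # Issue 2: drop exact duplicate data lines, keeping the first occurrence.
    seen = set()
    unique = []
    for line in records:
        if line not in seen:
            seen.add(line)
            unique.append(line)

    # Issue 1: sort by contig rank from the header, then by POS.
    order = _contig_order(header)

    def sort_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        # Contigs missing from the header sort after known ones, then by name.
        return (order.get(chrom, len(order)), chrom, int(pos))

    unique.sort(key=sort_key)

    # Keep the unrefined file for the record, then write the refined VCF under the original name.
    shutil.move(path, path + ".org")
    with open(path, "w") as out:
        out.writelines(header)
        out.writelines(unique)
    return len(records) - len(unique)
```

This reads the whole file into memory, which works for moderately sized VCFs. For large or bgzipped files, a dedicated tool such as bcftools sort is probably a better fit.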

Related python code

subsetVcfMultiSplit_v2.py