MathOnco / NeoPredPipe

Neoantigens prediction pipeline for multi- or single-region vcf files using ANNOVAR and netMHCpan.
GNU Lesser General Public License v3.0

What does the parameter c (colregions) mean? #11

Closed Tiredbird closed 5 years ago

Tiredbird commented 5 years ago

Hi, I read the program parameters, but I could not understand the parameter c (colregions). In the example,

python NeoPredPipe.py -I ./Example/input_vcfs -H ./Example/HLAtypes/hlatypes.txt -o ./ -n TestRun -c 1 2 -E 8 9 10

does "-c 1 2" mean that the vcf contains 2 tumor columns?

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT N T1 T2

If my vcf is as follows,

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR

I should choose "-c 1". Is that right?

Thanks

elakatos commented 5 years ago

Hi, Yes, after -c you specify the indices of the tumour columns (or other columns of interest), using zero-based indexing over the sample columns. So -c 1 2 tells the program to ignore the first sample in the vcf file (index 0, belonging to 'N', the normal) and to output information based on the second and third samples, T1 and T2.
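To make the zero-based indexing concrete, here is a minimal sketch of how the indices map onto the sample columns of a vcf header. It is illustrative only and is not NeoPredPipe's own code: the helper names `sample_columns` and `select_regions` are hypothetical, and the sketch assumes the sample columns start right after the nine fixed vcf columns (CHROM through FORMAT).

```python
# Illustrative sketch only; these helpers are hypothetical and are not part of NeoPredPipe.
FIXED_VCF_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def sample_columns(header_line):
    """Return the sample names that follow the nine fixed vcf columns."""
    fields = header_line.lstrip("#").split()
    return fields[FIXED_VCF_COLUMNS:]


def select_regions(header_line, colregions):
    """Pick the samples addressed by zero-based indices, as passed after -c."""
    samples = sample_columns(header_line)
    return [samples[i] for i in colregions]


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT N T1 T2"
print(sample_columns(header))          # all samples; index 0 is the normal 'N'
print(select_regions(header, [1, 2]))  # the samples selected by -c 1 2
```

With this header, index 0 is 'N', so the indices 1 and 2 select T1 and T2.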

Your example would be -c 1, yes. However, if you only have 1 tumour sample, you should leave out the -c argument, as your vcf is not multi-region. For multi-region files, the -c flag and the corresponding postprocessing are needed so that we can report which neoantigen is present in which region of the sample. In single-region files like yours, however, a variant is only present in the file if it is in the tumour sample, so no further information has to be processed.

That said, if you have other files whose sample columns are, for example, 'NORMAL TUMOUR1 TUMOUR2 TUMOUR3', you can run the whole pipeline on all samples, including the single-region ones, using the flag -c 1 2 3.
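The advice above could be sketched as a small helper that suggests the -c arguments for a batch of vcf headers. This is only a sketch under stated assumptions, not something the pipeline provides: the function `colregions_args` is hypothetical, and it assumes that the normal sample is always the first sample column (index 0) and that every other sample column is a tumour region.

```python
# Illustrative sketch only; colregions_args is a hypothetical helper, not a NeoPredPipe function.
FIXED_VCF_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def colregions_args(header_lines):
    """Suggest -c arguments for a batch of vcf header lines.

    Assumes the normal sample is sample column 0 in every file and that
    all remaining sample columns are tumour regions.
    """
    max_samples = max(
        len(line.lstrip("#").split()) - FIXED_VCF_COLUMNS for line in header_lines
    )
    tumour_indices = list(range(1, max_samples))
    if len(tumour_indices) <= 1:
        # Only single-region files: leave out -c entirely.
        return []
    return ["-c"] + [str(i) for i in tumour_indices]


fixed = "CHROM POS ID REF ALT QUAL FILTER INFO FORMAT"
single = fixed + " NORMAL TUMOR"
multi = fixed + " NORMAL TUMOUR1 TUMOUR2 TUMOUR3"

print(colregions_args([single]))         # single-region only: no -c needed
print(colregions_args([single, multi]))  # mixed batch: indices of the widest file
```

For a batch containing only the two-column 'NORMAL TUMOR' file the helper suggests no -c flag, while a batch that also includes the four-sample file yields the arguments for -c 1 2 3.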