bigbio / py-pgatk

Python tools for proteogenomics analysis toolkit
Apache License 2.0

vcf-to-proteindb returned incorrect amino acid sequences #74

Open AlirezaShokrollahi opened 1 year ago

AlirezaShokrollahi commented 1 year ago

Hello, dear developers of py-pgatk,

I am trying to obtain protein sequences from a TCGA MAF file. I converted the MAF file to VCF with maf2vcf.pl and then ran vcf-to-proteindb on the result. I also downloaded the GDC reference files for this: the GRCh38.d1.vd1 Reference Sequence and the GDC.h38 GENCODE v36 GTF. For some mutations, the output contained incorrect amino acid sequences.

This is the command I ran:

python ../py-pgatk/pypgatk/pypgatk_cli.py vcf-to-proteindb --vcf TCGA-06-A5U1-01A-11D-A33T-08.vcf --input_fasta input_fasta.fa --gene_annotations_gtf gencode.v36.annotation.gtf --annotation_field_name '' --output_proteindb var_peptides.fa
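To decide whether a given output sequence is incorrect, the expected amino acid change for a single-nucleotide variant can be worked out by hand. The following is only a minimal sketch of that check, not how py-pgatk computes its sequences. It assumes a plus-strand coding sequence that is already spliced and in frame; the helper names `translate` and `apply_snv` and the example sequence are hypothetical. A minus-strand transcript would need to be reverse-complemented first.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame, plus-strand CDS; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def apply_snv(cds, pos, ref, alt):
    """Apply a substitution at a 1-based position within the CDS.

    Raises ValueError if the reference base does not match, which usually
    means the coordinate, strand, or reference build is not the expected one.
    """
    if cds[pos - 1].upper() != ref.upper():
        raise ValueError(
            f"REF mismatch at CDS position {pos}: "
            f"expected {ref}, found {cds[pos - 1]}"
        )
    return cds[:pos - 1] + alt + cds[pos:]


# Hypothetical example sequence, not taken from the TCGA sample.
reference_cds = "ATGGCCGAA"
mutated_cds = apply_snv(reference_cds, 5, "C", "T")
print(translate(reference_cds), translate(mutated_cds))
```

Comparing the hand-derived sequence with the corresponding entry in var_peptides.fa shows exactly which residues differ for a given mutation.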

How can I fix this problem?

Thanks.