Sentieon / sentieon-scripts

Helper scripts for biological data processing from Sentieon
BSD 2-Clause "Simplified" License

Issue with TNHaplotyper2 VCF #3

Closed sbgLuka closed 3 years ago

sbgLuka commented 3 years ago

Hi, I tried the merge MNP script with TNHaplotyper2 VCFs. Since they have PID and PGT fields, I expected it to work fine. However, the vcflib script raises an error when the parse_field function tries to look up the String type, because of the "OBAM" and "OBAMRC" fields. So I added the following to class VCF(sharder.Shardable):

decoders = {'Integer': int, 'Float': float, 'String': str}
encoders = {'Integer': str, 'Float': str, 'String': str}

and everything worked fine. You can test this change to check whether it works with TNHaplotyper2 VCFs.
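To illustrate the idea, here is a minimal standalone sketch of type-dispatched decoding and encoding. It is not the actual vcflib code: the helper names decode_value and encode_value, their signatures, and the sample values are assumptions made up for this example. Only the two dictionaries come from the change described above.

```python
# Hypothetical sketch: NOT the Sentieon vcflib implementation.
# Only the two mappings below are taken from the change described above.
decoders = {'Integer': int, 'Float': float, 'String': str}
encoders = {'Integer': str, 'Float': str, 'String': str}


def decode_value(declared_type, raw_value):
    """Decode a raw, comma-separated field value using its declared type.

    declared_type is the Type from the VCF header line (e.g. 'String').
    A type missing from `decoders` raises KeyError, which is the kind of
    failure a 'String' field triggers when only numeric types are mapped.
    """
    decode = decoders[declared_type]
    values = [decode(v) for v in raw_value.split(',')]
    return values[0] if len(values) == 1 else values


def encode_value(declared_type, value):
    """Encode a decoded value (scalar or list) back into VCF text."""
    encode = encoders[declared_type]
    values = value if isinstance(value, list) else [value]
    return ','.join(encode(v) for v in values)


if __name__ == '__main__':
    # Made-up sample values, not taken from a real TNHaplotyper2 VCF.
    print(decode_value('Integer', '12'))
    print(decode_value('String', 'false'))
    print(encode_value('Float', [0.5, 1.5]))
```

With the 'String' entries removed from both dictionaries, the decode_value('String', ...) call fails with a KeyError, which is the situation the added entries avoid.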

Best regards, Luka.

sbgLuka commented 3 years ago

Just one update: it seems that the "OBAM" and "OBAMRC" fields, which introduced strings into the INFO fields, come from the GATK FilterByOrientationBias tool, which was run afterwards, so sorry for the confusion. However, I think adding the String type to the decoders/encoders would make the tool more robust in cases like this. I managed to make the tool work with this change.

Thanks and sorry for the confusion, Luka.

DonFreed commented 3 years ago

Hi Luka,

Thanks for letting us know about this problem. Which version of the software package are you using? Could you check to see if this error still occurs with the vcflib in the latest version of the Sentieon software package (202010)? Version 201911.01 has some updates to vcflib that should have addressed this issue.

-Don

sbgLuka commented 3 years ago

Hi Don, thanks for answering. I think the Sentieon version I used was 201911. I tried version 202010 and it did indeed work fine, so I will close this issue now :)

Best, Luka.