andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0

header format of .tsv did not match! #178

Open zdk427 opened 7 months ago

zdk427 commented 7 months ago

Hi, I am using version 1.4.2 for SNV analysis. I encounter the issue in Part 4: Filtering the same SNVs from replicates. This step gives me SNV-Singlets, but 7.SNV-final only contains an empty TSV file. The error snapshot is given below: (screenshot: Ivar error) Can you please provide any feedback on it? Thanks

cmaceves commented 7 months ago

Hi! Sorry that you're having issues, thanks for reaching out. Could you be more specific about the origin of "Part 4" and maybe share the .tsv files with me? Based on the given error, I would assume that the SNV variant files being used are not properly formatted, but it's hard to tell without sample files!

zdk427 commented 7 months ago

Sure, here is the link to the .tsv files I got after process 3, in folder 6.Singlet-SNVs. Please let me know if you need any further information. https://usaskca1-my.sharepoint.com/:f:/g/personal/zdk427_usask_ca/EolxZ7UcEDpPo6qFVXKB_KoBajXB_8WyJ6k5Kx3QoU3OjA?e=Y1ihRt

Alex-Vasile commented 6 months ago

We also ran into this issue and dug into it a bit. It's caused by the extra POS_AA column at the end.

Temporary solution for anyone having this issue

If you don't need the POS_AA data column, pre-process your variant files to remove this column.

In-depth Info

  1. call_variants_from_plup prints out a POS_AA column heading (from a hardcoded set of column headings inside the function). https://github.com/andersen-lab/ivar/blob/08aac334d9499d789803e6b2da2321a32282d255/src/call_variants.cpp#L58-L61

  2. common_variants calls read_variant_file which first checks if the headers are correct: https://github.com/andersen-lab/ivar/blob/08aac334d9499d789803e6b2da2321a32282d255/src/get_common_variants.cpp#L25-L30

However, this check has 2 issues:

  1. It uses a parallel, out-of-date set of header names that is missing POS_AA. This check and call_variants_from_plup should work from a single set of fields so there aren't two parallel structures to update when a change happens.
  2. This code has an out-of-bounds error, which is what's happening now. The loop keeps reading header columns and indexing into fields even after ctr >= NUM_FIELDS. The loop should terminate if there are more than NUM_FIELDS entries.

https://github.com/andersen-lab/ivar/blob/08aac334d9499d789803e6b2da2321a32282d255/src/get_common_variants.cpp#L3-L8
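To illustrate both points, here is a rough sketch of a header check that reads from one shared field list and compares sizes before indexing. It is not iVar's actual code: `kExpectedFields` is an abbreviated, illustrative list, and `split_tabs` and `header_matches` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single source of truth for the header. Abbreviated for
// illustration; the real list would be shared with the code that writes it.
const std::vector<std::string> kExpectedFields = {
    "REGION", "POS", "REF", "ALT", "POS_AA"};

// Splits one line on tab characters.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> cells;
    std::stringstream ss(line);
    std::string cell;
    while (std::getline(ss, cell, '\t')) cells.push_back(cell);
    return cells;
}

// Returns true only if the header has exactly the expected columns in order.
bool header_matches(const std::string& header_line,
                    const std::vector<std::string>& expected) {
    const std::vector<std::string> cells = split_tabs(header_line);
    // Compare sizes first, so the loop below can never index past `expected`.
    if (cells.size() != expected.size()) return false;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (cells[i] != expected[i]) return false;
    }
    return true;
}
```

With the size comparison up front, a header with extra columns is rejected cleanly instead of reading past the end of the field array.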

Also worth considering is changing the error message from common_variants. It currently gives the incorrect impression that the header formats of A_variant and B_variant do not match each other, but what it actually means is that they don't match the expected header. Would be worth changing that message and also printing both the received header and the expected header.
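Something along these lines could work for the message. This is only a sketch; `header_mismatch_message` and `join_with_tabs` are hypothetical helpers, and the wording is a suggestion rather than anything iVar currently prints.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins field names with tabs so the expected header prints the same way it
// would appear in a variants file.
std::string join_with_tabs(const std::vector<std::string>& fields) {
    std::string joined;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) joined += '\t';
        joined += fields[i];
    }
    return joined;
}

// Builds an error message that names the offending file and shows both the
// expected and the received header.
std::string header_mismatch_message(const std::string& file_name,
                                    const std::string& received_header,
                                    const std::vector<std::string>& expected) {
    return "Header of " + file_name + " does not match the expected header.\n" +
           "  expected: " + join_with_tabs(expected) + "\n" +
           "  received: " + received_header + "\n";
}
```

Naming the file and printing both headers makes it clear that the comparison is against the expected header, not between the two input files.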