Closed: rderelle closed this issue 1 year ago
This is the designed behaviour for these functions so not a bug as such – the current answer would be to remove the low quality samples.
But adding a filter which also ignores missing sites would be helpful here, so I can add that.
Ok, thanks. Then another filter option would do (it's a difficult decision to remove samples while analysing an outbreak).
nb: one could argue that 'constant' should include positions where all observed nucleotides are identical and the rest is missing data, since no variation is observed at these positions.
Thanks again!
nb2: but missing data could be due to an indel, so I see your point.
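To make the two readings of 'constant' concrete, here is a minimal Python sketch. It is not ska's implementation: the function names are hypothetical, and it assumes '-' is the only missing-data symbol and that the alignment is a list of equal-length strings.

```python
MISSING = "-"  # assumption: '-' is the only missing-data symbol


def column_states(column):
    """Return the set of observed bases in a column, ignoring missing data."""
    return {base.upper() for base in column if base != MISSING}


def is_constant_strict(column):
    """Constant only if no sample is missing and all bases are identical."""
    return MISSING not in column and len(column_states(column)) == 1


def is_constant_ignoring_missing(column):
    """Constant if at most one distinct base is observed, missing data aside."""
    return len(column_states(column)) <= 1


def filter_columns(sequences, is_constant):
    """Drop every column that the given predicate calls constant."""
    kept = [col for col in zip(*sequences) if not is_constant(col)]
    if not kept:
        return ["" for _ in sequences]
    # Transpose the kept columns back into one string per sample.
    return ["".join(row) for row in zip(*kept)]


# Toy alignment: column 3 is 'G', '-', 'G' (identical apart from missing data).
alignment = ["ACGT", "AC-T", "ATGA"]
strict = filter_columns(alignment, is_constant_strict)
relaxed = filter_columns(alignment, is_constant_ignoring_missing)
```

With the strict definition the 'G', '-', 'G' column is kept, which is the behaviour reported in this issue; with the relaxed definition it is dropped. As noted above, the relaxed definition would also hide a real indel at that position.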
The command 'ska align --filter no-ambig-or-const' seems to output every position that has at least one missing base ('-'), resulting in large alignments of constant positions. nb: the same behaviour is observed with the command 'ska align --filter no-const'.
Using the following command lines and a dataset of 67 TB samples, I obtained an alignment of 1 Mb where 100-200 nucleotides were expected. This is mostly because the dataset contains 3 samples with very low coverage and a lot of missing data (see pictures below):
Thanks Romain
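To see which samples contribute most of the missing data before deciding whether to drop them, the per-sample fraction of '-' in the alignment can be computed. The Python sketch below is only an illustration: the function names and the 0.5 threshold are hypothetical, and the alignment is assumed to be already loaded as a mapping from sample name to sequence.

```python
def missing_fraction(sequence, missing="-"):
    """Fraction of positions in one aligned sequence that are missing."""
    if not sequence:
        return 0.0
    return sequence.count(missing) / len(sequence)


def flag_low_quality(samples, threshold=0.5):
    """Return sorted names of samples whose missing fraction exceeds threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return sorted(
        name for name, seq in samples.items()
        if missing_fraction(seq) > threshold
    )


# Toy data: s2 is 75% missing, s3 is 25% missing.
samples = {"s1": "ACGT", "s2": "A---", "s3": "AC-T"}
flagged = flag_low_quality(samples)
```

Here only s2 is flagged, since s3 falls below the chosen threshold.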