fix bug with NA values in ska.distances.tsv and add known user error for divergent samples

NA values can appear in the ska.distances.tsv file for samples with no overlapping k-mers. We were filling these NA values with zeroes at a later step in the ComputeClusters script, but a distance of zero gives the false appearance that these samples are extremely similar. This update instead fills NA values with 1 earlier, when the ska.distances.tsv file is parsed, so that samples with NA values appear maximally divergent in the resulting heatmap.
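A minimal sketch of the fill-at-parse-time idea, using only the standard library. The column names (sample_1, sample_2, distance) and the function name parse_distances are placeholders for illustration, not the actual layout of ska.distances.tsv or the code in this PR:

```python
import csv
import io


def parse_distances(tsv_text, distance_column="distance", na_fill=1.0):
    """Return {(sample_a, sample_b): distance}, replacing NA values with na_fill.

    Column names are hypothetical; adjust them to the real file layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    distances = {}
    for row in reader:
        raw = (row[distance_column] or "").strip()
        # Treat missing values as maximally divergent (1.0) rather than 0.
        if raw in ("", "NA", "NaN", "nan"):
            value = na_fill
        else:
            value = float(raw)
        distances[(row["sample_1"], row["sample_2"])] = value
    return distances


example = "sample_1\tsample_2\tdistance\nA\tB\t0.02\nA\tC\tNA\n"
distances = parse_distances(example)
```

Filling at parse time means every later step sees the same value, so no downstream default (such as 0) can reinterpret the missing distance.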

Additionally, when NA values appear in the ska.distances.tsv file, there may be no variants from which to construct the tree. iqtree then fails because its input is invalid (i.e. a malformed .fasta file with headers but no sequence data). For example:

>seq1

>seq2

>seq3

This PR also adds functionality to catch such errors and report them as a TooDivergentError.
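One way such a check could look, as a sketch only: the TooDivergentError class is redefined here so the example is self-contained, and check_variants_fasta is a hypothetical helper, not the function added in this PR. It assumes the failure case is a FASTA file whose records all have empty sequences:

```python
class TooDivergentError(Exception):
    """Stand-in for the workflow's error class (defined here for illustration)."""


def check_variants_fasta(fasta_text):
    """Parse FASTA text and raise TooDivergentError if no record has sequence data."""
    sequences = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            current = line[1:]
            sequences[current] = ""
        elif current is not None:
            sequences[current] += line
    # No records at all, or only headers with empty sequences: nothing to build a tree from.
    if not sequences or all(len(seq) == 0 for seq in sequences.values()):
        raise TooDivergentError("no variant sites found; samples may be too divergent")
    return sequences
```

Running this check before invoking iqtree would turn an opaque tool failure into an error that can be reported to the user as a known cause.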

This was tested on a sample set that generated NA values in the ska.distances.tsv file.

chanzuckerberg/idseq-workflows #131