We would like to incorporate simulated datasets, for which the ground truth (which alleles were combined in each sequence) is known. This is not a panacea, because the real-life recombination mechanism is not completely understood, but it should provide some insight into the sensitivity and limitations of annotation tools.
Simulated datasets that match real ones as closely as possible are valuable, but so are very simple datasets that verify correct operation of a tool at the most fundamental level: for example, a dataset that includes every allele, has known CDR3s, and contains non-functional sequences of various kinds. Such a dataset might also include some more exotic sequences, e.g. a heavy chain with no D gene present, or one with a duplicated D gene.
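As a rough illustration of what such a "very simple" dataset could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the allele names and sequences are invented placeholders (not real germline alleles), the record fields and function names are hypothetical, and segments are simply concatenated with no trimming, N-additions, or mutation. It does not cover known CDR3s or non-functional sequences. "Duplicated D gene" is interpreted here as the same D segment appearing twice in a row.

```python
from dataclasses import dataclass
from itertools import product

# Toy germline segments. Names and sequences are invented placeholders
# for illustration only; they are NOT real germline alleles.
V_ALLELES = {"TOYV1*01": "CAGGTGCAGCTGGTG", "TOYV2*01": "GAGGTGCAGCTGTTG"}
D_ALLELES = {"TOYD1*01": "GGTATAGC", "TOYD2*01": "CTAACTGG"}
J_ALLELES = {"TOYJ1*01": "TTTGACTACTGG", "TOYJ2*01": "TGGTTCGACCCC"}


@dataclass(frozen=True)
class SimulatedRecord:
    """One simulated sequence together with its ground-truth allele calls."""
    sequence: str
    v_call: str
    d_calls: tuple  # () for no D segment, two entries for a duplicated D
    j_call: str


def rearrange(v, d_calls, j):
    """Concatenate the named segments (no trimming or N-additions)."""
    d_part = "".join(D_ALLELES[d] for d in d_calls)
    return SimulatedRecord(V_ALLELES[v] + d_part + J_ALLELES[j], v, tuple(d_calls), j)


def build_basic_dataset():
    """Every V/D/J combination once, plus two 'exotic' edge cases."""
    vs, ds, js = sorted(V_ALLELES), sorted(D_ALLELES), sorted(J_ALLELES)
    # Exhaustive combinations guarantee that every allele appears at least once.
    records = [rearrange(v, (d,), j) for v, d, j in product(vs, ds, js)]
    records.append(rearrange(vs[0], (), js[0]))              # no D segment
    records.append(rearrange(vs[0], (ds[0], ds[0]), js[0]))  # duplicated D
    return records
```

An annotation tool's output could then be compared against `v_call`, `d_calls`, and `j_call` record by record. Because the dataset is built deterministically, any disagreement points at the tool (or at the simplifications in the simulation) rather than at sampling noise.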