airr-community / gold-standard-datasets

Reference AIRR-Seq datasets for benchmarking tools

Ground Truth #6

Open williamdlees opened 3 years ago

williamdlees commented 3 years ago

We would like to incorporate simulated datasets, in which the ground truth (which alleles were recombined to produce each sequence) is known. This is not a panacea, because the real-life recombination mechanism is not completely understood, but it should provide some insight into the sensitivity and limitations of annotation tools.
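As a rough illustration of how such ground truth could be used, here is a minimal sketch that scores a tool's allele calls against the simulated truth. The function name `call_accuracy` and the dict-of-dicts layout are assumptions made for this example; the `v_call`, `d_call` and `j_call` keys follow the AIRR Rearrangement field names, and the records in the usage example are made up.

```python
from collections import Counter


def call_accuracy(truth, calls, fields=("v_call", "d_call", "j_call")):
    """Per-field fraction of sequences whose call matches the simulated truth.

    truth and calls both map sequence_id -> {field: allele name}.
    A sequence that is missing from calls counts as incorrect for every field.
    """
    correct = Counter()
    total = Counter()
    for seq_id, true_rec in truth.items():
        called = calls.get(seq_id, {})
        for field in fields:
            total[field] += 1
            if called.get(field) == true_rec[field]:
                correct[field] += 1
    return {f: correct[f] / total[f] for f in fields if total[f]}


# Made-up example: the second sequence has the wrong V allele called.
truth = {
    "s1": {"v_call": "IGHV1-69*01", "d_call": "IGHD3-10*01", "j_call": "IGHJ4*02"},
    "s2": {"v_call": "IGHV1-69*02", "d_call": "IGHD3-10*01", "j_call": "IGHJ4*02"},
}
calls = {
    "s1": {"v_call": "IGHV1-69*01", "d_call": "IGHD3-10*01", "j_call": "IGHJ4*02"},
    "s2": {"v_call": "IGHV1-69*01", "d_call": "IGHD3-10*01", "j_call": "IGHJ4*02"},
}
print(call_accuracy(truth, calls))
```

Exact string matching is the simplest possible criterion; a real comparison would probably also need to handle ambiguous (comma-separated) calls and gene-level versus allele-level matches.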

While there is benefit in having simulated datasets that match real ones as closely as possible, there is also benefit in having very simple datasets that can be used to verify that a tool operates correctly at the most fundamental level: for example, a dataset that includes every allele, has known CDR3s, and contains non-functional sequences of various kinds. It could also include some more exotic sequences, e.g. a heavy chain with no D gene present, or one with a duplicated D gene.
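To make the "very simple dataset" idea concrete, here is a minimal sketch that joins every V, D and J allele combination with no trimming or N-additions and records the truth alongside each sequence. Everything in it is an assumption for illustration: the `TOY*` allele names and sequences are placeholders (not real germline alleles), the function names are hypothetical, an empty `d_call` is used to mark the no-D case, and `modification` is a made-up field for noting how a sequence was altered. Known CDR3s and duplicated D genes are not covered here.

```python
import itertools

# Placeholder germline segments: NOT real alleles, for illustration only.
TOY_V = {"TOYV1*01": "CAGGTGCAGCTGGTGTGTGCGAGA",
         "TOYV1*02": "CAGGTGCAGCTGGTATGTGCGAGA"}
TOY_D = {"TOYD1*01": "GGTATAGTGGGAGC"}
TOY_J = {"TOYJ1*01": "TACTGGGGCCAGGGAACC"}


def simulate_all_combinations(v_genes, d_genes, j_genes, include_no_d=True):
    """Return one record per V/D/J allele combination, joined with no editing."""
    d_options = list(d_genes.items())
    if include_no_d:
        d_options.append(("", ""))  # heavy chain with no D gene present
    combos = itertools.product(v_genes.items(), d_options, j_genes.items())
    records = []
    for n, ((v, v_seq), (d, d_seq), (j, j_seq)) in enumerate(combos, start=1):
        records.append({
            "sequence_id": f"sim{n}",
            "sequence": v_seq + d_seq + j_seq,
            "v_call": v,
            "d_call": d,
            "j_call": j,
            "modification": "none",
        })
    return records


def with_single_base_deletion(record, position):
    """Copy a record with one nucleotide removed, shifting the downstream frame."""
    seq = record["sequence"]
    return {
        **record,
        "sequence_id": record["sequence_id"] + "_del",
        "sequence": seq[:position] + seq[position + 1:],
        "modification": f"1-nt deletion at {position}",
    }


records = simulate_all_combinations(TOY_V, TOY_D, TOY_J)
records.append(with_single_base_deletion(records[0], 10))
for rec in records:
    print(rec["sequence_id"], rec["v_call"], rec["d_call"] or "(no D)", rec["j_call"])
```

Because nothing is trimmed or inserted at the junctions, any disagreement between a tool's calls and these records points at the tool rather than at ambiguity in the simulation, which is the point of a dataset at this level.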