What can we accomplish right now?

scharch commented 4 years ago

We've identified a few candidate datasets (thanks, @matsohlin). At some point, we will use the benchmarking pipeline to start to understand how different tools approach the potential problems we're looking at, but what can we do in the meantime? Is it useful to process these datasets with some specific tool (Immcantation, SONAR, ...) and look at the results?

bcorrie commented 4 years ago

What would these candidate data sets be? If there is a specific candidate data set (from a specific study or perhaps a "simulated" data set) and it is annotated with multiple tools, we could put it in an iReceptor repository for people to access and download. If someone annotates some data and provides the AIRR TSV and AIRR Repertoire metadata, we can easily load it... Not sure how useful that would be, but we can "accomplish this right now" 8-)

I would hesitate a bit to say we would make it available through the iReceptor Gateway for general searching by the research community, as the Gateway does not yet gracefully handle a single data set annotated with multiple annotation tools. That could confuse users, so we would want to manage it. We are working on it...

williamdlees commented 4 years ago

We envisage multiple datasets - some simulated and some real-world. Some of them may exhibit problems - read errors, chimerism, and so on. Not that these problems don't exist in other datasets in the wild, but perhaps we should try not to mix them with other datasets in iReceptor+ to prevent them coming up in searches.

airr-community / gold-standard-datasets

What can we accomplish right now? #8