bigdatagenomics / adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Immune sequencing analysis #145

Closed: laserson closed this issue 10 years ago

laserson commented 10 years ago

My PhD primarily focused on analyzing antibody sequencing data. There was (is) a dearth of well-developed tools, and the tool that's considered the gold standard (IMGT/V-QUEST) is only available through a web server, even for NGS data sets. This has led to a fragmentation of tools with no leading open-source standard. Some examples are Bruno Gaeta's iHMMune-align, my vdj repo, Jake Glanville's vdjfasta, Ramy Arnaout's TopCoder winner, joinsolver, and IgBlast, among others. It would be great to add some data types and analyses specifically geared towards this type of data. I am happy to contribute the schemas that I used in grad school (I worked with JSON objects). Would it be out of scope to put them in ADAM?

hammer commented 10 years ago

Seems reasonable to build them on the existing formats and put them in bdg-formats.

laserson commented 10 years ago

Oh perfect, just saw #135.

tdanford commented 10 years ago

I just filed a pull request over on the bdg-formats repo, https://github.com/bigdatagenomics/bdg-formats/pull/1

Hopefully this gets us moving a little bit on this.

tdanford commented 10 years ago

As with #135, since bdg-formats is now up-and-running and we're successfully using it in at least the adam-core module, I'm going to mark this as closed -- any objections, @laserson?

laserson commented 10 years ago

Go ahead.

tdanford commented 10 years ago

Thanks!