airr-community / airr-formats

PLEASE SEE airr-standards FOR FURTHER DEVELOPMENT: https://github.com/airr-community/airr-standards
MIT License

Finalize field names #9

Closed: laserson closed this issue 7 years ago

scharch commented 7 years ago

Apropos of #10, I don't see anything like v_start, etc., in the current spec, if I'm looking in the right place...

schristley commented 7 years ago

The V region is split into its framework (FWR) and CDR regions, so fwr1_start is effectively v_start.

scharch commented 7 years ago

What about v_end, d_start, d_end, and j_start, as those don't correspond to FWR/CDR boundaries?

schristley commented 7 years ago

The challenge with those is defining them accurately. Hmm, so what is v_end? It's not cdr3_start, because part of the CDR3 can come from the V gene. Is it somewhere inside the CDR3? But if nucleotides are "chewed off" the end of the V gene during recombination, what does v_end define? j_start has a similar issue: it's not cdr3_end, because part of the J gene can lie in the CDR3. I'm not sure about d_start and d_end.

scharch commented 7 years ago

I assume it would be for the tool to define, based on whatever alignment procedure it's using.
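As a minimal sketch of what "tool-defined" could mean in practice: assume (hypothetically) an aligner reports its V alignment as 0-based (query_pos, germline_pos) pairs, one per aligned column. The hypothetical helper below takes v_end as the last query position aligned to the V germline and counts the germline 3' bases left unaligned. An alignment alone cannot tell whether those unaligned bases were chewed back during recombination or just mutated beyond what the aligner would extend through, which is exactly the ambiguity raised above.

```python
from typing import List, Optional, Tuple

# Hypothetical alignment format (an assumption, not part of the spec):
# 0-based (query_pos, germline_pos) pairs, one per aligned column.
AlignedPairs = List[Tuple[int, int]]


def v_end_from_alignment(
    pairs: AlignedPairs, v_germline_len: int
) -> Tuple[Optional[int], int]:
    """Return (v_end, unaligned_3prime).

    v_end is the last query position aligned to the V germline (None if
    nothing aligned); unaligned_3prime is how many germline bases after the
    last aligned germline position were left unaligned.
    """
    if not pairs:
        return None, v_germline_len
    # The column with the largest germline position marks the 3' end of the V alignment.
    query_pos, germ_pos = max(pairs, key=lambda p: p[1])
    return query_pos, v_germline_len - 1 - germ_pos


# Query bases 5..12 aligned to germline bases 0..7 of a 10-base V germline.
pairs = [(5 + i, i) for i in range(8)]
print(v_end_from_alignment(pairs, 10))  # (12, 2)
```

A different aligner (or different scoring) could stop the alignment earlier or later on the same read, which is why the value ends up depending on the tool.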

schristley commented 7 years ago

That's reasonable. Currently those positional fields are optional so tools aren't required to define them.
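If the positional fields stay optional, consumers have to cope with them being absent. A minimal sketch, assuming a tab-separated file with one column per field and treating both a missing column and an empty cell as "not defined by this tool"; the helper name read_positions is hypothetical.

```python
import csv
import io
from typing import Dict, Optional

# Positional fields discussed in this thread, treated as optional here.
OPTIONAL_POSITIONS = ("v_end", "d_start", "d_end", "j_start")


def read_positions(row: Dict[str, Optional[str]]) -> Dict[str, Optional[int]]:
    """Map each optional positional field to an int, or None if undefined."""
    out: Dict[str, Optional[int]] = {}
    for name in OPTIONAL_POSITIONS:
        raw = (row.get(name) or "").strip()  # missing column or empty cell -> ""
        out[name] = int(raw) if raw else None
    return out


tsv = "sequence_id\tv_end\tj_start\nseq1\t300\t\n"
for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    print(read_positions(row))
# {'v_end': 300, 'd_start': None, 'd_end': None, 'j_start': None}
```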

laserson commented 7 years ago

@scharch, in my personal experience, v_end, d_start, d_end and j_start never seemed to be as important, and they are definitely harder to annotate. We could add them to the spec but not make them mandatory.

scharch commented 7 years ago

@laserson yes, exactly

javh commented 7 years ago

I think v_end, d_start, d_end and j_start should be required. You can't piece together a complete germline sequence without them.
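To illustrate why, here is a minimal sketch that assumes 1-based inclusive coordinates on the query and germline V, D and J pieces that align to the query without gaps (both assumptions). v_end and d_start fix the length of the non-templated region between V and D; d_end and j_start fix the one between D and J. Without those four values the germline pieces cannot be spaced correctly. The helper name assemble_germline and the choice to mask non-templated bases with 'N' are illustrative, not part of the spec.

```python
def assemble_germline(
    v_germ: str, d_germ: str, j_germ: str,
    v_end: int, d_start: int, d_end: int, j_start: int,
) -> str:
    """Lay out germline V, D and J pieces in query order.

    Assumes 1-based inclusive positions and ungapped segment alignments.
    The result starts at the first V-aligned base; bases between segments
    are not templated by any germline gene, so they are masked with 'N'.
    """
    if len(d_germ) != d_end - d_start + 1:
        raise ValueError("d_germ length does not match d_start..d_end")
    n1 = d_start - v_end - 1  # bases between the end of V and the start of D
    n2 = j_start - d_end - 1  # bases between the end of D and the start of J
    if n1 < 0 or n2 < 0:
        raise ValueError("segment calls overlap")
    return v_germ + "N" * n1 + d_germ + "N" * n2 + j_germ


# V covers query 1..6, D covers 9..11, J covers 13..15.
print(assemble_germline("CAGGTG", "GGT", "TGG", v_end=6, d_start=9, d_end=11, j_start=13))
# CAGGTGNNGGTNTGG
```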

laserson commented 7 years ago

sgtm