laserson closed 7 years ago
The V is separated into its framework and CDR regions, so fwr1_start is the v_start.
What about v_end, d_start, d_end, and j_start, as those don't correspond to FWR/CDR boundaries?
The challenge with those is defining them accurately. Hmm, yeah, so what is v_end? It's not cdr3_start because some of the cdr3 can come from the V gene. Is it somewhere inside the cdr3? But if nucleotides are "chewed off" from the end of the V gene as part of recombination, what does v_end define? j_start has a similar issue. It's not cdr3_end because some of the J gene can lie in the cdr3. d_start and d_end I'm not sure about.
I assume it would be for the tool to define, based on whatever alignment procedure it's using.
That's reasonable. Currently those positional fields are optional so tools aren't required to define them.
@scharch, in my personal experience, v_end, d_start, d_end, and j_start never seemed to be as important, and they are definitely harder to annotate. We could add them to the spec but not make them mandatory.
@laserson yes, exactly
I think v_end, d_start, d_end, and j_start should be required. You can't piece together a complete germline sequence without them.
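To illustrate the point about piecing together a germline, here is a minimal Python sketch. It rests on several assumptions that are not part of the spec: coordinates are 1-based and inclusive positions in the query, the alignment is gapless, each germline segment has already been trimmed to its aligned portion, and a j_end field exists alongside the fields discussed here. The function name stitch_germline and the "N" padding for non-templated positions are hypothetical choices for this example.

```python
def stitch_germline(v_germ, d_germ, j_germ,
                    v_start, v_end, d_start, d_end, j_start, j_end,
                    pad="N"):
    """Hypothetical helper: lay trimmed V, D, and J germline segments
    onto query coordinates (assumed 1-based, inclusive), filling the
    V-D and D-J junction gaps with a padding character."""
    segments = [
        (v_start, v_end, v_germ),
        (d_start, d_end, d_germ),
        (j_start, j_end, j_germ),
    ]
    pieces = []
    pos = v_start  # next query position that still needs to be filled
    for start, end, germ in segments:
        if end - start + 1 != len(germ):
            raise ValueError("segment length does not match its coordinates")
        if start < pos:
            raise ValueError("segments overlap or are out of order")
        pieces.append(pad * (start - pos))  # non-templated positions
        pieces.append(germ)
        pos = end + 1
    return "".join(pieces)


# Toy example with made-up sequences and coordinates.
germline = stitch_germline("ACGT", "GG", "TTT",
                           v_start=1, v_end=4,
                           d_start=7, d_end=8,
                           j_start=10, j_end=12)
print(germline)
```

Without v_end, d_start, d_end, and j_start, the sketch has no way to know where each germline segment stops and where the padded junction positions begin.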
sgtm
Apropos of #10, I don't see anything like v_start etc in the current spec, if I am looking in the right place...