airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Field consistency check for RearrangedSequence #590

Closed javh closed 1 year ago

javh commented 2 years ago

The RearrangedSequence Germline schema includes some fields with very similar, or identical, meaning to fields in Rearrangement and/or Alignment, particularly the following:

Can we clarify the intended meaning and rename accordingly?

williamdlees commented 2 years ago

RearrangedSequence is a record which points to the existence of evidence in a repository. For example, this could be PCR-amplified cDNA in GenBank supporting the existence of a D allele, in which case seq_start and seq_end would delimit the sequence of the D allele itself within the sequence deposited at GenBank. Although there is similarity, I don't think it is such that we should seek to use the same name in both cases.
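
To illustrate how seq_start and seq_end delimit an allele within a deposited sequence, here is a minimal sketch. It assumes 1-based, inclusive coordinates, which this thread does not confirm. The deposited sequence, the coordinate values, and the helper name `extract_allele` are all made up for the example.

```python
def extract_allele(deposited: str, seq_start: int, seq_end: int) -> str:
    """Return the sub-sequence delimited by seq_start..seq_end.

    Assumption: coordinates are 1-based and inclusive. The thread does
    not state the convention, so adjust if the schema defines otherwise.
    """
    if not (1 <= seq_start <= seq_end <= len(deposited)):
        raise ValueError("seq_start/seq_end fall outside the deposited sequence")
    # Convert the 1-based inclusive interval to a Python half-open slice.
    return deposited[seq_start - 1:seq_end]


# Hypothetical deposited sequence and RearrangedSequence-style record.
deposited = "ACGTGGTATAACTGGAACGACTTT"
record = {"seq_start": 5, "seq_end": 19}

allele = extract_allele(deposited, record["seq_start"], record["seq_end"])
print(allele)
```

Under these assumptions the record stores only the coordinates, and the allele sequence is recovered from the repository entry on demand.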

Notes is a free-text field in which curators can add notes. In OGRDB these sometimes run to a page or two.

javh commented 2 years ago

That sounds identical to Alignment:sequence_start and Alignment:sequence_end to me, or at the very least so similar that seq_ and sequence_ are unclear distinctions. I suppose it's similar to germline_start / germline_end as well.

Regarding notes, is this similar enough to Study:study_description and Repertoire:repertoire_description that it should be named _description?

williamdlees commented 2 years ago

sorry, I was put off by the _v.

I will change notes to _description, with a description indicating that these are for curational notes.

javh commented 2 years ago

> I will change notes to _description, with a description indicating that these are for curational notes.

Okay. Though, if you think curation notes are different from a description, then we should use a different name. Just trying to understand what is and is not equivalent and harmonize wherever appropriate.

williamdlees commented 2 years ago

I don't think there's a sufficiently strong distinction to merit a different name.