Closed bcorrie closed 6 years ago
They are the same; the names are just out of sync. Formats used [vdjc]_call because those were the field names used by Change-O, so it became the initial design. If we change the field names in the formats spec, we need to file an issue with Change-O to update its field mapping. @javh
I don't think these are necessarily the same. For the GenBank submission standards, if that's what we are talking about, it would look something like this:
V_segment 93..388
/gene="IGHV4-39"
/allele="01"
/db_xref="IMGT/GENE-DB:IGHV4-39"
/inference="similar to DNA sequence:IMGT/HighV-QUEST:1.5.5"
So the V gene and V allele are sub-components of the V inference call. I.e., v_call maps to both v_gene and v_allele.
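As a rough illustration of that mapping, here is a minimal sketch that splits a single, unambiguous call into the two GenBank-style qualifiers shown above. The function name `call_to_qualifiers` is hypothetical, and the sketch assumes the call is a bare `GENE*ALLELE` string; it is not taken from any existing tool.

```python
def call_to_qualifiers(v_call):
    """Split one 'GENE*ALLELE' call into GenBank-style qualifier lines.

    Assumes a single call with no species prefix or functionality suffix.
    """
    gene, sep, allele = v_call.partition("*")
    lines = ['/gene="{}"'.format(gene)]
    if sep:  # only emit an allele qualifier when the call is allele-level
        lines.append('/allele="{}"'.format(allele))
    return lines


qualifiers = call_to_qualifiers("IGHV4-39*01")
gene_only = call_to_qualifiers("IGHV4-39")
```

A gene-level call simply yields no `/allele` line, which is one way a single call field could still feed both qualifiers.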
Okay, so if GenBank needs them separate then we likely also need to separate these as two different fields. Relying upon some parsing rules to extract the gene and allele will bite us down the road IMO.
My preference would be to leave the inference as a single [vdjc]_call field.
Not every aligner makes allele-level calls. Plus, there's already a fair amount of parsing that needs to happen to go from the alignment data to the GenBank submission. E.g., you might start with something like Homsap IGHV6-1*01 F,Homsap IGHV6-1*02 F as your v_call, but then need to extract the genes/alleles from that and resolve any ambiguity.
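To make the parsing step concrete, here is a minimal sketch of pulling gene/allele pairs out of an ambiguous call string like the one above. The regular expression and the helper names (`split_call`, `unique_genes`) are assumptions for illustration; real aligner output varies, and this pattern only covers IMGT-style names beginning with IG or TR.

```python
import re

# Assumed pattern for an IMGT-style allele name such as "IGHV6-1*01".
# Illustrative only: it does not cover every naming scheme.
ALLELE_RE = re.compile(r"(?P<gene>(?:IG|TR)[A-Z0-9/\-]+)\*(?P<allele>\d+)")


def split_call(call):
    """Return a list of (gene, allele) pairs found in a call string."""
    return [(m.group("gene"), m.group("allele")) for m in ALLELE_RE.finditer(call)]


def unique_genes(call):
    """Return the distinct gene names, in order of first appearance."""
    seen = []
    for gene, _ in split_call(call):
        if gene not in seen:
            seen.append(gene)
    return seen


v_call = "Homsap IGHV6-1*01 F,Homsap IGHV6-1*02 F"
pairs = split_call(v_call)
genes = unique_genes(v_call)
```

In this example the ambiguity is only at the allele level: both calls share one gene, so a gene qualifier could be written unambiguously while the allele would still need a resolution rule.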
If you want to go with a single field (along the lines of MiAIRR), *_call is good as it is generic (in contrast to *_allele, *_gene, etc.), thus I would be in favor of using it instead of the other alternatives.
The issue, as already noted, arises if some downstream process requires the individual components (locus, type, family, number, allele), as it tends to be difficult to parse the information from the string. Also note that the typical IMGT format only applies to humans and is at variance with standard mouse nomenclature.
We should ensure that it is very clear which fields in the Formats spec are identical to which fields in the MiAIRR spec. The obvious way to do this is to ensure the field names are the same, but I think we want to make sure that this is explicitly pointed out in the Formats documentation. That is, we should be clear to the outside world that there is (at least I think there is 8-) a direct link between the Formats field X (e.g. v_call) and the MiAIRR field Y (e.g. vgene_allele), assuming that happens to be the mapping. Obviously, this link is clearer if the field names are the same once we have an agreed-upon definition.
In fact, it probably makes sense for the Formats YAML definition to include the "6 / data (proc. seq." field definitions from the MiAIRR YAML file to ensure that they are indeed the same. This is what we are doing with the iReceptor API YAML/Swagger definition.
When airr-formats is merged into airr-standards then they will be the exact same fields, so don't confuse the temporary situation of there being two specs. There is only one.
Seems like consensus is for _call. I'm closing this with airr-community/airr-standards#33.
Reopening just so we remember to change this: c_call and constant are redundant.
constant was removed.
Hello All,
Any reason why the format group has a different name for the VDJC information than the Minimum Standards group? Correct me if I am wrong, but these are reporting the same thing, no? The mapping I have is:
Everything is there, but the names are different. Does that imply they are different things or is that an oversight?