clingen-data-model / allele

Documentation for data model of ClinGen

Clean up definitions (resource and conceptual) for Gene #80

Closed by cbizon 9 years ago

cbizon commented 9 years ago

Also check the requirements for the required fields, etc.

cbizon commented 9 years ago

If you find changes that need to be made to the XSD, submit them directly to Larry.

cbizon commented 9 years ago

I'm unsure how much to modify the Resource:Gene definition. In the resource model, our definition for gene is the SO definition:

A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions.

While it's good to have external terms, I'm not sure that this is an appropriate definition for the gene in our resource model: we don't include regulatory regions in our data, and we do include, e.g., pseudogenes that never make a functional transcript. Should we modify it to be closer to the conceptual definition, or keep it to maintain ties to external vocabularies?

Opinions?

srynobio commented 9 years ago

If you look at the relationship pseudogene has to gene, it offers improved differentiation, which allows us to use both definitions without violating either; i.e., when speaking of pseudogenes we get the benefit of dragging the relationship definition with us.

As far as documentation goes, you could clarify this with the same approach that was used for SimpleAllele:

SimpleAllele as here defined is similar to the SO term sequence_variant, but where that definition describes a difference with respect to a sequence, SimpleAllele explicitly allows the reference allele, so that there would not be a difference with respect to the reference sequence; note that the reference allele is not guaranteed to be the minor allele in any population.

cbizon commented 9 years ago

This is maybe partly a concern about the way that the SO defines Gene. It seems to me that if gene is defined as "sequence elements necessary to encode a functional transcript", then it doesn't include pseudogenes. But as you point out, pseudogene is a gene in the SO. I think you are right about the way to proceed, though.

cbizon commented 9 years ago

Does anybody know whether LRGs contain regulatory elements, etc., or just transcripts?