charite / jannovar

Annotation of VCF variants with functional impact and from databases (executable+library)
http://jannovar.readthedocs.io/en/master/

Logic of getFivePrimeUTRInterval #420

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

There are many cases where the 5' UTR also includes an intron. Therefore, we should either return a List or mention in the documentation that the function returns an interval that potentially includes an intron.

/**
 * Returns the <b>genomic</b> 5' UTR interval.
 *
 * @return the {@link GenomeInterval} with the 5' UTR
 */
public GenomeInterval getFivePrimeUTRInterval() {
    GenomePosition fivePrimeUTRBeginPos = transcript.getTXRegion().getGenomeBeginPos();
    int fivePrimeUTRLen = transcript.getCDSRegion().getGenomeBeginPos().differenceTo(fivePrimeUTRBeginPos);
    return new GenomeInterval(fivePrimeUTRBeginPos, fivePrimeUTRLen);
}

holtgrewe commented 5 years ago

I agree that the API is a bit sparse here, but the 5' UTR function is consistent with getting the transcript or CDS genomic regions. You would then have to check for overlaps with intronic and exonic regions.

Some GenomeInterval arithmetic might come in handy here, I agree.
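
To illustrate the overlap check described above, here is a minimal sketch of how the span from transcript start to CDS start could be split into its exonic parts and returned as a list. It does not use Jannovar's classes: `Interval`, `intersect`, and `fivePrimeUtrExonParts` are hypothetical names, coordinates are assumed to be 0-based, half-open, and expressed along the transcript's strand, and the example transcript is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: a hypothetical stand-in for interval arithmetic, not Jannovar's API. */
final class UtrExonSketch {

    /** Hypothetical interval type: 0-based, half-open [begin, end). */
    static final class Interval {
        final int begin;
        final int end;

        Interval(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + ", " + end + ")";
        }
    }

    /** Returns the overlap of a and b, or null if they do not overlap. */
    static Interval intersect(Interval a, Interval b) {
        int begin = Math.max(a.begin, b.begin);
        int end = Math.min(a.end, b.end);
        return begin < end ? new Interval(begin, end) : null;
    }

    /** Clips every exon to [txBegin, cdsBegin) and keeps the non-empty pieces. */
    static List<Interval> fivePrimeUtrExonParts(int txBegin, int cdsBegin, List<Interval> exons) {
        Interval utrSpan = new Interval(txBegin, cdsBegin);
        List<Interval> parts = new ArrayList<>();
        for (Interval exon : exons) {
            Interval part = intersect(utrSpan, exon);
            if (part != null) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up transcript: three exons, CDS starting at 550 inside the second exon.
        List<Interval> exons = Arrays.asList(
            new Interval(100, 200), new Interval(500, 600), new Interval(900, 1000));
        // Prints [[100, 200), [500, 550)]: the intron [200, 500) is left out.
        System.out.println(fivePrimeUtrExonParts(100, 550, exons));
    }
}
```

Returning a list like this keeps the existing single-interval method unchanged while giving callers the intron-free pieces when they need them.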

iimpulse commented 5 years ago

I am writing a library that leverages jannovar to provide this functionality. Once it is complete, I can open a PR.

visze commented 5 years ago

@iimpulse Just a warning that the code style is going to change; see #424. How far along are you with your work? When do you plan to open the PR?

holtgrewe commented 5 years ago

@visze I think @iimpulse wants to write a library using Jannovar?

iimpulse commented 5 years ago

@visze @holtgrewe I am writing a library using jannovar, but it would be nice to have some of this functionality rolled up into jannovar. I think this mostly depends on the long-term outlook of jannovar and whether or not this library has an intended use of being an api/library or a cli packaged application.

Nonetheless, I will start enforcing Google coding style standards.

The functionality has been completed. If you feel it is necessary, I can open a PR in the next two days.

holtgrewe commented 5 years ago

@iimpulse Please note that we actually decided to follow the IntelliJ Java standard with wrapped comments for Jannovar code.

For me, the main focus is features/usability as a library and exposing most of this functionality through the CLI.

holtgrewe commented 5 years ago

I needed this for SV annotation; commit 512476a46a0260101396909942ea77dda33d75f4 has a function and tests.

iimpulse commented 5 years ago

https://github.com/charite/jannovar/blob/512476a46a0260101396909942ea77dda33d75f4/jannovar-core/src/main/java/de/charite/compbio/jannovar/reference/TranscriptSequenceOntologyDecorator.java#L57

This might be confusing to future contributors: it is the utrLength including introns. Consider adding a comment to that effect. The true length of the UTR would be the sum of the lengths of its intersections with the exon intervals.
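
As a rough illustration of the difference between the two lengths, the sketch below computes both for a made-up transcript. It is not the code from the linked commit: `utrSpanLength` and `utrExonicLength` are hypothetical helpers, exons are given as `{begin, end}` pairs, and coordinates are assumed to be 0-based, half-open, and expressed along the transcript's strand.

```java
/** Sketch only: hypothetical helpers, not the implementation in the linked commit. */
final class UtrLengthSketch {

    /** Length of the span from transcript start to CDS start, introns included. */
    static int utrSpanLength(int txBegin, int cdsBegin) {
        return cdsBegin - txBegin;
    }

    /** Sum of the overlaps between [txBegin, cdsBegin) and each exon {begin, end}. */
    static int utrExonicLength(int txBegin, int cdsBegin, int[][] exons) {
        int total = 0;
        for (int[] exon : exons) {
            int begin = Math.max(txBegin, exon[0]);
            int end = Math.min(cdsBegin, exon[1]);
            if (begin < end) {
                total += end - begin;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up transcript: three exons, CDS starting at 550 inside the second exon.
        int[][] exons = {{100, 200}, {500, 600}, {900, 1000}};
        System.out.println(utrSpanLength(100, 550));          // 450, includes the intron [200, 500)
        System.out.println(utrExonicLength(100, 550, exons)); // 150 = 100 + 50
    }
}
```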

Nice job on this. My implementation was over-engineered; I didn't notice the intersection method. I will upgrade and leverage this in the future.

@holtgrewe