pnrobinson closed this 5 years ago
I agree that the API is a bit sparse here but the 5' UTR function is consistent with getting the transcript or CDS genomic regions. You would then have to check for overlaps with intronic and exonic regions.
Some `GenomeInterval` arithmetic might come in handy here, I agree.
I am writing a library that uses Jannovar to provide this functionality. Once it is complete I can open a PR.
@iimpulse Just a warning that the code style is going to change, see #424. How far along are you with your work? When do you plan to open the PR?
@visze I think @iimpulse wants to write a library using Jannovar?
@visze @holtgrewe I am writing a library using Jannovar, but it would be nice to have some of this functionality rolled up into Jannovar itself. I think this mostly depends on the long-term outlook for Jannovar and whether it is intended to be used as an API/library or as a packaged CLI application.
Nonetheless, I will start enforcing the Google coding style standards.
The functionality has been completed. If you feel it is necessary, I can open a PR in the next two days.
@iimpulse Please note that we actually decided to follow the IntelliJ Java standard with wrapped comments for Jannovar code.
For me, the main focus is features/usability as a library, with most of this functionality also exposed through the CLI.
I needed this for SV annotation, 512476a46a0260101396909942ea77dda33d75f4 has a function and tests.
This might be confusing to future contributors: the value here is the UTR length including introns. Consider adding a comment to that effect. The true length of the UTR would be the sum of the lengths of the UTR interval's intersections with the exon intervals.
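To make the distinction concrete, here is a minimal sketch of the two lengths. It deliberately does not use Jannovar's classes: `Span`, `UtrLengthSketch`, and `splicedUtrLength` are hypothetical names made up for illustration, and coordinates are assumed to be 0-based, half-open, on the forward strand.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a genomic interval; NOT Jannovar's GenomeInterval.
// Coordinates are assumed to be 0-based and half-open: [begin, end).
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int length() {
        return Math.max(0, end - begin);
    }

    // Overlap of this span with another; a zero-length span if they are disjoint.
    Span intersection(Span other) {
        int b = Math.max(begin, other.begin);
        int e = Math.min(end, other.end);
        return new Span(b, Math.max(b, e));
    }
}

final class UtrLengthSketch {
    // Sums the exonic bases inside the UTR span, i.e. the UTR length without introns.
    static int splicedUtrLength(Span utr, List<Span> exons) {
        int total = 0;
        for (Span exon : exons) {
            total += utr.intersection(exon).length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up transcript: exons [100,150) and [200,300), CDS starting at 220,
        // so the 5' UTR span is [100,220) and contains the intron [150,200).
        Span utr = new Span(100, 220);
        List<Span> exons = Arrays.asList(new Span(100, 150), new Span(200, 300));
        System.out.println("span including introns: " + utr.length());
        System.out.println("exonic bases only: " + splicedUtrLength(utr, exons));
    }
}
```

For this made-up transcript the span including the intron is 120 bases, while the exonic UTR length is 70 bases (50 from the first exon plus 20 from the second).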
Nice job on this. My implementation was over-engineered; I didn't notice the intersection method. I will upgrade and make use of this in the future.
@holtgrewe
There are many cases where the 5'UTR also includes an intron. Therefore, either we should return a List or we should mention in the documentation that the function returns an interval that potentially includes an intron.
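As a rough illustration of the "return a List" option, the sketch below clips each exon to the UTR span and keeps the non-empty pieces. It is not Jannovar code: `Segment`, `UtrSegmentsSketch`, and `exonicUtrSegments` are hypothetical names, and coordinates are assumed to be 0-based, half-open, on the forward strand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a genomic interval; NOT Jannovar's GenomeInterval.
// Coordinates are assumed to be 0-based and half-open: [begin, end).
final class Segment {
    final int begin;
    final int end;

    Segment(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + begin + "," + end + ")";
    }
}

final class UtrSegmentsSketch {
    // Returns the exonic pieces of the UTR span, in the order the exons are given.
    static List<Segment> exonicUtrSegments(Segment utr, List<Segment> exons) {
        List<Segment> result = new ArrayList<>();
        for (Segment exon : exons) {
            int b = Math.max(utr.begin, exon.begin);
            int e = Math.min(utr.end, exon.end);
            if (b < e) { // keep only exons that actually overlap the UTR span
                result.add(new Segment(b, e));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up transcript: the 5' UTR span [100,220) covers the intron [150,200).
        Segment utr = new Segment(100, 220);
        List<Segment> exons = Arrays.asList(
                new Segment(100, 150), new Segment(200, 300), new Segment(400, 500));
        System.out.println(exonicUtrSegments(utr, exons));
    }
}
```

A caller that only needs the outer span can still take the first `begin` and last `end` of the returned list, so the list form loses no information compared with a single interval.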