BioJulia / Bio.jl

[DEPRECATED] Bioinformatics and Computational Biology Infrastructure for Julia
http://biojulia.dev
MIT License
261 stars 65 forks source link

Indexed versions of {omega,phi,psi}angle #484

Closed h3nnn4n closed 7 years ago

h3nnn4n commented 7 years ago

While using Bio.Structure for a project I found a use case for an indexed version for the {omega,phi,psi}angle functions.

h3nnn4n commented 7 years ago

Since the original functions accept the residues in either order I was thinking about adding this possibility for the indexed version. To my eyes there are 3 possibilities:

Comments are much appreciated.

codecov-io commented 7 years ago

Codecov Report

Merging #484 into master will increase coverage by 0.83%. The diff coverage is 86.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
+ Coverage   70.34%   71.17%   +0.83%     
==========================================
  Files          34       34              
  Lines        2421     2578     +157     
==========================================
+ Hits         1703     1835     +132     
- Misses        718      743      +25
Impacted Files Coverage Δ
src/structure/spatial.jl 90.54% <86.95%> (-1.99%) :arrow_down:
src/structure/pdb.jl 89.22% <0%> (-5.61%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e08f2a3...83c89ae. Read the comment docs.

jgreener64 commented 7 years ago

Thanks for making this PR @h3nnn4n, this is a really nice addition! I'll add some line comments to the code.

Also would be good to add something to the docs at https://github.com/BioJulia/Bio.jl/blob/master/docs/src/man/structure.md, e.g. a line after the spatial table saying something like "omegaangle, phiangle and psiangle can take as input either two residues or a chain and a position as an integer".

jgreener64 commented 7 years ago

I don't quite understand what you mean by your second comment. The original functions accept two AbstractResidues but they have to be in the correct order - if they are not it will return a value but that won't correspond to a meaningful dihedral.

jgreener64 commented 7 years ago

There is one more complexity which is that integers are not enough to define the order of residues in a chain. This is due to insertion codes and heteroatoms.

h3nnn4n commented 7 years ago

Makes sense. It is not as easy as I thought, hehe. I will see if I can find a simple solution.

jgreener64 commented 7 years ago

I will have a further look and try and make suggestions over the weekend. The changes you just pushed look good.

h3nnn4n commented 7 years ago

I was thinking about using residue_extractor and sequentialresidues to extract only the aminoacids. The downside is that an array with all the residues would be allocated and thrown away at each call.

Another thing to consider is that it is possible to pass a chain that is not a protein. My implementation is accepting RNA and DNA sequences as well.

jgreener64 commented 7 years ago

I think you could do something like this:

function psiangle(chain::Chain, res_id::Union{Integer, String})
    inds = find(r -> resid(r)==string(res_id), chain)
    if length(inds) != 1
        throw(ArgumentError("$res_id is an invalid residue ID"))
    end
    res_list = collect(chain)
    i = inds[1]
    if i == length(res_list) || !sequentialresidues(res_list[i], res_list[i+1])
        throw(ArgumentError("$res_id is an invalid residue ID"))
    end
    return psiangle(res_list[i], res_list[i+1])
end

So you find the list index for the residue index given (e.g. 10, 20A or H_30), check one is returned, check the next item is a sequential residue (it may not necessarily be) and if so make the psiangle call.

You are right, this is a lot of overhead, but we can see how fast it goes.

Note for omega and phi the second condition changes to if i == 1 || !sequentialresidues(res_list[i-1], res_list[i]).

jgreener64 commented 7 years ago

Yes, Bio.Strucure seems to work okay with DNA/RNA though these functions refer to the protein Ramachandran angles.

h3nnn4n commented 7 years ago

In my head makes sense that passing 1 to psiangle should return the first AA.

Is this behavior correct?

Take 1ZDD for example, first residue is indexed at 6, so if I want to extract all the angles then I need to start at six and go to 6 + number_of_AA instead of 1 to number_of_AA.

h3nnn4n commented 7 years ago

What I have in mind is more or less this:

function psiangle(chain::Chain, pos::Integer)
    res_list = collectresidues(chain, standardselector)
    if pos == length(res_list) || pos < 1 || !sequentialresidues(res_list[pos], res_list[pos+1])
        throw(ArgumentError("$res_id is an invalid residue ID"))
    end
    return psiangle(res_list[pos], res_list[pos+1])
end
jgreener64 commented 7 years ago

Ah interesting, yes I see. I think this would make an inconsistent API though because chain[6] would return the residue with resid 6 but psiangle(chain, 6) would get the psi angle for the 6th residue in the sequence (resid 11 in your example).

To get the behaviour you want you could use ramachandranangles:

phi, psi = ramachandranangles(chain, standardselector)
psi[1]

We could think about a psiangles function to just get the psi angles from this.

h3nnn4n commented 7 years ago

I implemented your suggestions to the original functions that I submitted.

I also implement the functions to return the array of angles for omega, psi and phi. Those functions are pretty much the same as ramachandranangles. They are exported as {phi,psi,omega}angles.

jgreener64 commented 7 years ago

I'll give a day or so more for comments and then merge this.

jgreener64 commented 7 years ago

Thanks for all the work @h3nnn4n .