charnley / rmsd

Calculate Root-mean-square deviation (RMSD) of two molecules, using rotation, in xyz or pdb format
BSD 2-Clause "Simplified" License
495 stars 114 forks source link

Enhancement of Kabsch-based routines to work with weighted data #53

Closed CharlieLaughton closed 4 years ago

CharlieLaughton commented 5 years ago

Hi,

Thanks for a useful little package. If you are interested, I have tweaked the Kabsch-related least-squares fitting routines to accept an optional weights argument, so that, for example, the routines can be used for mass-weighted fitting. I've also added a utility function "Kabsch_fit()", so:

nP = kabsch_fit(P, Q. W)

results in nP being a copy of P least-squares fitted to Q, weighted according to W.

I added a couple of relevant tests as well.

nbehrnd commented 5 years ago

An interesting thought indeed; crystallographically speaking it may be used to emphasize more on heavier atoms easier to localize in diffraction experiments than lighter ones. Such is already implemented in aRMSD (forked here) with optional choice between mass, proton count, scatter factor and it will be interesting to compare with your addition. (Contrasting to calculate_rmsd.py, aRMSD offers only a pairwise comparison, though.)

A question / feature suggest: since your feature already allows one form of gradual attenuation of contribution, what about a function scaling the contribution by distance, by power-of-two, or even by power-of-three such that differences spot farther away do not account as much that those in proximity to the centre of the centroid? Perhaps acting independently of the option "scaling by mass", but why not scale by mass .AND. distance? Just as a thought.

CharlieLaughton commented 5 years ago

Ah - perhaps I should clarify - although I suggested that the weights argument W might be a vector of atomic masses, to do mass-weighted fitting, there is no requirement that it is. A user can set W to whatever they like - e.g. to implement your suggestions.

charnley commented 5 years ago

Hi @CharlieLaughton. Sorry for the long delay of attention on this pull request! It looks great, I only have two points.

If you agree and correct those small changes I will merge your pull request as soon as possible

best regards Jimmy

CharlieLaughton commented 4 years ago

Hi Jimmy,

My turn to apologies for the delay – will do!

Best wishes,

Charlie

From: Jimmy Charnley Kromann notifications@github.com Reply to: charnley/rmsd reply@reply.github.com Date: Wednesday, 18 September 2019 at 16:56 To: charnley/rmsd rmsd@noreply.github.com Cc: Charles Laughton Pazcal@exmail.nottingham.ac.uk, Mention mention@noreply.github.com Subject: Re: [charnley/rmsd] Enhancement of Kabsch-based routines to work with weighted data (#53)

Hi @CharlieLaughtonhttps://github.com/CharlieLaughton. Sorry for the long delay of attention on this pull request! It looks great, I only have two points.

U = kabsch(P, Q, W) # not keyword U = kabsch(P, Q, W=W) # keyword

if W is None: U = kabsch(P, Q) else: U = kabsch_weighted(P, Q, W)

If you agree and correct those small changes I will merge your pull request as soon as possible

best regards Jimmy

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/charnley/rmsd/pull/53?email_source=notifications&email_token=ACBJSX4XSYK5UWFYWRLBVD3QKJFQ3A5CNFSM4HBRPES2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7AR3OA#issuecomment-532749752, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACBJSX6ORB22724XYSUYEYDQKJFQ3ANCNFSM4HBRPESQ.

This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please contact the sender and delete the email and attachment.

Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham. Email communications with the University of Nottingham may be monitored where permitted by law.

CharlieLaughton commented 4 years ago

Hi Jimmy, OK, this should be ready to go. Looking again at my code I discovered a bug, and to fix this and address your points I've ended up re-writing quite a bit.

So that the weighted fitting code treads more lightly on your existing code, I have created three weighting-specific functions: kabsch_weighted(), kabsch_weighted_fit(), and kabsch_weighted_rmsd(). The only entry point from your code into these new functions is in kabsch_rmsd(), the other is via my new kabsch_fit()