brookehus closed this 5 years ago.
Sounds good. Reviewing now.
I think I found some strange behavior for a 5-bead test molecule with full backbone stats. If I allow features to be a kwarg in get_prior_statistics() (see comment above):

prior_features = stats.get_prior_statistics(features=[(0,1),(3,4)])
for k,v in prior_features.items():
    print(k,v)
0 {'mean': tensor(1.9042), 'std': tensor(0.6031)}
3 {'mean': tensor(2.5001), 'std': tensor(0.6525)}
Looks good. Now changing the first tuple somehow changes the index of the second:

prior_features = stats.get_prior_statistics(features=[(0,2),(3,4)])
for k,v in prior_features.items():
    print(k,v)
3 {'mean': tensor(1.9042), 'std': tensor(0.6031)}
4 {'mean': tensor(2.5001), 'std': tensor(0.6525)}
With respect to the master list attribute, the sub-sequence [(0,2),(3,4)] is out of order. Should we guard against or correct this behavior if the user enters an unordered list of feature tuples?
Edit: it seems there is something funky in return_indices. In this formulation, the method assumes the input features are ordered in the same sense as the master list attribute.
print(stats.return_indices([(0,1),(3,4)]))
[0, 3]
print(stats.return_indices([(0,2),(3,4)]))
[3, 4]
For a general user this may be unsafe unless we state it somewhere in the docs or correct for it. I will take a closer look. What do you think?
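One way the lookup could be made robust to request order is to map each feature tuple to its master-list position by value rather than by position. This is only a sketch of the idea, not the library's actual implementation; the function and variable names here are hypothetical stand-ins:

```python
# Hypothetical sketch: resolve each requested feature tuple by value in the
# master list, so the result no longer depends on the order of the request.
def return_indices(master_description_tuples, features):
    """Map each requested feature tuple to its index in the master list."""
    index_of = {desc: i for i, desc in enumerate(master_description_tuples)}
    try:
        return [index_of[f] for f in features]
    except KeyError as err:
        raise ValueError("Unknown feature tuple: {}".format(err.args[0]))

# Toy master list standing in for the real master_description_tuples attribute
master = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
print(return_indices(master, [(0, 2), (3, 4)]))  # [4, 3]
print(return_indices(master, [(3, 4), (0, 2)]))  # [3, 4]
```

With a dictionary lookup, the returned indices always correspond to the tuples the user actually asked for, in the order they asked for them, and an unknown tuple fails loudly instead of silently shifting indices.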
I will look at this funky behavior too.
Although note that the keys being integers was a bug - they're back to tuples now. But we could change to integers? That seems more confusing, for the reason you described above.
Summarizing some of what I said on Slack:

- get_prior_statistics had a ton of bugs (thanks for pointing lots of them out) and I think it is fixed now. This involves another attribute that lines up with master_description_tuples, which is _master_stat_array. Otherwise we were getting bogus statistics (and not realizing it! see below).
- Because of all the behind-the-scenes processing we do in get_prior_stats (namely removing duplicate features): is this going to work for priors?
- I changed test_return_indices_and_prior_stats because it was giving a typo-like error, but let me know if you think it's still testing what you intended to test.
- The bugs in get_prior_statistics slipped through because we only had shape tests.

Got it - I will take another look and let you know if anything comes up. Barring that, I think I am fine with the changes - will give you the LGTM then.
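Since shape-only tests let these bugs slip through, a value-level check could catch them: request the same features in two different orders and require that each tuple maps to the same statistics both times. A minimal sketch, using a toy lookup in place of the real statistics object (all names here are hypothetical):

```python
# Hypothetical order-invariance test for a prior-statistics lookup:
# shuffling the requested features must not change which statistics
# each feature tuple maps to.
def get_stats(master, features):
    lookup = dict(master)          # {feature_tuple: stats_dict}
    return {f: lookup[f] for f in features}

# Toy stand-in for aligned master_description_tuples / _master_stat_array data
master = [((0, 1), {'mean': 1.90, 'std': 0.60}),
          ((0, 2), {'mean': 1.77, 'std': 0.30}),
          ((3, 4), {'mean': 2.50, 'std': 0.65})]

features = [(0, 2), (3, 4)]
shuffled = features[::-1]
assert get_stats(master, features) == get_stats(master, shuffled)
print("order-invariance test passed")
```

A test like this would have flagged the index-shifting behavior above, since a positional implementation returns different statistics for the two request orders.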
Removing the ability to not return a dictionary seems alright to me - we have been using dictionaries for priors up to this point, so it does not impact much. However, I still worry about the order of feature tuples in the inputs of get_prior_stats() and, in turn, return_indices():
prior_features = stats.get_prior_statistics()
for k,v in prior_features.items():
    print(k,v)
print("")
test_features = stats.get_prior_statistics(features=[(2,3,4),(1,2),(1,2,3,4,'cos')])
for k,v in test_features.items():
    print(k,v)
(0, 1) {'mean': 2.012418908139269, 'std': 0.6173332929292423}
(1, 2) {'mean': 2.4216726369681227, 'std': 0.8213288285770245}
(2, 3) {'mean': 2.8745282143687696, 'std': 1.3714571997002478}
(3, 4) {'mean': 2.5493349113034016, 'std': 0.47363389711236953}
(0, 2) {'mean': 1.774948456473321, 'std': 0.29663486808813494}
(1, 3) {'mean': 2.4828062612693076, 'std': 1.2339970202914567}
(2, 4) {'mean': 2.933023516839941, 'std': 0.9274496856958656}
(0, 3) {'mean': 2.6964894482936677, 'std': 1.072305730254537}
(1, 4) {'mean': 2.018970301340096, 'std': 0.6946823039715447}
(0, 4) {'mean': 2.057845895845392, 'std': 0.7753098750332134}
(0, 1, 2) {'mean': 2.323987066954144, 'std': 0.34861435568647137}
(1, 2, 3) {'mean': 2.293738041847179, 'std': 0.44558315338102344}
(2, 3, 4) {'mean': 1.8407708108972223, 'std': 0.7627003658227768}
(0, 1, 2, 3, 'cos') {'mean': 0.19688260802053792, 'std': 0.5367364041760957}
(1, 2, 3, 4, 'cos') {'mean': 0.6946537881703886, 'std': 0.5081295092480996}
(0, 1, 2, 3, 'sin') {'mean': 0.16809996805613742, 'std': 0.8030527204553083}
(1, 2, 3, 4, 'sin') {'mean': -0.3380883081097161, 'std': 0.3807319428828161}
(2, 3, 4) {'mean': 2.4216726369681227, 'std': 0.8213288285770245}
(1, 2) {'mean': 1.8407708108972223, 'std': 0.7627003658227768}
(1, 2, 3, 4, 'cos') {'mean': 0.6946537881703886, 'std': 0.5081295092480996}
This output is incorrect: the statistics for the first two tuples are swapped. I think it's important to account for these cases - either by forcing the order of input features to these functions to obey the sense of self.order, or by generalizing so that the user gets back correct statistics in whatever order they impose.
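The "generalize" option could be as simple as building the result by key lookup rather than by sorted positional index, so that whatever order the user imposes, each tuple keeps its own statistics. A sketch of that idea, assuming the master descriptions and the statistics rows are aligned (the function signature and names here are hypothetical, not the library's API):

```python
from collections import OrderedDict

# Hypothetical sketch: return prior statistics keyed by feature tuple,
# resolved by dictionary lookup so the request order cannot swap values.
def get_prior_statistics(master_description_tuples, master_stats, features=None):
    # master_stats[i] is assumed to hold the stats for master_description_tuples[i]
    all_stats = OrderedDict(zip(master_description_tuples, master_stats))
    if features is None:
        return all_stats
    # Preserve the user's requested order; unknown tuples raise KeyError.
    return OrderedDict((f, all_stats[f]) for f in features)

# Toy data mirroring the printout above
master = [(1, 2), (2, 3, 4), (1, 2, 3, 4, 'cos')]
stats = [{'mean': 2.42, 'std': 0.82},
         {'mean': 1.84, 'std': 0.76},
         {'mean': 0.69, 'std': 0.51}]

out = get_prior_statistics(master, stats, features=[(2, 3, 4), (1, 2)])
print(out[(2, 3, 4)]['mean'])  # 1.84, not 2.42
```

Because each statistics dict is fetched by its tuple key, requesting [(2,3,4),(1,2)] or [(1,2),(2,3,4)] yields the same per-tuple values, only in a different iteration order.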
LGTM!
Here is my suggestion for a better way of handling return_indices and get_prior_statistics with custom keys. I think it is much better and more efficient. As part of my suggested course of action: could you take a look at return_indices? I think this would be helpful, if you are willing. What do you think?