Closed sroet closed 3 years ago
Alright @dwhswenson this one is ready for a review, please let me know if this is what you intended by #89
@dwhswenson A friendly ping to make sure this is still on your radar :wink:
Merging #94 (646f10b) into master (c9d1451) will not change coverage. The diff coverage is
100.00%
.
@@ Coverage Diff @@
## master #94 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 13 13
Lines 1066 1125 +59
=========================================
+ Hits 1066 1125 +59
Impacted Files | Coverage Δ | |
---|---|---|
contact_map/contact_count.py | 100.00% <100.00%> (ø) |
|
contact_map/contact_map.py | 100.00% <100.00%> (ø) |
|
contact_map/plot_utils.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c9d1451...646f10b. Read the comment docs.
@dwhswenson , friendly monthly ping to make sure it is still on a list somewhere :)
@dwhswenson Just FYI; this PR needs to be squashed before merging, as it contains multiple commits altering all the notebooks, which will lead to repository bloat.
Also, I still don't agree with the codeclimate failure
@sroet : Could you resolve conflicts on this branch? I'm hoping to try reviewing this soon. Since this has a lot of notebook changes, I want to hold off on notebook updates in #90 and #110 until this is in, to limit conflicts.
@sroet : Could you resolve conflicts on this branch? I'm hoping to try reviewing this soon. Since this has a lot of notebook changes, I want to hold off on notebook updates in #90 and #110 until this is in, to limit conflicts.
All done :)
Also, I still don't agree with the codeclimate failure
Ah, now I remember these. (Old link had stopped working) Yes, I disagree as well. It's clear that they add "mass" to their code similarity measure when you use kwargs. That is nonsense. When these first popped up, I actually started going into their codebase to try to figure out how I might fix it, but between the fact that most of the code is CodeClimate extensions to a 3rd party code, plus the fact that it is mostly written in Ruby, I gave up.
Before I get into the code, a question about the overall approach:
If I understand correctly, the x vs y axis can be either query or haystack depending on the number of residues. I think that may cause confusion for the users. Isn't it easier to just say that default x-axis is the query and default y-axis is the haystack? This seems like it would be more consistent (and, in any case, give identical results for users who follow our recommendation to keep the query smaller than the haystack.)
If I understand correctly, the x vs y axis can be either query or haystack depending on the number of residues. I think that may cause confusion for the users. Isn't it easier to just say that default x-axis is the query and default y-axis is the haystack? This seems like it would be more consistent (and, in any case, give identical results for users who follow our recommendation to keep the query smaller than the haystack.)
Not really, that is just the ContactCount
default (which has no idea of the existence of queries or haystacks).
For all classes that use ContactsDict.atom_contacts
(or residue_contacts
) we explicitly set n_x
to query_range
and n_y
to haystack_range
.
If we make a ContactCount
from somewhere else (like in ContactDifference
), then we use the data inside the counter to get the best guess (these might not even be complete queries
or haystacks
if there are certain contacts missing in the counter
s)
@dwhswenson Did the refactor (moved relevant code to plot_utils
), and added the plot to the "changing the defaults" ipynb please have another look
Thought: Should ContactPlotRange be public API?
I have no preference on this, so I will make it private.
LGTM. Squash-merging as suggested, given many changes to notebooks (almost forgot to do that!)
Should close #89
[non-public API-break] This code breaks some non-public API, see Other Changes below
Description of the new behaviour: For all
ContactMaps
, exceptContactDifference
(and its subclasses): It willcontacts.plot()
will plotquery
/query_residues
on thex
-axis andhaystack
/haystack_residues
on they
-axis.For
ContactDifference
thequery
andhaystack
and the_residue
equivalents might not make sense so instead they use the new default, which tries to plot all points with as little amount of "double" (due to symmetry) points as it can.The new n_x, n_y defaults
We construct the lower triangle of our data by sorting the keys (indices) and then we look at
x
andy
dimensions of the smallest rectangle required to cover all the datapoints in this triangle. Effectively trying to see if there is a set oflow
and a set ofhigh
indices that can be used to describe all data points (like aquery
and ahaystack
). We then take the shortest of the 2 dimensions asn_x
and the longest asn_y
.Other changes
sparse_matrix
andpandas.DataFrame
now are guaranteed to be square,max_size
bymax_size
(being effectively equal to the old implementation for all classes.plot
warning now takes into acount the actual range of data being plottedhaystack_residues_indices
andquery_residues_indices
are now calculated and stored at the initialization._range_from_object_list()
was not needed anymore and removed.query_range
andhaystack_range
like their*_residue_range
equivalents.Plotting everything again
The following code can be used to plot all points of the contact map:
Some other info that was here before the last edit:
Things I like some feedback on (preferably from @dwhswenson ): This was offered as a possibility for the user:
~What should
total_range
be? The total range we have data (min/max on thecounter
)? or0
totopology.n_atoms
/topology.n_residues
(this would require some more code as topology is not something the ContactCounts have)?~Nvm, we need the size anyway for casting to matrices in a sensible way, so I added it as a keyword.
~Contains similar code as #95, but that can be split out if this is ready to be merged before that one is in~
Current issues to solve: