Closed kpgbrock closed 6 years ago
The distinction between id and i in general is intentional - the former is a residue identifier, the latter is a position in a sequence.
DSSP-extracted secondary structure is an example of the first (even though in our setting id == i since the structures are remapped into actual sequence indices using SIFTS). Secondary structure predictions are always an example of the second, which is why the CSV has i as column header.
The way the plotting function is currently implemented is so one can simply plonk the residues dataframe associated with a PDB structure object into the plotting function, even though the plotting function assumes that the identifiers are actually sequence indices.
One could argue that the function should expect a dataframe with column i instead of id to explicitly force the user to think about the conversion by renaming the column, but then that feels like a hassle to me because you have to rename every time you plot an experimental structure. Alternatively one could look for existence of i or id.
So not a bug, and I would just leave design as is for now since it doesn't have any real consequences for users.
Hi @thomashopf ,
fair enough, a note somewhere of this is though worth. I guess that if someone was to look at the issues an finds this, it's good enough. I'll also upload @kpgbrock 's notebook to the notebooks
folder in a PR soon
In this function ( https://github.com/debbiemarkslab/EVcouplings/blob/master/evcouplings/visualize/pairs.py#L752 ), a column with
id
needs to be defined, but in sec_struct.csv the column's header is justi
. Maybe inconsistent?