New flags for the base Dataset class: random_rotate=True and center=True. These transforms are applied by default to each protein: one applies a random rotation, and the other translates the protein so that it is centered on its center of mass. The random seed for the rotation is always hash(protein['protein']['sequence']) % 2**28.
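To illustrate the two transforms, here is a minimal sketch in pure Python. It is not the code from this PR: the helper names (seed_from_sequence, random_rotation_matrix, center, rotate, center_and_rotate) are hypothetical, the centroid stands in for the center of mass (equal masses are assumed), and centering before rotating is an assumed order. Only the seed rule is taken from the description above.

```python
import math
import random


def seed_from_sequence(sequence):
    # Seed rule quoted from the description above.
    return hash(sequence) % 2**28


def random_rotation_matrix(seed):
    # A normalized 4D Gaussian sample is a uniformly distributed unit quaternion.
    rng = random.Random(seed)
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / norm for v in q)
    # Standard unit-quaternion to rotation-matrix conversion.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def center(coords):
    # Assumption: equal masses, so the center of mass is the centroid.
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return [[c[k] - centroid[k] for k in range(3)] for c in coords]


def rotate(coords, rotation):
    # Apply the 3x3 rotation matrix to every coordinate.
    return [
        [sum(rotation[r][k] * c[k] for k in range(3)) for r in range(3)]
        for c in coords
    ]


def center_and_rotate(protein, coords):
    # Assumed order: center first, then rotate about the origin.
    seed = seed_from_sequence(protein["protein"]["sequence"])
    return rotate(center(coords), random_rotation_matrix(seed))
```

Within one interpreter process, the same sequence always yields the same rotation. Python salts the built-in hash() of a str per process unless PYTHONHASHSEED is set, so the seed may differ between runs.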
New flag for the base Dataset class: split_chains=False. When True, each protein chain occupies a single item in the dataset. That is, if the proteins were ['AB', 'C'], splitting the chains gives ['A', 'B', 'C'].
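The toy example above can be reproduced with a short sketch. The function name split_chains_example is hypothetical, and each protein is represented only by a string of chain identifiers, as in the example. The real dataset items carry far more data.

```python
def split_chains_example(proteins):
    # Each protein is a string of chain IDs; emit one item per chain.
    return [chain for protein in proteins for chain in protein]


items = split_chains_example(["AB", "C"])
```

Here items has three entries, one per chain, instead of the two multi-chain entries in the input.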
The PPI dataset sets all of these flags to True when it is constructed.
The ProteinProteinInterfaceDataset now has a new metadata attribute, ProteinProteinInterfaceDataset._interfaces, which is a dictionary storing inter-chain contacts as pairs of sequence indices. _interfaces[<PDBID>][<chain_1>][<chain_2>] --> list[tuple], where each tuple has the form (index_chain_1, index_chain_2) and gives the sequence positions of a pair of residues that are within 6 Angstroms of each other and in different chains.
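The nested layout of _interfaces can be sketched as follows. This is an illustration under stated assumptions, not the dataset's implementation: build_interfaces and chain_contacts are hypothetical names, each residue is represented by a single coordinate (which atom the dataset actually measures from is not stated here), "within 6 Angstroms" is read as distance <= 6.0, and storing both chain orders is assumed.

```python
import math
from itertools import permutations

CUTOFF_ANGSTROMS = 6.0  # contact threshold from the description


def chain_contacts(coords_1, coords_2, cutoff=CUTOFF_ANGSTROMS):
    # Pairs (index_chain_1, index_chain_2) of residues within the cutoff.
    return [
        (i, j)
        for i, a in enumerate(coords_1)
        for j, b in enumerate(coords_2)
        if math.dist(a, b) <= cutoff
    ]


def build_interfaces(pdb_id, chains):
    # chains maps a chain ID to a list of per-residue [x, y, z] coordinates.
    per_chain = {}
    for chain_1, chain_2 in permutations(chains, 2):
        pairs = chain_contacts(chains[chain_1], chains[chain_2])
        if pairs:
            per_chain.setdefault(chain_1, {})[chain_2] = pairs
    return {pdb_id: per_chain}


# Hypothetical two-chain structure with one contacting residue pair.
example = build_interfaces(
    "1abc",
    {"A": [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
     "B": [[0.0, 0.0, 5.0], [30.0, 0.0, 0.0]]},
)
```

In this example only residue 0 of chain A and residue 0 of chain B are close enough (5 Angstroms apart), so example["1abc"]["A"]["B"] holds the single pair (0, 0).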
The PPI task becomes a pairwise task, so the target() method accepts two proteins and returns a binary matrix C of shape N_residues_protein_1 x N_residues_protein_2, where C[i][j] is 1 if residue i of protein_1 is in contact with residue j of protein_2 and 0 otherwise.
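The shape and meaning of C can be shown with a small sketch that fills the matrix from a list of index pairs, such as one stored under _interfaces. The function contact_matrix and its arguments are hypothetical; the actual target() signature and return type may differ.

```python
def contact_matrix(pairs, n_residues_1, n_residues_2):
    # Start from an all-zero N_residues_protein_1 x N_residues_protein_2 matrix.
    matrix = [[0] * n_residues_2 for _ in range(n_residues_1)]
    # Mark each (index_chain_1, index_chain_2) contact with a 1.
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix


C = contact_matrix([(0, 1), (2, 0)], n_residues_1=3, n_residues_2=2)
```

With these inputs, C[0][1] and C[2][0] are 1 and every other entry is 0.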