glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Please provide PDB and AlphaFold xref datasets for 2.8 #1882

Open katewarner opened 1 day ago

katewarner commented 1 day ago

Please provide PDB and AlphaFold xref datasets for 2.8

Here are the selection criteria for PDB Protein Structures from @jeet-vora and Raja, which you can use for filtering the downloaded PDB files.

Rules:

  1. Length - Select the PDB accession (structure) that contains the longest amino acid sequence.
  2. Method - Shortlist structures resolved by X-ray crystallography first. If no X-ray structure is available, select from the NMR structures.
  3. Resolution - From the shortlisted X-ray structures, choose the one with the highest resolution (that is, the lowest value in Å). NMR structures do not have a resolution, so select the NMR structure with the longest sequence.
  4. Number of chains - If two structures are identical on rules 1, 2 and 3, choose the accession with the lower number of chains.
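
A minimal sketch of how the rules above could be applied as a single sort key, in case it helps when filtering. It assumes the rules are applied in the order listed (length, then method, then resolution, then chain count), which is one reading of the criteria. The `Structure` record, its field names, and the PDB IDs in the usage lines are hypothetical placeholders, and methods other than X-ray and NMR are ranked last because the rules do not mention them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Structure:
    """Hypothetical record for one PDB structure mapped to a protein."""
    pdb_id: str
    seq_length: int              # number of residues covered
    method: str                  # "X-ray" or "NMR"
    resolution: Optional[float]  # in Å; None for NMR
    n_chains: int


# Rule 2: X-ray is preferred over NMR. Other methods are not covered by
# the rules, so ranking them last is an assumption.
METHOD_RANK = {"X-ray": 0, "NMR": 1}


def selection_key(s: Structure):
    # Rule 3: a smaller Å value means higher resolution. NMR entries have
    # no resolution, so they sort after any X-ray entry on this field.
    resolution = s.resolution if s.resolution is not None else float("inf")
    return (
        -s.seq_length,                 # Rule 1: longest sequence first
        METHOD_RANK.get(s.method, 2),  # Rule 2: X-ray before NMR
        resolution,                    # Rule 3: best resolution first
        s.n_chains,                    # Rule 4: fewest chains first
    )


def select_structure(candidates: List[Structure]) -> Optional[Structure]:
    """Return the preferred structure, or None if there are no candidates."""
    return min(candidates, key=selection_key) if candidates else None


# Usage with made-up entries (not real PDB records):
candidates = [
    Structure("AAAA", 230, "X-ray", 2.7, 2),
    Structure("BBBB", 230, "X-ray", 2.1, 4),
    Structure("CCCC", 230, "NMR", None, 1),
    Structure("DDDD", 180, "X-ray", 1.5, 1),
]
best = select_structure(candidates)
print(best.pdb_id)  # BBBB: ties on length, X-ray, best resolution
```

If the intended order is different (for example, method before length), only the order of the tuple in `selection_key` would need to change.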

Let me know if you need any more information.

Example dataset:

"uniprotkb_canonical_ac","sequence_region","pdb_chain","start_pos","end_pos","overlap_ratio","overlap_category","experimental_method","resolution","selection_flag"
"P51610-1","region_1","4GO6","1806","2035","1.0","0.75","X-Ray_Crystallography","2.7","True"
"P51610-1","region_2","4GO6","360","402","1.0","0.75","X-Ray_Crystallography","2.7","True"

jeet-vora commented 1 day ago

@katewarner who is this info for?

katewarner commented 16 hours ago

@jeet-vora It's for Jie, but while writing it up I realised I need some more info from Robel on how he wants the datasets. I'll ask him about it on Monday, as it's not a priority.