AlasdairGray opened 2 years ago
I have a first draft of this query working. It identifies 663 proteins in the IDP-KG that are in MobiDB or PED but not in DisProt.
@ivanmicetic, what information would be useful to return? Below are some results from the query. Looking at rows 9 and 66, are the UniProt IDs from PED useful here (some PED entries have more than one), or would it be more useful to return the PED ID?
Well, we are interested in which proteins are present in, or absent from, each dataset. Therefore, I would prefer UniProt IDs over internal resource IDs (DisProt and PED IDs; MobiDB IDs are the same as UniProt accessions).
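Since a PED entry can link to more than one UniProt accession, returning UniProt IDs means flattening entries into a set of distinct accessions. A minimal sketch, assuming the query rows have already been grouped as a mapping from PED entry to its accessions; every identifier below is a hypothetical placeholder, not real query output:

```python
from typing import Dict, List, Set

# Hypothetical query rows: each PED entry maps to one or more UniProt accessions.
# Identifiers are placeholders for illustration only.
ped_rows: Dict[str, List[str]] = {
    "PED_ENTRY_1": ["UNIPROT_A"],
    "PED_ENTRY_2": ["UNIPROT_B", "UNIPROT_C"],  # entry linked to two proteins
    "PED_ENTRY_3": ["UNIPROT_A"],               # same protein as PED_ENTRY_1
}


def uniprot_ids_from_ped(rows: Dict[str, List[str]]) -> Set[str]:
    """Flatten PED entries into the set of distinct UniProt accessions."""
    return {acc for accessions in rows.values() for acc in accessions}


print(sorted(uniprot_ids_from_ped(ped_rows)))
```

Using a set drops duplicates, so a protein that appears in several PED entries is reported once, which matches comparing resources at the protein level.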
That is, which proteins have predicted disordered regions in MobiDB but are not yet manually annotated in DisProt? And the same for PED.
Basically, intersections and unions between the resources, in order to pinpoint which proteins are still missing annotation in DisProt.
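Once each resource is reduced to a set of UniProt accessions, the comparison above comes down to unions, intersections and set differences. A minimal Python sketch, assuming the per-resource accession sets have already been retrieved; the set contents and the helper `missing_from` are hypothetical and are not the IDP-KG query itself:

```python
from typing import Dict, List, Set

# Hypothetical accession sets per resource (placeholders, not real data).
resources: Dict[str, Set[str]] = {
    "DisProt": {"UNIPROT_A", "UNIPROT_B"},
    "MobiDB": {"UNIPROT_A", "UNIPROT_C", "UNIPROT_D"},
    "PED": {"UNIPROT_B", "UNIPROT_D", "UNIPROT_E"},
}


def missing_from(target: str, sources: List[str], sets: Dict[str, Set[str]]) -> Set[str]:
    """Accessions found in any of `sources` (their union) but absent from `target`."""
    union: Set[str] = set().union(*(sets[s] for s in sources))
    return union - sets[target]


# In MobiDB but not DisProt, and in PED but not DisProt.
mobidb_gap = missing_from("DisProt", ["MobiDB"], resources)
ped_gap = missing_from("DisProt", ["PED"], resources)
# In MobiDB or PED but not DisProt (the shape of the first-draft query).
either_gap = missing_from("DisProt", ["MobiDB", "PED"], resources)
# In both MobiDB and PED but not DisProt.
both_gap = (resources["MobiDB"] & resources["PED"]) - resources["DisProt"]

print(sorted(mobidb_gap), sorted(ped_gap), sorted(either_gap), sorted(both_gap))
```

Keeping the per-resource differences separate from the combined one makes it easy to report, for each missing protein, which source (MobiDB, PED, or both) suggests it.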