CATH-summer-2017 / domchop

2 stars 3 forks source link

Steps towards a Python developer on Github. #3

Open shouldsee opened 7 years ago

shouldsee commented 7 years ago

Documents for reference:

Frequently, one aim to implement some functionality via a seemingly promising route, only to realise the pitfall a while later. Thus it'd be best if we could discuss these potential pitfalls regularly so as to develop good habits and structures.

Practically, this require collaborators to describe the functionality they have in mind, and the problems encountered during attempts. You can do this by writing a Markdown document on the "Issues" section. Since we will be meeting frequently, there is a good chance that we could complement each other and document the issues into a maximally useful form.

For example, I had some hard time with importing Python modules. At start of any coding, it is hard to imagine how the code should be written. Usually the easier task is to predict the behaviour of some given chunk of code. By understanding the pitfalls of imperfect codes, one can then grab the gist of some abstract structure. I will try to write up some tests soon.

I hope you find this helpful! @miruuna @ilsenatorov

shouldsee commented 7 years ago

I am happy to tell you the functionality of domutil.util.get_nDOPE() is pretty stable now. The script is currently stored under tests/test_pdb.py. You can supply a pdb file or a directory containing pdb file and it will return the output in "ref_nDOPEs.csv" under current directory.

After set up you modeller, In domchop/, do:

python tests/test_pdb.py -i pdbs

We should migrate test_pdb.py into a documented script in the future, and form the test_code as a separate python file. We define the test as transforming a input directory (of pdbs) into a pre-documented ".csv" file. A sub-test would be transform a single structure into a pre-calculated value. One is noted that the calculation is dependent on some external datasets from modeller.

To-do:

TL;DR:

  1. Optimise and adapt to the CATH data structure and create reusable tools for visualisation
  2. Write statistical tests suites.

One of the crucial skill is to visualise data in a human readable format. In terms of the nDOPE module of Domutil, it comes down to linking different resources (picture, ID, structures) into a HTML portal. As suggested by Ian, one of the application of nDOPE is to detect anomaly member within a superfamily. i.e.:where most structures are well pack except some outliers. This will serve to sanitise the data.

The other application of nDOPE score is to automatically flag structure as "unpacked" and thus alleviate investing human resources on its annotation. Another potential tag is "MCD", which could be inferred from significant difference chain-based nDOPE and assembly-based nDOPE. As @sillitoe metioned, we could use SCOP MCD tag to group the data and compare the nDOPE-distribution of the MCD and non-MCD groups.

Sidenote: MVC (model, view, controller) in web development. Django, Flask, Pyramid are candidates for web development.