Iss/reorg - Githubissues

mmuhoberac commented 2 years ago

The main issue solved in this branch is the reorganization of the computational utility modules into individual folders, each with its own:

MongoEngine document schema (docs/docs.py)
MongoDB triggers (triggers/trigger_*.py)
Module utilities (utils/*)

The schemas were taken directly from Cipher_DB_Object_Diagram.pdf.

Each folder is called module_(module name) and is located in the base directory. I filled out the MongoEngine docs.py schema for all identifiers, properties, cando, and assays. I got the identifiers and properties triggers working, as well as some functionality for inserting other identifiers such as models.

There are still a few issues that need to be addressed (many of which I believe David mentioned previously).

Testing - How do we want to approach unit tests for the project? Since each module is mostly separate in terms of dependencies, I think we should create a testing folder in each module. There we can write a series of tests that cover the document structure, validation methods, triggers, and utilities. For some of the modules, testing will require interacting with the database. For this, we have a separate testing database that can be accessed for testing purposes. In most of the main methods for the triggers I created, you can run the trigger with the "--testing" flag; the data will then be watched/inserted in the testing collection of the database, and we can pull data from there for verification purposes.
Configurations/Dependencies - I think we can make a configuration folder in each module for any module-specific configurations (as well as a requirements.txt file). I can update the overall conda environment to include only the packages that are used in the project instead of my whole environment.
Pathing - Hopefully I didn't mess up any of the pathing when I moved things around. I have been running all triggers from the folder they are located in and then appending the path to the base directory ("../../") when we use modules that require internal imports. There is probably a better solution than this that I am just not aware of.
ID Generation - Not the most pressing problem, but since each ID needs to be unique (and short enough to use/write down if needed), we need to come up with a way to randomly (or systematically) generate these identifiers. This is not an issue now, but for things like assays (where we could get a lot) it could start to get difficult. I have been thinking of using shortuuid to generate these (from a predefined alphabet) and just regenerating until we get one that is unique for the collection (easy to check). Let me know if a better solution exists.
MongoEngine Primary Key Fields - In the past version of the schema I assigned primary keys to fields that were once just standard fields but were used for identification purposes (inchi key, cipher mid, cipher sid, cando id, etc.). These fields can be named anything in the schema, but once you denote a field with the primary_key=True argument, it becomes the primary key. This means two things: (1) it is no longer referred to by its assigned name and can only be accessed via the _id field (or using the pk argument or the with_id() method), and (2) it will be enforced as unique across all documents in the collection. In some cases (mainly with the inchikey) I just added a second field with the previous name (inchikey) to allow for smoother access.
READMEs - We should probably add a README for each module with configuration details, requirements, etc., in addition to a README for the overall project. This is just to help someone set up a particular module if they are having trouble with that module only.
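
On the Testing point above, here is a minimal sketch of how a "--testing" switch could choose the collection a trigger works against. The collection names and function names are hypothetical placeholders, not the ones used in the branch, and the real triggers may wire this up differently.

```python
import argparse

# Hypothetical collection names; the actual names are not stated in this thread.
PRODUCTION_COLLECTION = "production"
TESTING_COLLECTION = "testing"


def build_parser():
    """Build a parser exposing the --testing switch."""
    parser = argparse.ArgumentParser(description="Run a trigger.")
    parser.add_argument(
        "--testing",
        action="store_true",
        help="watch/insert data in the testing collection",
    )
    return parser


def select_collection(argv=None):
    """Return the name of the collection the trigger should use.

    argv defaults to the process command line when left as None.
    """
    args = build_parser().parse_args(argv)
    return TESTING_COLLECTION if args.testing else PRODUCTION_COLLECTION
```

A unit test in a module's testing folder could then call select_collection(["--testing"]) directly, without touching the database.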
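
On the Pathing point above, one possible alternative to appending "../../" is to resolve the base directory from the trigger file's own location, so the import works no matter which folder the trigger is started from. This is only a sketch; the helper name and the levels_up parameter are made up for illustration.

```python
import sys
from pathlib import Path


def add_base_dir_to_path(file_path, levels_up=2):
    """Put the project base directory on sys.path and return it.

    levels_up counts directories above the folder that contains
    file_path, so the default of 2 mirrors the "../../" convention.
    """
    base_dir = Path(file_path).resolve().parents[levels_up]
    base_str = str(base_dir)
    if base_str not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(base_str)
    return base_dir
```

A trigger would call add_base_dir_to_path(__file__) before its internal imports. Installing the project as a package would be another option, but that is a larger change than this sketch.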
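
On the ID Generation point above, the "generate until unique" idea could look roughly like the sketch below. It uses the standard library's secrets module in place of shortuuid so that it runs without extra dependencies; the alphabet, the default length, and the exists callback are all assumptions made for illustration.

```python
import secrets

# Assumed alphabet: digits and uppercase letters without the look-alikes
# 0/O and 1/I, to keep IDs easy to write down.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"


def generate_unique_id(exists, length=8, max_attempts=100):
    """Draw random IDs until one is not already taken.

    exists is a caller-supplied function that returns True when an ID
    is already present in the collection (hypothetical; in practice
    this would be a database lookup).
    """
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(candidate):
            return candidate
    raise RuntimeError("no unique ID found; consider a longer length")
```

Because the check and the insert are separate steps, two writers could still pick the same ID at the same moment, so the uniqueness enforced on the primary key would remain the final safeguard.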

Let me know if I missed anything here or if anything does not make sense and I can clarify. We can merge this to master if it looks good and keep developing features with this reorganized code structure.

mmuhoberac commented 2 years ago

I will merge this to master today (3/8/2022) unless any of the issues are major, but I will keep the pull request open so that we can see the comments and fix everything else that was identified in the near future.

I want to make sure that Zack has a working copy on master where he can get CANDO set up for the demo on Thursday.

Does performing the merge affect our ability to access the pull request comments as long as we leave it open? I don't believe it does, but I am not sure. @davidegraff

davidegraff commented 2 years ago

even if we close the PR we'll still be able to see comments

mmuhoberac commented 2 years ago

Fixed several of the issues raised in David's comments; will address more in the next few weeks.

chopralab / cipher_db

Iss/reorg #7