BiomedSciAI / fuse-med-ml

A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
Apache License 2.0

Some changes from DTI task (Add doc' and some small features) #233

Closed SagiPolaczek closed 1 year ago

SagiPolaczek commented 1 year ago

Hey!

In this PR I'm suggesting some changes I made while working on the drug target interaction (DTI) task, which is still in progress.

Changes:

  1. Added more functionality to some Ops
  2. Additional documentation & small fixes
  3. Small modification to NDict's print_tree() function so it also prints the key values (useful for nicely printing hydra's cfg objects!). @egozi

Example of .print_tree()'s output:

BEFORE CHANGES:

>>> ndict = NDict()
>>> ndict["data.input.drug"] = "this_is_a_drug_seq"
>>> ndict["data.input.target"] = "this_is_a_target_seq"
>>> ndict.print_tree()
--- data
------ input
--------- drug
--------- target

AFTER CHANGES:

>>> ndict = NDict()
>>> ndict["data.input.drug"] = "this_is_a_drug_seq"
>>> ndict["data.input.target"] = "this_is_a_target_seq"
>>> 
>>> ndict.print_tree()
--- data
------ input
--------- drug
--------- target
>>> 
>>> ndict.print_tree(print_values=True)
--- data
------ input
--------- drug -> this_is_a_drug_seq
--------- target -> this_is_a_target_seq
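For intuition, the new behavior can be sketched as a recursion over a nested dict (this is only an illustration of the idea, not the actual fuse-med-ml implementation; the helper name `print_nested` is hypothetical):

```python
def print_nested(d: dict, level: int = 1, print_values: bool = False) -> None:
    """Recursively print a nested dict as an indented tree.

    Each level adds three more dashes, mirroring the output above;
    with print_values=True, leaves are printed as "key -> value".
    """
    prefix = "---" * level
    for key, value in d.items():
        if isinstance(value, dict):
            print(f"{prefix} {key}")
            print_nested(value, level + 1, print_values)
        elif print_values:
            print(f"{prefix} {key} -> {value}")
        else:
            print(f"{prefix} {key}")

# Mirrors the NDict example above:
print_nested(
    {"data": {"input": {"drug": "this_is_a_drug_seq",
                        "target": "this_is_a_target_seq"}}},
    print_values=True,
)
```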

Example where it is useful to print NDict's values:

Let's take hydra's cfg object in our pipeline.

We can define the following function:

from fuse.utils import NDict
from omegaconf import OmegaConf


def print_cfg_tree(cfg):
    """
    Beautifully print the config as a tree
    """
    NDict(OmegaConf.to_object(cfg)).print_tree(print_values=True)

and then use it with:

import hydra
from omegaconf import DictConfig


@hydra.main(config_path=".", config_name="config")
def main(cfg: DictConfig):
    cfg = hydra.utils.instantiate(cfg)
    print_cfg_tree(cfg)

to get the config printed to the console as (please ignore the values themselves):

--- paths
------ root_dir -> ./_examples/dti
------ data_dir -> /dccstor/mm_hcls/usr/sagi/_data/BindingDB/
------ model_dir -> ./_examples/dti/mlp
------ cache_dir -> ./_examples/dti/cache_dir
------ infer_dir -> ./_examples/dti/mlp/infer_dir
------ eval_dir -> ./_examples/dti/mlp/eval_dir
------ ckpt_filename -> best_epoch.ckpt
------ infer_filename -> infer_file.gz
--- train
------ model -> mlp
------ num_workers -> 8
------ batch_size -> 32
------ lr -> 1e-05
------ weight_decay -> 0.001
------ num_epochs -> 1
------ accelerator -> gpu
------ num_devices -> 1
------ model_params
--------- mlp -> None
--------- cat -> None
--------- cat_custom -> None
--- data
------ drug_fixed_size -> 80
------ target_fixed_size -> 1000
--- logging
------ log_clear_ml -> 0
------ task_name -> mlp-

I used it just before moving to ClearML (which saves the hyperparams), and I still use it when I don't want to log to ClearML. Other use cases might come up as well.