NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of Atac-seq data
https://clara-parabricks.github.io/AtacWorks/
Other

WIP: Remove individual scripts and modularize the pre-processing steps #181

Closed: ntadimeti closed this 4 years ago

ntadimeti commented 4 years ago

This PR is very big, just from the nature of the change:

1) get_intervals.py, bw2h5.py and peak2bw.py are now modules, and all of them are called from main.py.
2) As a result, a number of new options are exposed through the config files, and options that are no longer relevant (files_train, val_files, etc.) are removed.
3) Some config options, such as print_freq, eval_freq, debug, transform and clip_grad, were judged unimportant and removed after brainstorming with @avantikalal.
4) --sizes_file is renamed to --genome. It now accepts the strings "hg19" or "hg38" and picks up the corresponding sizes file automatically; alternatively, it accepts a path to a sizes file.
5) A --regions option is added. It applies only to denoising, not to training, and lets you choose a subset of chromosomes to run denoising on. See the help text for how it works.
6) All tests are updated to reflect this API, and the files in the expected_results folder are updated to match the new output folder structure.
7) --label is renamed to --exp_name to make clear that it is the experiment name.
8) The out_home, exp_name, gpu and distributed flags are now mandatory: there are no default values in the config, so users MUST provide them.

@avantikalal @tijyojwad, I suggest you take a look at this PR as soon as possible, as I expect it will take some time to iterate on and merge.
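A minimal Python sketch of how items 4, 5 and 8 could fit together in an argparse-based CLI. This is not the AtacWorks implementation: the BUILTIN_SIZES paths, the read_sizes and select_chromosomes helpers and the comma-separated --regions format are hypothetical, and the distributed flag is left out because its exact form is not described here.

```python
import argparse
import os

# Hypothetical locations of bundled chromosome sizes files; the real
# package may store them elsewhere.
BUILTIN_SIZES = {
    "hg19": "reference/hg19.chrom.sizes",
    "hg38": "reference/hg38.chrom.sizes",
}


def resolve_genome(value):
    """Map --genome to a sizes file: a known assembly name or an existing path."""
    if value in BUILTIN_SIZES:
        return BUILTIN_SIZES[value]
    if os.path.isfile(value):
        return value
    raise argparse.ArgumentTypeError(
        "expected 'hg19', 'hg38' or a path to a sizes file, got {!r}".format(value))


def read_sizes(path):
    """Parse a two-column 'chrom<TAB>length' sizes file into a dict."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes


def select_chromosomes(sizes, regions=None):
    """Restrict sizes to the chromosomes named in --regions.

    Assumption: --regions is a comma-separated list such as "chr1,chr2".
    None means "use every chromosome in the sizes file".
    """
    if regions is None:
        return dict(sizes)
    wanted = [name.strip() for name in regions.split(",") if name.strip()]
    missing = [name for name in wanted if name not in sizes]
    if missing:
        raise ValueError("unknown chromosomes in --regions: " + ", ".join(missing))
    return {name: sizes[name] for name in wanted}


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the reworked CLI")
    # Mandatory flags: no defaults, so argparse exits if any is missing.
    parser.add_argument("--out_home", required=True, help="root output directory")
    parser.add_argument("--exp_name", required=True,
                        help="experiment name (formerly --label)")
    parser.add_argument("--gpu", required=True, type=int, help="GPU index")
    # Validated by resolve_genome, so bad values surface as normal argparse errors.
    parser.add_argument("--genome", type=resolve_genome,
                        help="'hg19', 'hg38' or a path to a sizes file")
    # Only meaningful for denoising; training would ignore it.
    parser.add_argument("--regions", default=None,
                        help="comma-separated chromosomes to denoise")
    return parser
```

For example, parsing "--out_home out --exp_name run1 --gpu 0 --genome hg19" sets args.genome to the bundled hg19 path, while omitting --exp_name makes argparse print a usage error and exit. Doing the --genome check inside the type= callable keeps the "name or path" logic in one place instead of scattering it across the pre-processing modules.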

ntadimeti commented 4 years ago

rerun tests

ntadimeti commented 4 years ago

Closing this PR, since it has become too big to track the changes. I will open several smaller PRs, each containing a subset of these changes, for easier review and merging.