NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of Atac-seq data
https://clara-parabricks.github.io/AtacWorks/

Motif model #103

Closed avantikalal closed 4 years ago

avantikalal commented 4 years ago

Extra inputs

AtacWorks can now take any number of additional bigWig files ('layers') as input alongside the noisy ATAC-seq data. All of these are concatenated as additional channels in the model input ('input'). The number of input tracks is supplied as an argument ('in_channels') to main.py.

h5 files now contain multiple datasets ('input' for inputs, 'label_reg' and 'label_cla' for labels) instead of combining everything into one dataset called 'data'.
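As a rough illustration of the layout described above, the sketch below stacks a noisy ATAC-seq track with one extra layer into a single multi-channel array and keeps the labels in separately named entries. It uses plain NumPy and a dict as a stand-in for the h5 file. The helper name `build_record`, the array shapes, and the channels-last ordering are assumptions made for illustration, not AtacWorks code.

```python
import numpy as np


def build_record(noisy, layers, label_reg=None, label_cla=None):
    """Stack the noisy track and extra layers into one multi-channel input.

    Hypothetical helper: each track has shape (n_intervals, interval_len).
    The keys mirror the dataset names mentioned in this PR
    ('input', 'label_reg', 'label_cla'); the channel axis is assumed last.
    """
    tracks = [noisy] + list(layers)
    # Resulting shape: (n_intervals, interval_len, in_channels)
    record = {"input": np.stack(tracks, axis=-1)}
    # Labels are optional, e.g. omitted when encoding data for inference.
    if label_reg is not None:
        record["label_reg"] = label_reg
    if label_cla is not None:
        record["label_cla"] = label_cla
    return record


rng = np.random.default_rng(0)
noisy = rng.poisson(2.0, size=(4, 100)).astype(np.float32)
extra_layer = rng.random((4, 100)).astype(np.float32)  # made-up extra layer

record = build_record(
    noisy,
    [extra_layer],
    label_reg=noisy.copy(),
    label_cla=(noisy > 2).astype(np.int8),
)
in_channels = record["input"].shape[-1]
```

In this sketch `in_channels` is the number of stacked tracks (the noisy track plus each extra layer), which corresponds to the count that would be supplied to main.py.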

Closes #13.

Always using --nolabel for inference

I deleted some lines in DatasetInfer (dataset.py) because I don't understand their function. As far as I can see, we never need DatasetInfer to supply labels, only the input.

Deleted unused DatasetEval.

run.sh included two commands to encode inference data, with and without labels. Since we never need labels to be encoded in the .h5 file for inference, I removed the command for encoding with labels.
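The idea that inference only ever needs the input can be sketched as a dataset that reads the 'input' entry and ignores any labels. The class below is a hypothetical, dependency-free stand-in and not the actual DatasetInfer from dataset.py; the class name and the dict-based record are assumptions for illustration.

```python
class InferenceDataset:
    """Hypothetical stand-in for an inference-only dataset (not DatasetInfer)."""

    def __init__(self, records):
        # `records` maps dataset names to per-interval sequences, e.g.
        # {"input": [...]}; any label entries that are present are ignored.
        self.inputs = records["input"]

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Only the model input is returned; no labels are read.
        return self.inputs[idx]


records = {"input": [[0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]}
dataset = InferenceDataset(records)
first = dataset[0]
```

Because nothing in this class touches label entries, a file encoded without labels is sufficient for it.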

ntadimeti commented 4 years ago

Additional things we discussed adding to this PR:

  1. Add a --layersbw use case to example/run.sh
  2. Update example 7b and calculate_baseline_metrics.py in example/run.sh
  3. Remove test_data.h5; use no_label.h5 for inference and bigwig files otherwise.

Additionally, once PR #97 is merged, please run the following to ensure linting and documentation are correctly formatted:

  flake8 --ignore=E901
  pydocstyle --convention=google