calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Expose 'raw' argument #175

Closed: Yaoyx closed this pull request 1 year ago.

Yaoyx commented 1 year ago

Description of your changes

I exposed the 'raw' argument to the CLI so that users can specify whether the input data is already one-hot encoded. This makes the data loading in explore_model.ipynb consistent with the input shape and format the model requires for prediction.
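A minimal sketch of what exposing such a switch on a command line could look like, using the standard library's argparse. The `--raw` option name, the `build_arg_parser` helper, and the positional `data_dir` argument are hypothetical and are not basenji_train.py's actual interface.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI wiring; not basenji_train.py's real options."""
    parser = argparse.ArgumentParser(description="Sketch: expose a 'raw' switch")
    # store_true: flag absent -> False (decode to one-hot), present -> True (keep stored encoding)
    parser.add_argument(
        "--raw",
        action="store_true",
        help="keep sequences in their stored encoding instead of one-hot encoding them",
    )
    # Illustrative positional argument only.
    parser.add_argument("data_dir", help="directory of training data (illustrative)")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_arg_parser().parse_args(["--raw", "train_data"])
print(args.raw, args.data_dir)  # True train_data
```

The parsed `args.raw` value would then be passed down to whatever builds the dataset parser, instead of hard-coding it there.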

Issue ticket number and link

#173

Type of change

(If applicable) How has this been tested?

davek44 commented 1 year ago

I still can't figure out the problem you're trying to solve here; explore_model.ipynb runs fine for me. Hopefully this change solves your problem, but I don't see a general need to add this option to basenji_train.py, so I'm going to leave this PR here for now.

Yaoyx commented 1 year ago

Hi @davek44,

I see. My problem was that the output of basenji_data_write is in index format, with shape [N, W, 1], but the model requires one-hot input with shape [N, W, 4] (#173). In dataset.py, 'raw' is set to True in the generate_parser function (https://github.com/calico/basenji/blob/615b9eec8a591783b16d959029ddad08edae853d/basenji/dataset.py#L215), which skips the one-hot encoding here: https://github.com/calico/basenji/blob/615b9eec8a591783b16d959029ddad08edae853d/basenji/dataset.py#L89-L93. So I was wondering whether 'raw' should be False there, or whether the 'raw' parameter should be exposed to the user so we can decide.
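To make the shape mismatch concrete, here is a NumPy sketch of what a `raw` switch toggles: leaving index-encoded sequences as [N, W, 1], or expanding them to one-hot [N, W, 4]. The `decode_sequences` helper is hypothetical, and the code assignment (0 to 3 for A, C, G, T; 4 for N, mapped to an all-zero row) is an assumption rather than basenji's documented encoding.

```python
import numpy as np


def decode_sequences(seq_idx: np.ndarray, raw: bool, depth: int = 4) -> np.ndarray:
    """Hypothetical helper, not basenji's dataset.py.

    seq_idx: integer array of shape [N, W, 1]; codes 0..depth-1 are assumed
    to be A, C, G, T and code `depth` is assumed to mean N.
    raw=True returns the input unchanged; raw=False returns one-hot [N, W, depth].
    """
    if raw:
        return seq_idx
    idx = seq_idx[..., 0]  # drop the trailing axis: [N, W, 1] -> [N, W]
    # Each code selects a row of a (depth+1) x (depth+1) identity matrix -> [N, W, depth+1].
    # Dropping the last column turns the N code into an all-zero vector -> [N, W, depth].
    return np.eye(depth + 1, dtype=np.float32)[idx][..., :depth]


seqs = np.array([[[0], [1], [2], [3], [4]]], dtype=np.uint8)  # one sequence: A C G T N
print(decode_sequences(seqs, raw=True).shape)   # (1, 5, 1)
print(decode_sequences(seqs, raw=False).shape)  # (1, 5, 4)
```

Dropping the extra column is one common way to represent N as an all-zero vector; the actual decoding in dataset.py may differ.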

Regards, Yao