Computational-Morphogenomics-Group / MarkerMap

Marker selection, supervised and unsupervised
MIT License

remove Paul_empty since it's a copy of Paul #1

Closed WilsonGregory closed 2 years ago

WilsonGregory commented 2 years ago

Changes

How did I test

WilsonGregory commented 2 years ago

I now realize that all the _empty.ipynb files are copies of the associated .ipynb files, from this commit: 70a61f40eee256b183f381bb66cf1fd163ae1476

It makes sense to have empty notebooks because they are "cleaner" from the perspective of other people who want to use the code and run it for themselves. It also doesn't necessarily make sense to keep notebook outputs under git version control, since they will often differ between runs because they rely on non-deterministic processes.

At the same time, it makes sense to have the non-empty notebooks so you can see the results without having to run them yourself.

Having two copies of the same notebook is problematic because it means if I make code changes to one, I would have to make the exact same code changes to the other to keep them in sync.

I am leaning towards only having the empty versions on GitHub.

beelze-b commented 2 years ago

Great idea!

WilsonGregory commented 2 years ago

Hi @beelze-b, thank you for taking a look at the code! What do you think about the tradeoffs of having empty notebooks vs notebooks with output? I am thinking that just having the empty notebooks might be best. So I am thinking:

  1. Delete the *_empty versions of the notebooks
  2. Write a pre-commit hook that strips the output from any changed notebooks before they are committed to a branch. You can then run notebooks locally to test or get output, while the clean version is maintained on GitHub.
  3. Run the hook on the existing versions of the notebooks, Paul.ipynb, CITE_seq.ipynb, etc.
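
For reference, step 2 could look roughly like the sketch below. This is only an illustration, not something that exists in the repo: the function names `strip_outputs` and `main` are made up here, and it assumes the notebooks use the standard nbformat v4 JSON layout, where each code cell has an `outputs` list and an `execution_count`. Existing tools such as `nbstripout` or `jupyter nbconvert --clear-output --inplace` do the same job and may well be the better choice.

```python
import json
import sys
from pathlib import Path


def strip_outputs(notebook: dict) -> bool:
    """Clear outputs and execution counts in place; return True if anything changed."""
    changed = False
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            cell["outputs"] = []
            cell["execution_count"] = None
            changed = True
    return changed


def main(paths) -> int:
    """Strip every notebook in `paths`; return 1 if any file was rewritten."""
    status = 0
    for path in map(Path, paths):
        notebook = json.loads(path.read_text(encoding="utf-8"))
        if strip_outputs(notebook):
            path.write_text(
                json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
                encoding="utf-8",
            )
            print(f"stripped outputs from {path}")
            # A non-zero exit stops the commit so the cleaned file can be re-staged.
            status = 1
    return status


# Only act when notebook paths are passed on the command line.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

A hook would call this script with the staged `.ipynb` paths as arguments. Running it once by hand on Paul.ipynb, CITE_seq.ipynb, etc. would cover step 3.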

What do you think?

beelze-b commented 2 years ago

I agree that just having the empty notebooks would be best. I verified that all the data is on the GCloud, so clearing the SummaryTable notebook (or any other notebook) will not cause any loss of past results.

I didn't know there were hooks to clear notebooks' outputs. Please go ahead and use them if you wish and it is not too much effort for you.