airbnb / knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Apache License 2.0
5.48k stars 688 forks

Can't skip cells #425

Open CPapadim opened 6 years ago

CPapadim commented 6 years ago

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

As suggested in Issue #352, setting a cell's Slideshow slide type to 'Skip' should exclude that cell entirely from the published post.

Looking at the code, I can see this is handled in the ipynb.py converter. However, when I post notebooks containing cells with this setting, those cells still show up in the post. Is there something special I have to do to apply that converter to my notebook?

Here's my notebook if it's helpful:

{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "title: Cell Test!\n",
    "authors:\n",
    "- Author1\n",
    "tags:\n",
    "- knowledge\n",
    "- example\n",
    "created_at: 2016-06-29\n",
    "updated_at: 2016-06-30\n",
    "tldr: This is short description of the content and findings of the post.\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show me!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Skip me!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": false,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
CPapadim commented 6 years ago

Upon further inspection, it looks like the 'Skip' setting only hides the input (code) part of a cell, not its output.

I'm going to implement something (probably a cell tag, or perhaps the 'Notes' slide type) to skip entire cells. Is this something you'd be interested in having merged back here? If so, do you have any preferences about the specific implementation?

matthewwardrop commented 6 years ago

Hi @CPapadim! Thanks for reaching out here (and, as noted in other correspondence, I apologise for the delay). The existing skip functionality is a legacy of the Knowledge Repo's pre-code-folding days: it was a way to keep large amounts of code from bamboozling unsuspecting readers who were only interested in that code's output. I think giving the Knowledge Repo more nuanced hints about what should be shown would be a great idea. In particular, it would be useful to be able to mark certain cells as "debug" or "detailed notes" cells, and then hide that content from the UI by default (while retaining the ability to specify that you actually want it shown).
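One way the "hide by default" idea could work, sketched under assumptions: Jupyter stores a cell's tags as a list under `metadata["tags"]`, while the tag names `debug` and `detailed-notes` and the `visible_cells` helper are hypothetical and not part of the Knowledge Repo. Tagged cells are dropped unless `show_hidden=True` is passed.

```python
# Hypothetical tag names marking cells to hide by default.
HIDDEN_TAGS = {"debug", "detailed-notes"}


def visible_cells(notebook: dict, show_hidden: bool = False) -> list:
    """Return the cells to render, dropping hidden-tagged cells unless show_hidden is True."""
    cells = notebook.get("cells", [])
    if show_hidden:
        return list(cells)
    return [
        cell
        for cell in cells
        # Jupyter keeps a cell's tags as a list under metadata["tags"].
        if not HIDDEN_TAGS.intersection(cell.get("metadata", {}).get("tags", []))
    ]


nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Show me!"]},
        {"cell_type": "code", "metadata": {"tags": ["debug"]},
         "execution_count": None, "outputs": [], "source": ["df.describe()"]},
    ]
}
print(len(visible_cells(nb)), len(visible_cells(nb, show_hidden=True)))  # 1 2
```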

And yes, please do share anything you come up with back here. We'd love to know how people are using the Knowledge Repo, and to have the Knowledge Repo share in the improvements made by others.

CPapadim commented 6 years ago

I ended up changing the template so that, by default, neither code nor output shows up. Selecting 'Slide' then shows both code and output, and 'Sub-Slide' shows only the output.
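The visibility rules above might look roughly like this as a preprocessing step. This is a sketch, not CPapadim's actual template: `filter_for_report` is a hypothetical name. It reads the `metadata.slideshow.slide_type` field that Jupyter's Slideshow toolbar writes, and it assumes that raw cells (such as the YAML header) are always kept and that a markdown cell marked 'Sub-Slide' keeps its text, since that text is its rendered output.

```python
import copy


def slide_type(cell: dict) -> str:
    # Jupyter's Slideshow toolbar writes metadata.slideshow.slide_type;
    # a cell with no setting is treated here like "-".
    return cell.get("metadata", {}).get("slideshow", {}).get("slide_type", "-")


def filter_for_report(notebook: dict) -> dict:
    """Return a filtered copy: 'slide' keeps code and output, 'subslide' keeps
    output only, raw cells are kept, and every other cell is dropped."""
    nb = copy.deepcopy(notebook)  # don't mutate the caller's notebook
    kept = []
    for cell in nb.get("cells", []):
        kind = slide_type(cell)
        if cell.get("cell_type") == "raw" or kind == "slide":
            kept.append(cell)
        elif kind == "subslide":
            if cell.get("cell_type") == "code":
                cell["source"] = []  # hide the code but keep its outputs
            kept.append(cell)
    nb["cells"] = kept
    return nb


nb = {"cells": [
    {"cell_type": "raw", "metadata": {}, "source": ["---\n", "title: Cell Test!\n", "---"]},
    {"cell_type": "code", "metadata": {}, "execution_count": None,
     "outputs": [], "source": ["x = 1"]},
    {"cell_type": "code", "metadata": {"slideshow": {"slide_type": "slide"}},
     "execution_count": None, "outputs": [], "source": ["x"]},
    {"cell_type": "code", "metadata": {"slideshow": {"slide_type": "subslide"}},
     "execution_count": None, "outputs": [], "source": ["x * 2"]},
]}
report = filter_for_report(nb)
print(len(report["cells"]))  # 3
```

Dropping unmarked cells by default means exploratory work must be opted in explicitly, rather than opted out cell by cell as with 'Skip'.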

The idea is that I've been using the Knowledge Repo as a reporting tool for converting notebooks into business reports, and I've found that many cells in Jupyter notebooks are exploratory and shouldn't show up. For the cells we do want in a report, it's easy to include either just the output, or both code and output, by picking from the dropdown mentioned above.

I can contribute this template back if you're interested, but I'm not sure it's consistent with the Knowledge Repo's philosophy, which seems to be targeted more at sharing work with technical users.

I've also written a separate script that publishes a notebook directly from GitHub rather than from the local machine. It also appends a cell to the end of the KR post containing direct links to the Git commit and the notebook on GitHub from which the post was generated, so technical users can always access the exact code behind any post for reproducibility. If you're interested in adding this to the knowledge_repo script's publishing functions, I can share it too, though my proof-of-concept implementation will likely need some work to improve its code quality.
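A minimal sketch of the link-appending step, assuming the notebook is held as a parsed JSON dict. `add_source_link` and the example owner, repository, commit, and path are hypothetical; the URLs follow GitHub's standard `/commit/<sha>` and `/blob/<sha>/<path>` patterns, which pin the link to the exact revision.

```python
import copy


def add_source_link(notebook: dict, owner: str, repo: str, commit: str, path: str) -> dict:
    """Return a copy of the notebook with a final markdown cell linking to the
    commit and notebook file on GitHub that the post was generated from."""
    commit_url = f"https://github.com/{owner}/{repo}/commit/{commit}"
    file_url = f"https://github.com/{owner}/{repo}/blob/{commit}/{path}"
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    nb.setdefault("cells", []).append({
        "cell_type": "markdown",
        "metadata": {},
        "source": [
            f"Generated from [{path}]({file_url}) "
            f"at commit [{commit[:7]}]({commit_url})."
        ],
    })
    return nb


# Hypothetical repository details, for illustration only.
post = add_source_link(
    {"cells": []},
    owner="example-org",
    repo="analyses",
    commit="3f2a9c1d0b8e7f6a5c4d3e2f1a0b9c8d7e6f5a4b",
    path="reports/cell_test.ipynb",
)
print(post["cells"][-1]["source"][0])
```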

Essentially, though, I see KR as a way to generate business reports that are tied directly to code, for reproducibility and shareability among data scientists, and that diverges from what I think KR is intended to be. As long as that remains the case, I'm not sure it makes sense to merge any of the changes related to this issue back here.

Probably the best approach would be to add a flag to the knowledge_repo script that specifies which template to use for post generation. If that idea makes sense to you, I can code it up. Let me know.
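For illustration only, such a flag could be wired up with `argparse` along these lines. The `--template` option, its `default` value, and the argument layout are hypothetical and not taken from the real knowledge_repo script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI shape; not the actual knowledge_repo argument layout.
    parser = argparse.ArgumentParser(prog="knowledge_repo")
    parser.add_argument("notebook", help="path to the .ipynb file to publish")
    parser.add_argument(
        "--template",
        default="default",
        help="name of the template used for post generation",
    )
    return parser


args = build_parser().parse_args(["analysis.ipynb", "--template", "business_report"])
print(args.notebook, args.template)  # analysis.ipynb business_report
```

Keeping the current template as the default would leave behaviour unchanged for anyone who doesn't pass the flag.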