JupyPod / Codepod

RunVas IDE: Scalable Interactive Coding
https://codepod.io
MIT License

Reliable exporting and importing for backup and restore #1

Open forrestbao opened 11 months ago

forrestbao commented 11 months ago

Corner case 1:

The ipynb below was created in VS Code. Once imported into Codepod,

  1. all source code is gone.
  2. HTML is not rendered properly.

image

Background: The notebook counts the task types in the Super-NaturalInstructions dataset, a collection of tasks for testing LLMs.

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load a csv file \n",
    "df = pd.read_csv('code/SuperNaturalInstructions.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Summary</th>\n",
       "      <th>Category</th>\n",
       "      <th>Domain</th>\n",
       "      <th>Input Language</th>\n",
       "      <th>Output Language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>task001_quoref_question_generation</td>\n",
       "      <td>Writing questions that require tracking entity...</td>\n",
       "      <td>Question Generation</td>\n",
       "      <td>Wikipedia</td>\n",
       "      <td>English</td>\n",
       "      <td>English</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>task002_quoref_answer_generation</td>\n",
       "      <td>Answering questions that require tracking enti...</td>\n",
       "      <td>Question Answering</td>\n",
       "      <td>Wikipedia</td>\n",
       "      <td>English</td>\n",
       "      <td>English</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>task003_mctaco_question_generation_event_duration</td>\n",
       "      <td>Writing questions that involve commonsense und...</td>\n",
       "      <td>Question Generation</td>\n",
       "      <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n",
       "      <td>English</td>\n",
       "      <td>English</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>task004_mctaco_answer_generation_event_duration</td>\n",
       "      <td>Answering questions that involve commonsense u...</td>\n",
       "      <td>Question Answering</td>\n",
       "      <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n",
       "      <td>English</td>\n",
       "      <td>English</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>task005_mctaco_wrong_answer_generation_event_d...</td>\n",
       "      <td>Writing an implausible answer to the given \"ev...</td>\n",
       "      <td>Wrong Candidate Generation</td>\n",
       "      <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n",
       "      <td>English</td>\n",
       "      <td>English</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                Name  \\\n",
       "0                 task001_quoref_question_generation   \n",
       "1                   task002_quoref_answer_generation   \n",
       "2  task003_mctaco_question_generation_event_duration   \n",
       "3    task004_mctaco_answer_generation_event_duration   \n",
       "4  task005_mctaco_wrong_answer_generation_event_d...   \n",
       "\n",
       "                                             Summary  \\\n",
       "0  Writing questions that require tracking entity...   \n",
       "1  Answering questions that require tracking enti...   \n",
       "2  Writing questions that involve commonsense und...   \n",
       "3  Answering questions that involve commonsense u...   \n",
       "4  Writing an implausible answer to the given \"ev...   \n",
       "\n",
       "                     Category  \\\n",
       "0         Question Generation   \n",
       "1          Question Answering   \n",
       "2         Question Generation   \n",
       "3          Question Answering   \n",
       "4  Wrong Candidate Generation   \n",
       "\n",
       "                                              Domain Input Language  \\\n",
       "0                                          Wikipedia        English   \n",
       "1                                          Wikipedia        English   \n",
       "2  News, Wikipedia, Law, Justice, History, Histor...        English   \n",
       "3  News, Wikipedia, Law, Justice, History, Histor...        English   \n",
       "4  News, Wikipedia, Law, Justice, History, Histor...        English   \n",
       "\n",
       "  Output Language  \n",
       "0         English  \n",
       "1         English  \n",
       "2         English  \n",
       "3         English  \n",
       "4         English  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Translation                            394\n",
       "Question Answering                     207\n",
       "Program Execution                       90\n",
       "Question Generation                     83\n",
       "Sentiment Analysis                      75\n",
       "Text Categorization                     46\n",
       "Text Matching                           43\n",
       "Toxic Language Detection                40\n",
       "Misc.                                   37\n",
       "Cause Effect Classification             37\n",
       "Information Extraction                  34\n",
       "Textual Entailment                      27\n",
       "Wrong Candidate Generation              27\n",
       "Named Entity Recognition                25\n",
       "Commonsense Classification              24\n",
       "Fill in The Blank                       22\n",
       "Text Completion                         21\n",
       "Sentence Composition                    20\n",
       "Title Generation                        19\n",
       "Summarization                           16\n",
       "Question Understanding                  16\n",
       "Sentence Perturbation                   15\n",
       "Language Identification                 15\n",
       "Coreference Resolution                  14\n",
       "Answerability Classification            14\n",
       "Dialogue Generation                     13\n",
       "Paraphrasing                            12\n",
       "Text Quality Evaluation                 12\n",
       "Text to Code                            12\n",
       "Question Rewriting                      11\n",
       "Pos Tagging                             10\n",
       "Word Semantics                          10\n",
       "Linguistic Probing                       9\n",
       "Story Composition                        9\n",
       "Speaker Identification                   9\n",
       "Data to Text                             9\n",
       "Word Analogy                             8\n",
       "Negotiation Strategy Detection           7\n",
       "Gender Classification                    7\n",
       "Stereotype Detection                     7\n",
       "Answer Verification                      7\n",
       "Dialogue Act Recognition                 7\n",
       "Coherence Classification                 6\n",
       "Explanation                              6\n",
       "Ethics Classification                    6\n",
       "Intent Identification                    5\n",
       "Mathematics                              5\n",
       "Sentence Ordering                        5\n",
       "Keyword Tagging                          5\n",
       "Word Relation Classification             5\n",
       "Text Simplification                      4\n",
       "Code to Text                             4\n",
       "Dialogue State Tracking                  4\n",
       "Stance Detection                         3\n",
       "Section Classification                   3\n",
       "Grammar Error Detection                  3\n",
       "Fact Verification                        3\n",
       "Number Conversion                        2\n",
       "Style Transfer                           2\n",
       "Speaker Relation Classification          2\n",
       "Grammar Error Correction                 2\n",
       "Overlap Extraction                       2\n",
       "Irony Detection                          2\n",
       "Question Decomposition                   2\n",
       "Discourse Relation Classification        1\n",
       "Discourse Connective Identification      1\n",
       "Preposition Prediction                   1\n",
       "Entity Relation Classification           1\n",
       "Entity Generation                        1\n",
       "Sentence Expansion                       1\n",
       "Paper Review                             1\n",
       "Sentence Compression                     1\n",
       "Spam Classification                      1\n",
       "Spelling Error Detection                 1\n",
       "Punctuation Error Detection              1\n",
       "Poem Generation                          1\n",
       "Name: Category, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.set_option(\"display.max_rows\", None)\n",
    "df['Category'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
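
To check an importer against this corner case, the expected per-cell source and the richest available output can be read straight from the JSON. The following is a minimal sketch, assuming only the nbformat 4 layout shown above, where `source` and each MIME entry under `data` may be either a single string or a list of line strings (VS Code writes lists here). The helper names `cell_sources` and `richest_output` are hypothetical, and nothing here describes how Codepod's importer is actually implemented.

```python
import json


def cell_sources(notebook_json: str) -> list[str]:
    """Return the source text of every code cell, in notebook order."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat 4 allows a single string or a list of line strings.
        if isinstance(src, list):
            src = "".join(src)
        sources.append(src)
    return sources


def richest_output(output: dict, preferred=("text/html", "text/plain")):
    """Pick the first preferred MIME type present in an output's data bundle."""
    data = output.get("data", {})
    for mime in preferred:
        if mime in data:
            value = data[mime]
            text = "".join(value) if isinstance(value, list) else value
            return mime, text
    return None, ""


# A trimmed-down stand-in for the notebook above (most outputs omitted).
example = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd"], "outputs": []},
        {
            "cell_type": "code",
            "source": [
                "# load a csv file \n",
                "df = pd.read_csv('code/SuperNaturalInstructions.csv')",
            ],
            "outputs": [],
        },
        {
            "cell_type": "code",
            "source": ["df.head()"],
            "outputs": [{
                "output_type": "execute_result",
                "data": {
                    "text/html": ["<div>\n", "</div>"],
                    "text/plain": ["Name ..."],
                },
            }],
        },
        {"cell_type": "code", "source": [], "outputs": []},
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
})

print(cell_sources(example))
first_output = json.loads(example)["cells"][2]["outputs"][0]
print(richest_output(first_output))
```

If an importer treats `source` as a plain string only, a list-valued `source` like the one above could plausibly end up empty. That is a guess at the cause, not a confirmed diagnosis.
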
forrestbao commented 11 months ago

Corner case 2:

The attached ipynb (file extension changed from .ipynb to .txt) was exported from this Google Colab notebook.

Once imported into Codepod,

  1. The order of cells is sometimes wrong.
  2. Some cells overlap with each other.
  2. Some cells overlap with each other.
  3. Some cells are imported more than once.

Below is just the bottom part of it. image
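
The ordering and duplication symptoms can be detected mechanically by comparing the cell sources before import with those after a re-export. The following is a minimal sketch, assuming both sides are already available as lists of source strings. The function name `compare_cell_lists` and the sample cells are hypothetical. Visual overlap is a layout issue and cannot be checked this way.

```python
from collections import Counter


def compare_cell_lists(original: list[str], imported: list[str]) -> dict:
    """Report duplicated or missing cells and whether relative order survived."""
    orig_counts = Counter(original)
    imp_counts = Counter(imported)

    # Cells that occur more often after import than in the original.
    duplicated = sorted(
        s for s in imp_counts
        if s in orig_counts and imp_counts[s] > orig_counts[s]
    )
    # Cells that were in the original but are absent after import.
    missing = sorted(s for s in orig_counts if s not in imp_counts)

    # Order check: keep the first occurrence of each known cell in the
    # imported list and compare that sequence with the original order.
    seen = set()
    first_seen = []
    for s in imported:
        if s in orig_counts and s not in seen:
            seen.add(s)
            first_seen.append(s)
    expected = [s for s in dict.fromkeys(original) if s in seen]

    return {
        "duplicated": duplicated,
        "missing": missing,
        "order_preserved": first_seen == expected,
    }


# Hypothetical sample: one cell imported twice and two cells swapped.
original_cells = ["a = 1", "b = a + 1", "print(b)"]
imported_cells = ["b = a + 1", "a = 1", "a = 1", "print(b)"]
report = compare_cell_lists(original_cells, imported_cells)
print(report)
```

Comparing by source text is a deliberate simplification: notebooks with several identical cells would need a stable cell identifier instead.
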