forrestbao opened 11 months ago
Corner case 1:
The IPynb below was made in VSCode. But once imported into Codepod,
Background: the IPynb counts the types of tasks in the Super-NaturalInstructions dataset, a collection of tasks for testing LLMs.
```json
{ "cells": [ { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# load a csv file \n", "df = pd.read_csv('code/SuperNaturalInstructions.csv')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Summary</th>\n", " <th>Category</th>\n", " <th>Domain</th>\n", " <th>Input Language</th>\n", " <th>Output Language</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>task001_quoref_question_generation</td>\n", " <td>Writing questions that require tracking entity...</td>\n", " <td>Question Generation</td>\n", " <td>Wikipedia</td>\n", " <td>English</td>\n", " <td>English</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>task002_quoref_answer_generation</td>\n", " <td>Answering questions that require tracking enti...</td>\n", " <td>Question Answering</td>\n", " <td>Wikipedia</td>\n", " <td>English</td>\n", " <td>English</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>task003_mctaco_question_generation_event_duration</td>\n", " <td>Writing questions that involve commonsense und...</td>\n", " <td>Question Generation</td>\n", " <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n", " <td>English</td>\n", " <td>English</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>task004_mctaco_answer_generation_event_duration</td>\n", " <td>Answering questions that involve commonsense u...</td>\n", " <td>Question Answering</td>\n", " <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n", " <td>English</td>\n", " <td>English</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>task005_mctaco_wrong_answer_generation_event_d...</td>\n", " <td>Writing an implausible answer to the given \"ev...</td>\n", " <td>Wrong Candidate Generation</td>\n", " <td>News, Wikipedia, Law, Justice, History, Histor...</td>\n", " <td>English</td>\n", " <td>English</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name \\n", "0 task001_quoref_question_generation \n", "1 task002_quoref_answer_generation \n", "2 task003_mctaco_question_generation_event_duration \n", "3 task004_mctaco_answer_generation_event_duration \n", "4 task005_mctaco_wrong_answer_generation_event_d... \n", "\n", " Summary \\n", "0 Writing questions that require tracking entity... \n", "1 Answering questions that require tracking enti... \n", "2 Writing questions that involve commonsense und... \n", "3 Answering questions that involve commonsense u... \n", "4 Writing an implausible answer to the given \"ev... \n", "\n", " Category \\n", "0 Question Generation \n", "1 Question Answering \n", "2 Question Generation \n", "3 Question Answering \n", "4 Wrong Candidate Generation \n", "\n", " Domain Input Language \\n", "0 Wikipedia English \n", "1 Wikipedia English \n", "2 News, Wikipedia, Law, Justice, History, Histor... English \n", "3 News, Wikipedia, Law, Justice, History, Histor... English \n", "4 News, Wikipedia, Law, Justice, History, Histor... English \n", "\n", " Output Language \n", "0 English \n", "1 English \n", "2 English \n", "3 English \n", "4 English " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Translation 394\n", "Question Answering 207\n", "Program Execution 90\n", "Question Generation 83\n", "Sentiment Analysis 75\n", "Text Categorization 46\n", "Text Matching 43\n", "Toxic Language Detection 40\n", "Misc. 37\n", "Cause Effect Classification 37\n", "Information Extraction 34\n", "Textual Entailment 27\n", "Wrong Candidate Generation 27\n", "Named Entity Recognition 25\n", "Commonsense Classification 24\n", "Fill in The Blank 22\n", "Text Completion 21\n", "Sentence Composition 20\n", "Title Generation 19\n", "Summarization 16\n", "Question Understanding 16\n", "Sentence Perturbation 15\n", "Language Identification 15\n", "Coreference Resolution 14\n", "Answerability Classification 14\n", "Dialogue Generation 13\n", "Paraphrasing 12\n", "Text Quality Evaluation 12\n", "Text to Code 12\n", "Question Rewriting 11\n", "Pos Tagging 10\n", "Word Semantics 10\n", "Linguistic Probing 9\n", "Story Composition 9\n", "Speaker Identification 9\n", "Data to Text 9\n", "Word Analogy 8\n", "Negotiation Strategy Detection 7\n", "Gender Classification 7\n", "Stereotype Detection 7\n", "Answer Verification 7\n", "Dialogue Act Recognition 7\n", "Coherence Classification 6\n", "Explanation 6\n", "Ethics Classification 6\n", "Intent Identification 5\n", "Mathematics 5\n", "Sentence Ordering 5\n", "Keyword Tagging 5\n", "Word Relation Classification 5\n", "Text Simplification 4\n", "Code to Text 4\n", "Dialogue State Tracking 4\n", "Stance Detection 3\n", "Section Classification 3\n", "Grammar Error Detection 3\n", "Fact Verification 3\n", "Number Conversion 2\n", "Style Transfer 2\n", "Speaker Relation Classification 2\n", "Grammar Error Correction 2\n", "Overlap Extraction 2\n", "Irony Detection 2\n", "Question Decomposition 2\n", "Discourse Relation Classification 1\n", "Discourse Connective Identification 1\n", "Preposition Prediction 1\n", "Entity Relation Classification 1\n", "Entity Generation 1\n", "Sentence Expansion 1\n", "Paper Review 1\n", "Sentence Compression 1\n", "Spam Classification 1\n", "Spelling Error Detection 1\n", "Punctuation Error Detection 1\n", "Poem Generation 1\n", "Name: Category, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option(\"display.max_rows\", None)\n", "df['Category'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }
```
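For readability, the code cells buried in the JSON above reduce to loading a CSV and running a pandas `value_counts` over the `Category` column. A minimal sketch of that logic (using a tiny inline sample in place of `code/SuperNaturalInstructions.csv`, which is not attached here):

```python
import pandas as pd

# A few sample rows standing in for code/SuperNaturalInstructions.csv (not attached)
df = pd.DataFrame({
    "Name": [
        "task001_quoref_question_generation",
        "task002_quoref_answer_generation",
        "task003_mctaco_question_generation_event_duration",
    ],
    "Category": [
        "Question Generation",
        "Question Answering",
        "Question Generation",
    ],
})

# Show every row of the resulting Series instead of truncating it,
# as the notebook does before printing the full category breakdown
pd.set_option("display.max_rows", None)

# Count how many tasks fall into each category
counts = df["Category"].value_counts()
print(counts)
```

On the real CSV, this is what produces the 394-row "Translation"-led breakdown shown in the notebook output above.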
Corner case 2:
The attached IPynb (filename changed from ipynb to txt) was exported from this Google Colab notebook.
Once imported into Codepod,
Below is just the bottom part of it.