cssmagic / Learn-AI-Assisted-Python-Programming

📖 《AI 辅助编程》这本书的大本营。
137 stars 13 forks source link

[译] [205] 我们的第一个编程练习 #35

Open cssmagic opened 6 months ago

cssmagic commented 6 months ago

2.5 Our first programming problem

2.5 我们的第一个编程练习

The goal of this next section is twofold: 1) for you to see the workflow of interacting with Copilot and 2) for you to gain an appreciation of how powerful Copilot can be by seeing it solve a fairly complicated task fairly easily.

本节有两个主要目标:(1) 展现与 Copilot 交互的整个工作流程;(2) 通过观察 Copilot 如何轻松解决一个相当复杂的任务,让你感受到 Copilot 的强大能力。

In our next chapter, we’ll talk through the workflow with Copilot in more detail, but generally speaking, you’ll use the following steps when authoring code with Copilot:

在下一章中,我们将更深入地探讨与 Copilot 协作的工作流。但一般来说,当你借助 Copilot 编写代码时,大体遵从以下几步:

  1. Write a prompt to Copilot using comments (#) or docstrings (""").
  2. Let Copilot generate code for you.
  3. Check to see if the code is correct, by reading through it and by testing
    1. If it works, move to Step #1 for the next thing you’d like it to do.
    2. If it doesn’t work, delete the code from Copilot and go back to Step #1 and modify the prompt (and see suggestions in table 2.1).
  1. 通过注释(#)或文档字符串(""")向 Copilot 提供提示词。
  2. 让 Copilot 自动生成代码。
  3. 通过阅读和测试代码来确认其是否正确
    1. 如果代码运行正常,继续执行步骤 #1,进行下一个任务。
    2. 如果代码运行不正常,删除 Copilot 生成的代码,返回步骤 #1 并修改提示词(参见表 2.1 中的建议)。

Because you’ve just started working with Copilot, we’re wary of showing you such a large example, but we feel you’ll really value seeing how powerful Copilot can be now that you have it installed. As such, we want you to follow along as best as you can to get a feel for working with Copilot on your own, but if you get stuck, feel free to just read along and save working along with Copilot in VSCode for the next chapter. Later chapters will explain the process of working with Copilot in more detail.

由于你刚开始尝试 Copilot,我们对展示一个这么大的例子感到谨慎,但我们认为,让你见证 Copilot 的强大能力,特别是在你已经安装了它之后,将极具价值。因此,我们希望你尽可能地跟着操作,以获得使用 Copilot 的实践体验,但如果你遇到难题,也可以选择仅仅阅读,把实践操作留到下一章进行。后续章节将会详细解释如何与 Copilot 合作的过程。

Also, Copilot will generate a lot of code in this section and we don’t expect you to understand the code until much later in the book. We provide the code solely so you can see what Copilot gave us, but do NOT feel as though you need to try to understand the code in this chapter.

此外,本部分 Copilot 将生成许多代码,我们并不期待你在本书后面的内容中立刻理解这些代码。我们提供这些代码,仅仅是为了让你看到 Copilot 生成的内容,但不要认为你需要在本章理解这些代码。

To get started, let’s create a new file. If you aren’t already in VSCode, go ahead and start it. Then create a new Python file and save it as nfl_stats.

开始之前,请创建一个新文件。如果你目前还没有使用 VSCode,请先启动它。然后创建一个新的 Python 文件,并将其命名为 nfl_stats

Showcasing Copilot’s value in a data processing Task

展现 Copilot 在数据处理任务上的能力

We want to start with some basic data processing as this is something that many of you have likely done in your personal or professional lives. To find a dataset, we went to a great website called Kaggle [4] which has tons of datasets freely available for use. Many of them include important data like health statistics for different countries, or information to help track the spread of disease, and so on. We’re not going to use those because we’d like to have something lighter for our first program. Since both of us are American Football fans, we felt we should play with the NFL offensive stats database.

我们决定从一些基础的数据处理任务开始,这可能是许多人在个人或职业生活中已经尝试过的。在寻找合适的数据集时,我们使用了一个名为 Kaggle [4] 的优秀网站,该网站免费提供了丰富的数据集。这些数据集涵盖了众多领域,比如不同国家的健康统计数据或是疾病传播的追踪信息等。但我们选择不使用这类数据,因为我们希望我们的第一个程序项目更加轻松有趣。由于我们都是橄榄球的热忱粉丝,我们决定挑选 NFL 进攻数据统计库来进行我们的数据处理实践。

Let’s get started by downloading the dataset from:

我们先从以下链接下载数据集:

www.kaggle.com/datasets/dtrade84/nfl-offensive-stats-2019-2022

To download the dataset, you will have to sign up for a Kaggle account. If you don’t want to create the account, it’s okay to just read through this section without using VSCode and Copilot to generate the code yourself. Once downloaded, you may need to extract the zip file using the default zip extractor on your computer. Copy the dataset file from that zip file into your current folder in VSCode where you have your code (the folder you have open in Explorer). (If you are on a Mac and the file is saved as a .numbers file, you will need to use “File->Export To” to then save the file as a CSV in your current working directory.) That dataset has NFL information from 2019 to 2022.

为了下载数据集,你将需要注册一个 Kaggle 账户。如果你不希望注册账户,只是阅读这一部分而不亲手使用 VSCode 和 Copilot 生成代码也是可以的。下载完成后,你可能需要使用电脑自带的压缩文件解压缩工具来解压 zip 文件。从该 zip 文件中将数据集文件复制到你的 VSCode 当前文件夹,即你在资源管理器中打开的那个包含你代码的文件夹。(如果你是 Mac 用户,并且文件保存为 .numbers 格式,你将需要使用“文件->导出至”然后将文件作为 CSV 保存在你当前的工作目录。)该数据集含有 2019 到 2022 年的 NFL 信息。

Figure 2.4 The first few columns and rows of the nfl_offensive_stats.csv dataset.

图 2.4 nfl_offensive_stats.csv 数据集的前几行和前几列。

The nfl_offensive_stats.csv file is something known as a comma separated value text file (see figure 2.4 for a portion of the file). This is a pretty standard format for storing data. It has a header row at the top that explains what’s in every column. The way that we (or a computer) know the boundaries between columns is to use the commas between cells. Also notice that each row is placed on its own line.

nfl_offensive_stats.csv 文件是一种被称为逗号分隔值文本文件的东西(参见图 2.4 查看文件的一部分)。这是一种用于存储数据的非常标准的格式。它顶部有一个标题行,解释了每一列中的内容。我们(或计算机)知道列之间边界的方式是通过单元格之间的逗号。同时注意,每一行都放置在自己的行上。

Good news: Python has a bunch of tools for reading in CSV files.

好消息是 Python 有一大堆工具可以读取 CSV 文件。

Step 1: How many passing yards did Aaron Rodgers throw in 2019-2022

第 1 步:2019 至 2022 年,阿隆·罗杰斯共投掷了多少传球码数

Let’s start by exploring what is stored in this file. To preview what is in the file, you can look at the Kaggle webpage for these stats under “Detail”, you can open it in VSCode, or you can just open it up in spreadsheet software like Microsoft Excel. (If you open it with Excel, be sure not to save the file. We need to leave the file in a .csv format.) Whichever way you choose to open it, here’s the start of the header (top) row (also shown in figure 2.4):

我们首先探查这个文件包含了哪些数据。为了预览文件内容,你既可以在 Kaggle 网页上查看这些统计信息的“详细信息”,也可以在 VS Code 中打开它,或者直接在 Microsoft Excel 等电子表格软件中查看。(如果使用 Excel 打开,请务必不要保存文件,因为我们需要维持文件的 .csv 格式。)不论你采用何种方式打开它,这是表头(最顶端一行)的起始部分(也在图 2.4 中展示):

game_id,player_id,position ,player,team,pass_cmp,pass_att,pass_yds,...

There are a bunch more columns after these, but these have all we need to perform our first task. We know now that there’s a column for players and a column for passing yards. Aaron Rodgers is a player that gets passing yards in each game that he plays. But how many passing yards does he have in total, over all of the games that he played? This isn’t so easy to answer by directly looking at the file. So what we want the computer to do is to make this easier for us!

在这些之后,表中还有很多其他列,但这些信息已经足以让我们完成首个任务。我们现在知道,表中有一列是为了记录球员姓名,另一列记录的是传球码数。阿隆·罗杰斯在他参加的每一场比赛中都会有传球码数的增加。但是,在他参与的所有比赛中,他总共获得了多少传球码?直接从文件中找出答案并不简单。因此,我们需要计算机帮助我们简化这个问题的解答过程!

We want it to sum up all the passing yards (pass_yds) for rows (games) where Aaron Rodgers is the player. For now, we’re going to just ask for all the yards in the database even though it covers multiple seasons. We could change this later if we’d like. This problem might be a good problem to give to programmers learning to program in their 4th week of a standard college-level introductory programming course, but we have Copilot! So instead of learning how to write this code from scratch, we’re just going to ask Copilot to generate it for us. To make that happen, we are going to be quite specific in our request to make sure Copilot knows what we are asking for.

我们期望它能统计所有行(即比赛)里,标记为阿隆·罗杰斯这名球员的传球码数(pass_yds)的总和。目前,我们打算查询数据库中记录的所有传球码数,尽管这涉及多个赛季。未来如果我们想要,还可以对此进行调整。对于正处于标准大学水平入门编程课程第四周的编程学习者来说,这可能是一个合适的问题,但我们有 Copilot!因此,我们无需从头开始学习如何编写代码,而是直接让 Copilot 为我们完成代码生成。为了确保 Copilot 理解我们的需求,我们将在提出请求时保持极高的明确性。

Specifically, we’re only going to ask it to perform small amounts of work and then re-prompt it to perform the next step. Later we’ll discuss how to write good prompts, but for now, just go ahead and use what we’ve written below by placing this text at the top of your new file:

具体而言,我们会要求它完成一小部分工作,然后再提示它进行下一步操作。我们稍后会探讨如何撰写有效的提示词,但目前,你只需要将以下文本复制并粘贴到新文件的开头即可:

"""
open the csv file called "nfl_offensive_stats.csv" and read in
the csv data from the file
"""
"""
打开名为 `nfl_offensive_stats.csv` 的 csv 文件,并从文件中读取 csv 数据
"""

The """ at the top and the bottom are surrounding something called a docstring. Docstrings are an alternative way of commenting (similar to text starting with #). They are commonly used for describing functions (see Chapter 3 for details on functions), but we use them in this example to avoid Copilot continually generating comments (see the “Comments only” problem in table 2.1). Given this prompt, Copilot should start generating code. For us, it produced this block of code:

顶部和底部的 """ 符号围绕的是一种称为文档字符串(docstring)的东西。文档字符串是一种注释方式(类似于以 # 开始的文本)。它们常用于描述函数(函数的详情请见第三章),但在本例中,我们使用它们以避免 Copilot 持续生成注释(参见表 2.1 中的“仅注释”问题)。有了这样的提示,Copilot 就应该开始生成代码了。对我们而言,它生成了以下代码块:

import csv
with open('nfl_offensive_stats.csv', 'r') as f:
    reader = csv.reader(f)
    nfl_data = list(reader)

First, for the purpose of reading this book, we want to remind you that the prompt is displayed differently than what Copilot produces. This is intentional so you can tell what we wrote (and you should write) and what Copilot wrote.

首先,为了便于本书的阅读,我们想提醒你,提示词的展示形式与 Copilot 生成的内容有所不同。这是故意为之,以便你能清楚地区分出我们所编写的内容(及你应编写的内容)与 Copilot 生成的内容。

Second, the code produced by Copilot is quite reasonable. We don’t expect you to be able to understand the code at this point in the book, but you can likely see the name of the file we wanted opened and some code about opening the file and reading in the file. Later in the book, we’ll learn how to read through the code. For now, just keep following along.

其次,Copilot 生成的代码相当合理。我们并不期望你在目前阅读本书的阶段就能完全理解代码,但你可能已经看到了我们希望打开的文件名,以及一些有关打开和读取文件的代码。在本书的后续内容中,我们将引导你学习如何阅读这些代码。现阶段,请你继续按照书中的指引前进。

Now that we have the data from the file, we’re going to give it a new prompt to ask it to sum all the passing yards for Aaron Rodgers in this dataset.

现在我们已经从文件里提取出了数据,下一步我们会提出一个新的提示词,目的是让它计算数据集中 Aaron Rodgers 的全部传球码数总和。

Because the computer doesn’t know what football is or specifics like that Aaron Rodgers is a quarterback, our prompt is going to be quite specific. We’ll teach you how to write prompts like this over the course of the book. Here is the new prompt:

鉴于计算机不了解橄榄球或者具体到 Aaron Rodgers 是一名四分卫的信息,我们的提示词需要极为明确。在本书的学习过程中,我们将引导你学会如何撰写这样的提示词。以下是新的提示词:

"""
In the data we just read in, the fourth column is the player
and the 8th column is the passing yards. Get the sum of
yards from column 8 where the 4th column value is
"Aaron Rodgers"
"""
"""
在我们刚刚导入的数据里,第四列标示的是球员名称,而第八列是传球码数。
请求出在第四列值为“Aaron Rodgers”情况下,第八列码数的总和。
"""

Notice how we tell the computer which columns are for players, and which are for passing yards. That’s to tell the computer how to interpret the data. Also, notice how we say specifically that we only want to sum the yards in the case that the player’s name is Aaron Rodgers. Again, we’ll teach you how to write prompts like this as we move forward in the book. Given this prompt, Copilot then produced the code below.

请注意我们如何指示计算机哪些列是球员名称,哪些列是传球码数。这样做是为了让计算机知道如何对数据进行解析。此外,注意到我们特别强调只在球员名为 Aaron Rodgers 的情况下对码数进行求和。我们会在书中继续教你如何撰写这类提示词。根据这个提示词,Copilot 接着生成了下面的代码。

passing_yards = 0
for row in nfl_data:
   if row[3] == 'Aaron Rodgers':
       passing_yards += int(row[7])
print(passing_yards)

Reminder: Copilot is non-deterministic

Remember from Chapter 1 that Copilot is nondeterministic so what Copilot gives you may not match what it gives us. This is going to be a challenge for the rest of the book: what do you do if you get a wrong result when we get a right result? We’re actually fairly confident that Copilot will give you a correct answer here, but if you get a wrong answer from Copilot, go ahead and just read the remainder of this section rather than working along with Copilot in VSCode. We will absolutely give you all the tools you need to fix the code when Copilot gives you a wrong answer, but that skill will be taught over the remainder of the book, so we don’t want you to get stuck on this now. When we run this code (recall how to “Run Code” from figure 2.1), we get this result: 13852

提醒:Copilot 具有非确定性特性

回想一下第一章的内容,Copilot 是具有非确定性的,这意味着 Copilot 给你的输出可能与给我们的输出不一致。这将成为本书剩余部分的挑战之一:当我们获得正确结果而你得到错误结果时,你该怎么办?实际上,我们对 Copilot 在此给出正确答案充满信心,但如果你从 Copilot 那里得到错误答案,请直接阅读本节其余内容,而不是在 VS Code 中继续尝试。我们将确保你拥有修正 Copilot 错误答案所需的所有工具,但这些技能会在书的后续部分介绍,因此我们不希望你现在就遇到障碍。当我们运行这段代码时(回想一下如何从图 2.1 中 “运行代码”),我们得到的结果是:13852。

Which is the correct answer. (We double-checked the answer, but if you are familiar with football, you can likely use estimates to see if the figure seems reasonable. Quarterbacks throw for 3,000-5,500 yards per season, and this is three seasons worth of data, so 13,852 seems like the right ballpark for a high performing quarterback.) What’s particularly interesting is that we planned on giving Copilot a third prompt to ask it to print the result, but Copilot guessed that’s what we’d want to do so it did so on its own.

这便是正确的答案。(我们已对此答案进行了复核,但如果你对橄榄球略有认识,你可能通过估算判断这个数值是否合理。四分卫每赛季的传球码数一般在 3,000 至 5,500 码之间,而这是三个赛季的数据,因此对于一位表现卓越的四分卫而言,13,852 码似乎是在合理范围内。)尤其有趣的是,我们原计划给 Copilot 提出第三个提示词,请求其打印结果,但 Copilot 似乎已经猜到了我们的意图并自行执行了这一步。

What do we want you to take from this example (and the rest of the chapter)?

我们期望你从这个示例(及本章的其余内容)中学到什么?

  1. Copilot is a powerful tool. We didn’t write any code, but we were able to get Copilot to generate the code needed to perform a basic analysis of the data. For readers who have used spreadsheets, you can probably think of a way to do this using spreadsheet applications like Excel, but it likely wouldn’t be as easy as writing code like this. Even if you haven’t used spreadsheets before, you’ve got to admit that it’s amazing that writing some basic, human-readable prompts can produce correct code and output like this.

  2. Copilot 是一个功能强大的工具。 我们没有亲手编写过任何代码,却能够利用 Copilot 生成完成数据基本分析所需的代码。对于那些有使用电子表格经验的读者,你可能会考虑通过 Excel 等电子表格工具来做这件事,但很可能不会像编写这样的代码那样简便。即便你以前没有使用过电子表格,也必须认可,仅通过撰写一些基本、易于人类阅读的提示词,就能够产出正确的代码和输出,这非常令人赞叹。

  3. Breaking problems into small tasks is important. For this example, we tried writing this code with just a single large prompt (not shown) or by breaking it into two smaller tasks. The larger prompt was almost identical text, just as a single prompt. What we found was that Copilot would usually give us the right answer with the larger prompt but would sometimes make mistakes. This was especially true in the next example we’ll show you. However, breaking the problem into smaller tasks significantly increased the likelihood of Copilot generating the right code. We’ll see how to break down larger problems into smaller tasks throughout the remainder of this book as this is one of the most important skills you’ll need. In fact, the next chapter helps you start understanding what are reasonable tasks to give to Copilot.

  4. 将问题分解成小任务是很重要的。 在这个例子中,我们尝试了使用一个较大的单一提示词(未展示)来编写这段代码,或者将其分解成两个较小的任务。较大的提示词几乎与单个提示的文本完全相同。我们发现,使用较大的提示词时,Copilot 通常会给出正确的答案,但有时会犯错。这在我们将要展示的下一个例子中尤其明显。然而,将问题分解成较小的任务显著增加了 Copilot 生成正确代码的可能性。在本书的剩余部分,我们将看到如何将较大的问题分解成小任务,因为这是你将需要的最重要的技能之一。实际上,下一章将帮助你开始理解,哪些任务是适合交给 Copilot 的。

  5. We still need to understand code to some degree. This is true for several reasons. One is that writing good prompts requires you to have a basic understanding of what computers know and what they don’t. We can’t just give a prompt to Copilot that says, “Give me the number of passing yards for Aaron Rodgers.” Copilot likely wouldn’t be able to figure out where the data is stored, the format of the data, which columns correspond to players and passing yards, or that Aaron Rodgers is a player. We had to spell that out to Copilot for it to be successful. Another reason has to do with determining whether code from Copilot is reasonable. When the two of us read the response from Copilot, we know how to read code so we can determine if the code produced by Copilot is reasonable. You’ll need to be able to do this to some degree, which is why Chapters 4 and 5 are dedicated to reading code.

  6. 我们仍需要在一定程度上理解代码。 这背后有几个原因。首先,要写出好的提示词,你需要对计算机知道什么和不知道什么有一个基本的理解。我们不能只给 Copilot 一个提示说,“给我 Aaron Rodgers 的传球码数。” Copilot 可能不会明白数据存储在哪里,数据的格式,哪些列是指球员和传球码数,或者 Aaron Rodgers 是一位球员。我们需要明确指出这些信息,Copilot 才能成功执行。另一个原因是关于判断 Copilot 生成的代码是否合理。当我们阅读 Copilot 的反馈时,我们能够读懂代码,从而判断 Copilot 生成的代码是否合理。你也需要具备这种判断能力,这也是第四章和第五章专门用于学习如何阅读代码的原因。

  7. Testing is important. When programmers talk about testing, they’re referring to the practice of making sure that their code works correctly, even in possibly unexpected circumstances. We didn’t spend much time on this piece other than checking if Copilot’s answer is plausible using estimates on just one data set, but in general we’ll need to spend more time on testing as this is a critical part of the code writing process. It likely goes without saying, but errors in code range from embarrassing (if you tell your hard-core NFL fan friend the wrong number of passing yards for a player) to dangerous (if software in a car behaves incorrectly) to costly (if businesses make decisions on wrong analyses). Concerningly, even after you’ve learned how to read code, we have first-hand experience to tell you that even if the code looks correct, it might not be! To address this, we have to test every piece of code created by Copilot to ensure it does what it should. We’ll learn how to rigorously test Copilot’s code in later chapters.

  8. 测试极为重要。 当程序员提到测试时,他们指的是确保代码即便在可能出现的意外情况下也能正确执行的实践。我们在这一部分并没有花太多时间,仅通过对单一数据集进行估算来判断 Copilot 的答案是否合理,但总体而言,我们需要在测试上投入更多的时间,因为这是编码过程中至关重要的一环。虽然可能不需要特别强调,但代码错误的后果从尴尬(例如,向你那位狂热的 NFL 粉丝朋友报告错误的球员传球码数),到危险(如汽车软件行为不当),再到代价高昂(如企业基于错误的分析做出决策)。值得担忧的是,即使你已经学会了如何阅读代码,我们有直接的经验告诉你,即使代码看似正确,也可能并非如此!因此,我们必须对 Copilot 生成的每个代码段进行测试,以确保其按照预期工作。在后面的章节中,我们将学习如何对 Copilot 生成的代码进行严格的测试。

To showcase the power of Copilot, we’re going to continue this example. Please feel free to follow along writing the prompts and running the code in Copilot or just reading along.

为了充分展现 Copilot 的能力,我们接下来将继续这个示例。你可以选择跟随我们一起编写提示词并在 Copilot 中运行代码,或者只是进行阅读。

Step 2: How well did all the quarterbacks do over that time period?

第 2 步:在这段时间里,所有四分卫的整体表现如何?

Knowing how well Aaron Rodgers did is interesting. But a next and more meaningful step would be to compare his stats to other quarterbacks over that time period. We really only want to compare against other quarterbacks because they are the players whose job is to throw the ball. Sure, a running back might throw the ball once a season, and it’s often fun when they do, but it’s not really their job. To include only the quarterbacks, we need to go back to the data for a moment. The third column is for “position” and “QB” stands for Quarterback. As such, let’s delete all the code Copilot gave us (everything after the end of our initial comments) and start over. We won’t show each step like we did above, instead, here is the result of the entire interaction with Copilot with our prompts highlighted as separate from the code it gave us.

了解阿隆·罗杰斯(Aaron Rodgers)的表现固然有趣,但下一个更有意义的步骤将是将他的数据与同期其他四分卫进行比较。我们真正只想与其他四分卫比较,因为他们是负责传球的球员。当然,一个跑卫可能一个赛季传一次球,而且他们这么做时通常很有趣,但这并不真正是他们的工作。为了仅包括四分卫,我们需要重新查看一下数据。第三列是“位置”,“QB”代表四分卫。因此,让我们删除 Copilot 给我们的所有代码(即我们最初评论结束后的所有内容)并重新开始。我们不会像上面那样展示每个步骤,而是这里展示与 Copilot 互动的整个结果,我们的提示作为与其给出的代码分开的部分高亮显示。

Listing 2.1 Copilot’s code to analyze the top Quarterbacks

列表 2.1 Copilot 分析顶级四分卫的代码

"""
open the csv file called "nfl_offensive_stats.csv" and
read in the csv data from the file
"""
# import the csv module
import csv
# open the csv file
with open('nfl_offensive_stats.csv', 'r') as f:
    # read the csv data
    data = list(csv.reader(f))
"""
the 3rd column in data is player position, the fourth column
is the player, and the 8th column is the passing yards.
For each player whose position in column 3 is "QB",
determine the sum of yards from column 8
"""
# create a dictionary to hold the player name and passing yards
passing_yards = {}
# loop through the data
for row in data:
    # check if the player is a quarterback
    if row[2] == 'QB':
        # check if the player is already in the dictionary
        if row[3] in passing_yards:
            # add the passing yards to the existing value
            passing_yards[row[3]] += int(row[7])
        else:
            # add the player to the dictionary
            passing_yards[row[3]] = int(row[7])
"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
"""
for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    print(player, passing_yards[player])

Notice that we gave Copilot three prompts. The first was to handle the input data, the second was to process the data, and the third was to output the response. This cycle of input data, process data, provide output is exceptionally common in programming tasks.

注意,我们向 Copilot 提出了三段提示词。第一段提示词针对输入数据的处理,第二段涉及数据的加工处理,而第三段则关于输出的响应。这种 “输入数据、加工数据、输出响应” 的流程在编程任务里非常典型。

Looking at the results from Copilot, we have to point out that we’ve taught programming for years and this is pretty impressive. We might ask students to solve something like this on a final exam in our college-level classes and we suspect less than half the class would do it correctly. Without diving into too many details, Copilot chose a good way of storing the data by using a dictionary (not a normal dictionary like an English dictionary, but a way of storing data in Python) which is a good choice here and used a clever way of sorting the data to help in displaying the results.

观察 Copilot 的输出结果,我们必须指出,身为多年编程教育者,这成果确实令人赞叹。在我们大学水平的课程中,我们可能会在期末考试上布置类似的题目,但我们估计不到半数的学生能正确答题。Copilot 选择了使用字典(这里说的字典不是像英语字典那种普通字典,而是一种在 Python 中用于数据存储的方式)来存储数据,这是个明智的选择;,并且它还巧妙地采用了排序方法来辅助结果的展示。

Thinking of the results, here are the first 5 lines from the output if you run the code:

我们来看看结果。如果执行代码,以下是输出的前五行内容:

Patrick Mahomes 16132
Tom Brady 15876
Aaron Rodgers 13852
Josh Allen 13758
Derek Carr 13271

If you follow football, these results should not be a surprise to you. Just to see how well Copilot can adapt to our wishes, let’s try to make a minor change.

如果你是橄榄球迷,这个结果对你而言不应该是什么意外。为了测试 Copilot 对我们需求的适应能力,我们来尝试做一个小修改。

Suppose that because Tom Brady is already recognized as one of the best QBs of all time, you would rather omit him from this comparison.

考虑到汤姆·布雷迪(Tom Brady)已被广泛认为是史上最伟大的四分卫之一,你可能更愿意在这次比较中忽略他。

To make this change, we’re just going to modify the prompt at the bottom. Go to the point in the code where it says:

为实现这个调整,我们将仅仅修改底部的提示。请找到代码中以下位置:

"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
"""
for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    print(player, passing_yards[player])

Delete the code, just leaving the comment, and add another line to the docstring like this:

删除代码,仅留下注释,并在文档字符串中按如下方式添加另一行:

"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
Do not include Tom Brady because he wins too much
"""
"""
打印传球码数的总和,按照传球码数的总和进行降序排序
不包括汤姆·布雷迪(Tom Brady),因为他获胜次数过多
"""

Copilot then suggested to us:

随后,Copilot 给我们的建议是:

for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
    if player != "Tom Brady":
        print(player, passing_yards[player])

That’s exactly what we’d like to see changed in the code. (Thanks, Tom Brady, for being a good sport in this example.) The code excluded all data for Tom Brady at the point of printing the results. When we save the file and run it again, the first 5 lines are now:

这正是我们想在代码中进行的更改。(感谢汤姆·布雷迪(Tom Brady)在这个例子中展现出的良好体育精神。)代码在输出结果的环节中排除了所有关于汤姆·布雷迪的数据。当我们保存并重新运行文件时,现在的前五行如下:

Patrick Mahomes 16132
Aaron Rodgers 13852
Josh Allen 13758
Derek Carr 13271
Matt Ryan 13015

Step 3: Let’s Plot these stats, so we can compare them better

步骤 3:绘制这些数据的统计图,便于我们更好地进行比较

Let’s really drive home our key point that Copilot is a powerful tool by asking it to go even one step farther. The printout of all the quarterback stats is likely a useful analysis for some purposes. But a visual plot might be a better way of presenting this information. Can we ask Copilot to plot it?

我们通过要求 Copilot 进行更进一步的操作,从而实质上突显出我们的核心论点:Copilot 是一款强大的工具。对于某些特定目的而言,打印所有四分卫的统计数据可能是一项有益的分析。但是,采用可视化图表可能是展现这些信息的更好途径。我们可以请求 Copilot 来进行图表绘制吗?

Suppose we just care about the top performing quarterbacks, and so we decide to set an arbitrary limit of 4,000 yards as the minimum number of yards a quarterback needs to throw during this time window to be included in our plot. Granted, asking Copilot to plot the data is likely to introduce some new challenges because it will involve going beyond straight text, but let’s give it a try.

假定我们仅对顶尖表现的四分卫感兴趣,所以我们决定设定一个临界值,即在这个时间框架内四分卫需要投掷的最少码数为4000码,这样才能将他们纳入我们的图表分析。确实,要求 Copilot 进行数据绘图可能会带来一些新挑战,因为这需要进行非纯文本的操作,但我们可以尝试一下。

At the end of the code we just wrote, add the following comment:

在我们刚编写的代码的末尾添加以下注释:

"""
plot the players by their number of passing yards only for
players with more than 4000 passing yards
"""
"""
仅针对传球码数超过4000码的球员,绘制其传球码数图表
"""

When we gave Copilot this prompt, here is the code it gave us:

我们向 Copilot 提出这段提示词后,它给出的代码如下:

Listing 2.2 Code given by Copilot to plot our data about quarterbacks.

列表 2.2 Copilot 提供的绘制四分卫数据的代码。

import matplotlib.pyplot as plt
import numpy as np
# create a list of players
players = []
# create a list of passing yards
yards = []
# for each player in the dictionary
for player in passing_yards:
   # if the player has more than 4000 passing yards
   if passing_yards[player] > 4000:
       # add the player to the list of players
       players.append(player)
       # add the passing yards to the list of passing yards
       yards.append(passing_yards[player])
# create a numpy array of the passing yards
yards = np.array(yards)
# create a numpy array of the players
players = np.array(players)
# sort the players by the passing yards
print(np.argsort(yards))
players = players[np.argsort(yards)]
# sort the passing yards
yards = np.sort(yards)
# create a figure
plt.figure()
# create a bar chart of the players and their passing yards
plt.bar(players, yards)
# rotate the x axis labels
plt.xticks(rotation=90)
# show the plot
plt.show()

We gave Copilot this prompt a number of times and it would consistently produce a reasonable graph, but the code and graph varied each time. This is just one representative result, and in this result, the code is notably more advanced than the code you saw earlier this chapter. In fact, we need a more in depth understanding of the code to be able to walk through it in any real detail so we’re just going to skip any attempt to read through or interpret the code at this point in the book. We can tell you that, at a high-level, it properly imported a Python module designed to make plots (called matplotlib), did some fairly clever data manipulation in the middle using a Python module called numpy, and even had the sense to rotate player names so that they could print well as an x-axis label.

我们多次向 Copilot 提供这段提示词,它每次都能产生一个合理的图形,但是代码和图形每次都有所变化。这只是其中一个具有代表性的结果,在这个结果中,代码比本章早些时候展示的代码要复杂得多。事实上,我们需要更深层次地理解代码才能详细地分析它,因此我们在本书的这个阶段将跳过任何尝试去阅读或解释代码。可以高层次地告诉你,它正确地导入了一个用于绘图的 Python 模块(名为 matplotlib),在中部利用名为 numpy` 的 Python 模块进行了一些相当巧妙的数据操作,并且甚至还考虑到了将球员名字旋转,使其能够作为 x 轴标签清晰地打印。

If you go to run this code, you might hit a snag, however. Because Copilot learned from code in GitHub, it doesn’t know what Python modules are installed on your personal machine. The programmers who wrote the code that Copilot learned from likely had matplotlib installed, and matplotlib is the right module to use here, but matplotlib is not a module installed by default in Python. If you don’t have it installed, you’ll get an error when you try to run the code about not finding the “matplotlib” module.

但是,当你尝试执行这段代码时,可能会遇到一个问题。因为 Copilot 根据 GitHub 上的代码进行学习,它无法知道你的个人设备上安装了哪些 Python 模块。Copilot 学习的代码的编写者很可能安装了 matplotlib,而且在这种情况下使用 matplotlib 是正确的选择,但是 matplotlib 并不是 Python 默认安装的模块。如果你尚未安装它,在尝试运行代码时,你将遇到一个错误,提示无法找到 matplotlib 模块。

Python modules

Python modules expand the capability of the programming language. There are many modules in Python and they can help you do anything from data analysis to creating websites to writing video games. You can recognize when code wants to use a Python module by the import statement in the code. Python doesn’t automatically install all the modules for you because you likely won’t use most of them. When you want to use a module then, you’ll need to install the package containing the module yourself.

Python 模块

Python 模块扩充了这门编程语言的功能范畴。Python 中存在大量模块,它们能帮助你完成各式各样的任务,包括数据分析、网站创建和视频游戏开发等。通过代码中的 import 语句,你可以知道代码希望使用 Python 模块。Python 并不会自动安装所有模块,因为很可能你不会用到它们的大部分。因此,当你希望使用某个模块时,你需要自己安装含有该模块的包。

To fix this error, you’ll need to install matplotlib. The good news is that Python has made it really easy to install new packages. Go to the “Terminal” at the bottom right of VSCode and type:

要修复这个错误,你需要安装 matplotlib。好消息是在 Python 安装新的包已经非常简单了。在 VS Code 右下角的 “终端” 里输入:

pip install matplotlib

Note

For some operating systems you may need to use pip3 rather than pip. On Windows machines, we recommend using pip if you followed our installation instructions. On Mac or Linux machines, we recommend using pip3.

注意

对于某些操作系统,你可能需要使用 pip3 而不是 pip。在 Windows 设备上,如果你是按照我们的安装说明进行操作的,那建议你使用 pip。而在 Mac 或 Linux 设备上,我们建议使用 pip3

When you run this command, you’ll see that a bunch of modules are installed, including numpy (the next module this code wants to use). (matplotlib requires Python modules of its own, so it installs all the modules you need to use matplotlib in addition to matplotlib itself.) When you try to run the code again, you’ll get a plot like this.

运行这个命令后,你会看到一大堆模块被安装了,其中包括 numpy(这段代码接下来也会用到它)。(matplotlib 自己也依赖别的 Python 模块,因此除了 matplotlib 本身,它还会安装使用 matplotlib 所需的所有模块。)当你尝试再次运行代码时,将得到如下图表。

Figure 2.5 The plot produced by the code in listing 2.2.

图 2.5 代码清单 2.2 所生成的图表。

In this bar graph, we see the y-axis is the number of passing yards and the x-axis is the player’s name. The players are sorted from the fewest yards (with a minimum of 4000) to most yards. Admittedly, it’s not perfect as it is missing a y-axis label and the names on the x-axis are cut off at the bottom, but this is pretty impressive given all we gave Copilot was a pretty short prompt. We could keep adding prompts to see if we can format the graph better, but we’ve already achieved the primary goals for this section which was to show you how powerful Copilot is at helping us write code and to get a feel for how to interact with Copilot.

在这个柱状图中,我们看到 y 轴是传球码数,x 轴是球员的名字。球员按照传球码数从最少(最少为 4000 码)到最多进行排序。诚然,这个图表并不完美,因为它缺少 y 轴标签,而 x 轴上的名字在底部被截断了,但考虑到我们给 Copilot 的提示词非常简短,这已经相当令人赞叹了。我们可以继续添加提示词,看看是否能更好地格式化图表,但我们已经完成了本节的主要目标,即向你展示 Copilot 在帮助我们编写代码方面的强大能力,并让你找到与 Copilot 互动的感觉。

Indeed, in this chapter, we’ve accomplished a great deal! If you’ve finished setting up your programming environment and followed along the example with us, you should be proud. You’ve taken a huge step toward writing software! Beyond the details of setting up your environment, we’ve written software to solve our first problem. Moreover, you’ve observed the process of writing software with Copilot that starts with writing good prompts to help Copilot give us the code we want. In the examples in this chapter, Copilot gave us the code we wanted without us needing to change the prompt or debug the code to figure out why it's not working properly. That was a nice way to showcase the power of using an AI Assistant to program, but you will often find yourself having to test the code, change the prompts, and sometimes try to understand why the code is wrong. This is the AI Assistant programming process that we’ll learn more about in upcoming chapters.

不得不说,我们在这一章取得了很大进展!如果你已经完成了编程环境的设置,并跟着我们的示例进行了操作,你应该感到自豪。你已经在编写软件的道路上迈进了重要的一步!不仅仅是完成了细碎的环境配置,我们还写出程序并完成了我们的第一个练习。此外,你还观察到了使用 Copilot 编写软件的过程,这个过程始于撰写良好的提示词,让 Copilot 生成我们想要的代码。在本章的示例中,Copilot 给出了我们想要的代码,而我们无需更改提示词,也无需调试代码来排查代码出错的原因。这个例子很好地展示了 AI 编程助手的强大能力,但你往往还是会发现自己需要测试代码、更改提示词,有时还需要尝试理解代码为何出错。在接下来的章节中,我们会更多地了解的与 AI 编程助手的协作过程。