chapter one / not understanding pd.read parameters

KhidirA commented 3 years ago

Hey all, I'm trying to run the code example from chapter one in the book. I know it said I should have some idea about the libraries (which I kind of do, from a Coursera machine learning course), but I failed to understand line 8. I know the first parameter is the location of the file, but what does the second one do? Also, can anyone explain the next line too? What do the parameters mean?

Praful932 commented 3 years ago

Hi @KhidirA, could you specify which notebook, or better, paste the code in a code block here?

pdx97 commented 3 years ago

@KhidirA, exactly which parameter are you not able to understand? Can you show the code here and the exact line number?

ageron commented 3 years ago

Hi @KhidirA ,

If I understand correctly, you were confused about the arguments to the pd.read_csv() function in chapter 1:

oecd_bli = pd.read_csv(datapath + "oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv(datapath + "gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

This function loads a CSV file. Here's what the arguments mean:

The first argument is the path to the file we want to load.
The thousands=',' argument specifies that "1,000,000" should be interpreted as "1000000" ( = one million).
The delimiter='\t' argument specifies that the fields in the file are separated by tabs (\t), not commas. So gdp_per_capita.csv is actually a TSV (tab-separated values) file rather than a CSV (comma-separated values) file, despite its extension.
The encoding='latin1' argument means that the files are encoded using the Latin-1 encoding. If you don't know what text encoding is, please check out this introduction.
Lastly, the na_values="n/a" argument says that any field equal to "n/a" should be treated as a missing value (pandas stores it as NaN).
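
To see these arguments in action without the book's data files, here is a minimal sketch. The two-row table and its column names are made up for illustration (they are not the real contents of gdp_per_capita.csv), and the data is read from an in-memory buffer instead of a path on disk:

```python
import io

import pandas as pd

# Hypothetical stand-in for a tab-separated, Latin-1 encoded file.
# The column names and values are invented for this example.
raw_bytes = (
    "Country\tGDP\n"
    "France\t1,000,000\n"
    "Côte d'Ivoire\tn/a\n"
).encode("latin1")

df = pd.read_csv(
    io.BytesIO(raw_bytes),  # first argument: a path or a file-like object
    thousands=",",          # "1,000,000" is parsed as the number 1000000
    delimiter="\t",         # fields are separated by tabs, not commas
    encoding="latin1",      # decode the bytes as Latin-1 (keeps the "ô" intact)
    na_values="n/a",        # "n/a" becomes a missing value (NaN)
)

print(df)
print(df.dtypes)
```

Printing df.dtypes should show that the GDP column was parsed as numbers rather than text, with the "n/a" entry turned into NaN.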

If you search "pandas read_csv" on Google, you'll find this documentation page which explains these arguments as well as many others.

Hope this helps.

ageron / handson-ml

chapter one / not understanding pd.read parameters #588