anthonydb / practical-sql

Code and Data for the First Edition of "Practical SQL" by Anthony DeBarros, published by No Starch Press (2018).
Other
660 stars 406 forks source link

Chapter 4: COPY us_counties_2010... #5

Closed CactusKurt closed 5 years ago

CactusKurt commented 6 years ago

COPY us_counties_2010 FROM 'C:\YourDirectory\us_counties_2010.csv' WITH (FORMAT CSV, HEADER);

will give the error ERROR: invalid byte sequence for encoding "UTF8": 0xf1 0x61 0x20 0x41 CONTEXT: COPY us_counties_2010, line 1804 SQL state: 22021

This is because line 1804 contains a tilde in the NAME column "Dona Ana County"

anthonydb commented 6 years ago

Hi, @CactusKurt --

Would you let me know:

Thanks!

CactusKurt commented 6 years ago

Running Windows 10 w/ latest updates Installed PostgreSQL (64 bit) from the EnterpriseDB (EDB) site (Version 3.0) download I ran the query in pgAdmin

anthonydb commented 6 years ago

Using a fresh download of PostgreSQL 10.4 from EnterpriseDB on Windows 10, I'm not able to replicate this. I'm glad to investigate further, though. If you would please:

Thanks.

CactusKurt commented 6 years ago

Deleting the tilde over the "n" character in "Doña Ana County" solves the problem. Could the problem arise due to the encoding? Here is the screen grab of the "Definition" tab...

pgadmin_analysis

anthonydb commented 6 years ago

Oh, it's definitely an encoding issue. The only (frustrating) question is why it's working fine over here but causing trouble on your system. Let me investigate further, and thanks for the report.

Your settings in that screengrab match mine, btw.

anthonydb commented 6 years ago

@CactusKurt Just to rule things out:

Thanks ... I'm interested in getting to the bottom of this one :-)

anthonydb commented 6 years ago

@CactusKurt If you'd email me at practicalsqlbook@gmail.com, we can continue diagnosing as I want to ask some questions that are more appropriate in email.

CactusKurt commented 6 years ago

SHOW client_encoding; returned UNICODE

SHOW server_encoding: returned UTF8

When I used a fresh copy of us_counties_2010, I copy/pasted/saved it in a text editor (Notepad) before using COPY. I'm guessing that is causing the problem.

anthonydb commented 6 years ago

Hi,

The best way to work with the data files for the book is to either git clone this repository to your local machine, or download the contents of the repo using the download option at the repo main screen.

All the files in this repo are encoded UTF-8. (Confirmed that yesterday).) Opening the .sql files in a text editor to copy/paste the queries into pgAdmin is fine. However, if you open a CSV file in a text editor and re-save it, it's possible you will change the encoding and encounter import issues as the database (using the install in the book) is expecting UTF-8.

Encodings are a tricky business and take some time to wrap your head around. I still have much to learn, myself. My sense is that you've learned some good lessons here about the behavior of Excel and text files that will serve you well going forward.

Thanks for reading the book and please open more issues if you encounter anything unusual. Feel free to email me at practicalsqlbook@gmail.com as well.

tfaieta commented 5 years ago

Using PlusTable as my GUI I am receiving the same error.

Error: ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..." Other Info:

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

anthonydb commented 5 years ago

Hi, @tfaieta --

The error you posted is different than the original one in this issue, which turned out to be a character encoding problem. Usually, the error message you posted occurs when the CSV file has more columns than the table you're importing into.

Can you post your entire COPY statement here?

anthonydb commented 5 years ago

@tfaieta Hi, were you able to resolve this? Please let me know. Thanks.

tfaieta commented 5 years ago

Yes I did! Sorry about not letting you know, it was a complete mistake on my part.

On Wed, Dec 12, 2018 at 5:03 PM Anthony DeBarros notifications@github.com wrote:

@tfaieta https://github.com/tfaieta Hi, were you able to resolve this? Please let me know. Thanks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/anthonydb/practical-sql/issues/5#issuecomment-446761489, or mute the thread https://github.com/notifications/unsubscribe-auth/AOPcW_hWfHwwgzcIHlKFOMXxmK1FfUfJks5u4X1NgaJpZM4U04jw .

anthonydb commented 5 years ago

Great, glad you got that sorted. Going to close this.

Lunasanz commented 4 years ago

Using PlusTable as my GUI I am receiving the same error.

Error: ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..." Other Info:

  • MacOS Mojave 10.14.1
  • Installed Postgres via the instructions in the book, specifically the Postgres app

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

Using PlusTable as my GUI I am receiving the same error.

Error: ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,050,3,6,01,001,1539582278,25775735,54571,22135,+32.5363818,-086.6444901,54571,5370..." Other Info:

  • MacOS Mojave 10.14.1
  • Installed Postgres via the instructions in the book, specifically the Postgres app

Also, I cloned the repo locally so it shouldn't be an issue with the file. Any thoughts or recommendations? Thanks in advance.

Lunasanz commented 4 years ago

Hello! I am this issue as well. Any help is greatly appreciated. Thanks ERROR: extra data after last expected column CONTEXT: COPY us_counties_2010, line 2: "Autauga County,AL,50,3,6,1,1,1539582278,25775735,54571,22135,32.5363818,-86.6444901,54571,53702,4285..."

anthonydb commented 4 years ago

@Lunasanz By any chance, did you open the CSV file with another program, such as Excel?

brahim024 commented 3 years ago

yeah I have same problem but this my error msg ERROR: ERREUR: la relation « us_counties_2010 » n'existe pas LINE 1: SELECT * FROM us_counties_2010;

anthonydb commented 3 years ago

@brahim024 That message indicates you did not create the table using the code in Listing 4-2. See:

https://github.com/anthonydb/practical-sql/blob/master/Chapter_04/Chapter_04.sql#L20

brahim024 commented 3 years ago

Thank you @anthonydb I was lost

naboatng commented 1 year ago

Good morning . Please I just got this ERROR: invalid input syntax for type bigint: "1.05076E+11" CONTEXT: COPY us_counties_2010, line 72, column area_land: "1.05076E+11".

anthonydb commented 1 year ago

@naboatng Most likely, you opened the CSV file with Excel or another spreadsheet application, which reformatted the numbers to scientific notation. Download a fresh copy of the CSV file, do not open it with a spreadsheet, and try the import again.