dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.43k stars 545 forks source link

Unable to understand syntax error #79

Closed Themanwithoutaplan closed 10 years ago

Themanwithoutaplan commented 10 years ago

I know there is an issue about errors related to subclauses but I'm not even getting that far. :-( I assume it's a tiny thing but I can't work it out. I've double-checked the columns in the clause using a simple select in Postgres. Are there any special syntax rules for the clauses that we need to be aware of?

This is my script

LOAD CSV
    FROM 'httparchive_Jun_1_2014_pages.csv' (pageid, createDate, archive, label, crawlid,
wptid, wptrun, url, urlShort, urlhash, cdn, startedDateTime, TTFB,
renderStart, onContentLoaded, onLoad, fullyLoaded, visualComplete, PageSpeed,
SpeedIndex, rank, reqTotal, reqHtml, reqJS, reqCSS, reqImg, reqGif, reqJpg,
reqPng, reqFont, reqFlash, reqJson, reqOther, bytesTotal, bytesHtml, bytesJS,
bytesCSS, bytesImg, bytesGif, bytesJpg, bytesPng, bytesFont, bytesFlash,
bytesJson, bytesOther, bytesHtmlDoc, numDomains, maxDomainReqs, numRedirects,
numHttps, numGlibs, numErrors, numCompressed, numDomElements, maxageNull,
maxage0, maxage1, maxage30, maxage365, maxageMore, gzipTotal, gzipSavings,
_connections, _adult_site, avg_dom_depth, document_height, document_width,
localstorage_size, sessionstorage_size, num_iframes, num_scripts, doctype,
meta_viewport)
    INTO postgresql://postgres@localhost/http ("createDate", archive, label,
wptid, wptrun, url, cdn, "startedDateTime", "TTFB",
"renderStart", "onContentLoaded", "onLoad", "fullyLoaded", "visualComplete", "PageSpeed",
"speedIndex", rank, "reqTotal", "reqHTML", "reqJS", "reqCSS", "reqImg", "reqGIF", "reqJPG",
"reqPNG", "reqFont", "reqFlash", "reqJSON", "reqOther", "bytesTotal", "bytesHTML", "bytesJS",
"bytesCSS", "bytesImg", "bytesGIF", "bytesJPG", "bytesPNG", "bytesFont", "bytesFlash",
"bytesJSON", "bytesOther", "numDomains", "maxDomainReqs", "numRedirects",
"numHTTPS", "numGlibs", "numErrors", "numCompressed", "numDomElements", "maxageNull",
maxage0, maxage1, maxage30, maxage365, "maxageMore", "gzipTotal", "gzipSavings")

    WITH
        fields optionally enclosed by '"',
        fields escaped by double-quote,
        fields terminated by ','

    SET
        work_mem to '128MB',
        standard_conforming_strings to 'on'

And this the error:

An unhandled error condition has been signalled:
   Could not parse subexpression "into" when parsing

 Expression PGLOADER.PARSER::COMMANDS
    Subexpression (+ PGLOADER.PARSER::COMMAND)
    Subexpression PGLOADER.PARSER::COMMAND
    Subexpression (AND
                   (OR PGLOADER.PARSER::LOAD-ARCHIVE
                       PGLOADER.PARSER::LOAD-CSV-FILE
                       PGLOADER.PARSER::LOAD-FIXED-COLS-FILE
                       PGLOADER.PARSER::LOAD-DBF-FILE
                       PGLOADER.PARSER::LOAD-MYSQL-DATABASE
                       PGLOADER.PARSER::LOAD-SQLITE-DATABASE
                       PGLOADER.PARSER::LOAD-SYSLOG-MESSAGES)
                   PGLOADER.PARSER::END-OF-COMMAND)
    Subexpression (OR PGLOADER.PARSER::LOAD-ARCHIVE
                      PGLOADER.PARSER::LOAD-CSV-FILE
                      PGLOADER.PARSER::LOAD-FIXED-COLS-FILE
                      PGLOADER.PARSER::LOAD-DBF-FILE
                      PGLOADER.PARSER::LOAD-MYSQL-DATABASE
                      PGLOADER.PARSER::LOAD-SQLITE-DATABASE
                      PGLOADER.PARSER::LOAD-SYSLOG-MESSAGES)
    Subexpression PGLOADER.PARSER::LOAD-CSV-FILE
    Subexpression (AND PGLOADER.PARSER::CSV-SOURCE
                       (ESRAP:? PGLOADER.PARSER::FILE-ENCODING)
                   (ESRAP:? PGLOADER.PARSER::CSV-SOURCE-FIELD-LIST)
                   PGLOADER.PARSER::TARGET
                   (ESRAP:? PGLOADER.PARSER::CSV-TARGET-COLUMN-LIST)
                   PGLOADER.PARSER::CSV-OPTIONS
                   (ESRAP:? PGLOADER.PARSER::GUCS)
                   (ESRAP:? PGLOADER.PARSER::BEFORE-LOAD-DO)
                   (ESRAP:? PGLOADER.PARSER::AFTER-LOAD-DO))
Subexpression PGLOADER.PARSER::TARGET
Subexpression (AND PGLOADER.PARSER::KW-INTO
                   PGLOADER.PARSER::DB-CONNECTION-URI)
Subexpression PGLOADER.PARSER::KW-INTO
Subexpression (AND PGLOADER.PARSER::IGNORE-WHITESPACE (ESRAP:~ "into")
                   PGLOADER.PARSER::IGNORE-WHITESPACE)
Subexpression "into"

Encountered at: LOAD CSV FROM 'httpa ^ (Line 1, Column 0, Position 0)

dimitri commented 10 years ago

You're using two cases of column names that the parser didn't know how to deal with: starting with the _ character, double-quoted names. That's now fixed.

Themanwithoutaplan commented 10 years ago

Thanks, I was getting there. There's also the simple error of a missing semi-colon at the end of the SET block. This seems to be necessary.

dimitri commented 10 years ago

Oh, the whole command itself must terminate with a semicolon, right.

Themanwithoutaplan commented 10 years ago

Yeah. That probably needs to go in the docs, right? And it seems to have to be within a block.

Sorry about the column names. Not really a lot I can do about them :-/. Just building the latest version to play with.

Just to be clear: what formatting can/should be used in the INTO block? Double-quotes where Postgres uses them? _ doesn't seem to be special.

dimitri commented 10 years ago

Well you can name the CSV fields as you like, the system will not check about anything for the naming, it's up to you. About the case in PostgreSQL... actually I think my patch is shy of a brick load, I don't think the quoting will propagate in the COPY command.

Themanwithoutaplan commented 10 years ago

I understand the CSV thing - the fields are just positional placeholders to help the Postgres mapping. Does that mean the quoting has to be mapped?

dimitri commented 10 years ago

The mapping doesn't have to be mapped, because the parser generates lisp code that will then UPCASE all symbols internally. The problem you're going to have, I think, is matching the PostgreSQL column names, because I don't recall the CSV code paying attention to needing to quote them.

dimitri commented 10 years ago

There, now it should work for you.

dpeharda commented 8 years ago

My problem was that the file was in UTF-8 with BOM. I changed it to ascii and it was solved