Closed: malcook closed this issue 7 years ago
You're right that the CAST operator isn't available for CSV file types. That's because CSV doesn't carry any type information for its content, so there is no source type to cast from. Please have a look at the following example, which shows how to write transformation function calls in the CSV syntax instead:
https://github.com/dimitri/pgloader/blob/master/test/csv-before-after.load
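In a nutshell the idea looks like this — note this is only a sketch, not the linked file verbatim, and the file path, connection string, and field names below are placeholders:

LOAD CSV
     FROM 'files.csv'
          HAVING FIELDS (fname, atime)
     INTO pgsql:///mydb
          TARGET TABLE files
          TARGET COLUMNS
          (
             fname,
             -- apply the transformation function to the raw text field
             atime timestamptz using (unix-timestamp-to-timestamptz atime)
          )
     WITH fields terminated by ',';

You declare the raw fields as read from the file, then compute each target column from them — the using clause is where the transformation function is applied.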
Ah, I see. Inexpressible from the command line, but possible with a command file. Thanks for the pointer,
And for pgloader,
~Malcolm
@malcook could you post the command/load file you used, including the transformation? I'm still struggling with the syntax :/
I wound up stream-editing the output of pwalk with a little Perl; see https://github.com/fizwit/filesystem-reporting-tools/wiki/Postgresql-loading-practices
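The wiki has the actual script; purely to illustrate the shape of the approach (the column position and file names here are invented), converting an epoch-seconds column to a timestamp before loading can be as small as:

# rewrite epoch seconds in the 2nd comma-separated column as UTC timestamps
# (assumes no quoted commas inside fields)
perl -MPOSIX -F, -lane \
  '$F[1] = strftime("%Y-%m-%d %H:%M:%SZ", gmtime($F[1])); print join(",", @F)' \
  pwalk-out.csv > pwalk-fixed.csv

The result then loads with a plain COPY or pgloader run, no casts needed.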
@malcook thank you! Did this give you better import performance?
It actually worked out for me to use the built-in pgloader transformation, but the speed is around 1,000 rows/second, which is far too slow for me, and I have no clue why, even when I tune the concurrency/batch options.
FTR: My working loadfile looks like this:
LOAD CSV
     FROM '/var/lib/spielwiese/dump.csv'
     WITH ENCODING UTF8
     (
        field2 [null if blanks],
        unixdate [null if blanks],
        field3 [null if blanks]
     )
     INTO pgsql://gpadmin@localhost:5432/demo
     TARGET TABLE flightpaths.points
     (
        date timestamp using (unix-timestamp-to-timestamptz unixdate),
        field2,
        field3
     )
     WITH truncate,
          skip header = 1,
          fields optionally enclosed by '"',
          fields terminated by ',',
          disable triggers,
          on error stop,
          drop indexes;
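For reference, I run the file with pgloader dump.load. The batch/concurrency options I mentioned go into the same WITH clause; per the pgloader documentation the relevant settings are workers, concurrency, and batch rows, along the lines of (values are examples only):

WITH workers = 4,
     concurrency = 1,
     batch rows = 25000

In my case none of these moved the numbers much.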
I am not currently using pwalk and have no performance stats to share. Good luck....
malcolm.cook@stowers.org
Can I import a column of unix file timestamps, encoded as seconds since the epoch (such as produced by
stat --format "%Y" *.txt
), into a PostgreSQL timestamptz column using --cast? I've tried
--cast column atime to timestamp using unix-timestamp-to-timestamptz
but I get errors. Perhaps I'm mistaken in expecting --cast to work with CSV at all?
I look forward to the possibility of using pgloader toward this end, with your advice. Thanks