astro-datalab / fits2db

2 stars 3 forks source link

Incorrect delimiter when using fits2db in text mode #5

Closed weaverba137 closed 4 years ago

weaverba137 commented 4 years ago

This example is from today on gp11:

weaver@gp11[30]: /usr/local/bin/fits2db --sql=p --create --drop -t ls_dr8.photo_z /net/dl2/data/ls_dr8_photo-z/processed-north/processed-sweep-100p035-110p040-pz.fits | head -50
DROP TABLE IF EXISTS ls_dr8.photo_z CASCADE;
CREATE TABLE IF NOT EXISTS ls_dr8.photo_z (
    ls_id   bigint,
    z_phot_mean real,
    z_phot_median   real,
    z_phot_std  real,
    z_phot_l68  real,
    z_phot_u68  real,
    z_phot_l95  real,
    z_phot_u95  real,
    z_spec  real,
    survey  text,
    training    smallint,
    RELEASE smallint,
    BRICKID integer,
    OBJID   integer
);

COPY ls_dr8.photo_z (ls_id,z_phot_mean,z_phot_median,z_phot_std,z_phot_l68,z_phot_u68,z_phot_l95,z_phot_u95,z_spec,survey,training,RELEASE,BRICKID,OBJID) FROM stdin DELIMITER '    ';
8797227849023489,0.127537,0.126812,0.031453,0.112025,0.141916,0.097242,0.150557,-99.000000,,0,8001,538867,1
8797227849023491,0.180320,0.148994,0.126425,0.066052,0.306153,0.006592,0.494875,-99.000000,,0,8001,538867,3

Note that the delimiter in the COPY command is '\t' (a tab character, not space characters), but the data are comma-delimited. This workaround allows the data to load:

/usr/local/bin/fits2db --sql=p --create --drop -t ls_dr8.photo_z /net/dl2/data/ls_dr8_photo-z/processed-north/processed-sweep-100p035-110p040-pz.fits | sed -r "s/DELIMITER '.+';/DELIMITER ',' CSV;/" |psql tapdb datalab

But it would be better if the delimiter declaration matched the output.

mjfitzpatrick commented 4 years ago

The bug is only triggered when you have an "--sql=p" flag set which was setting the incorrect delimiter value. The postgres flavor of SQL is the default anyway and you'd get the comma delimiter without the flag, the only difference is that text values won't be quoted. Fixed the code and deployed to gp11 and other machines.