**Open** · rcarmo opened this issue 10 years ago
@rcarmo It seems like disco is not detecting the newlines, so it is treating everything as a single record and failing because of its size. Which command did you use to push the data into ddfs?

Also, just to make sure the lines are not too big, would you please run the following commands?

```bash
$ zcat /srv/jobs/dataset.gz | while IFS= read -r i; do echo ${#i} >> /tmp/disco_tmp_sizes; done
$ sort -n /tmp/disco_tmp_sizes | tail
```

Assuming you have enough space to store the sizes.
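As a lighter-weight alternative (a sketch, not part of the original suggestion), awk can track the maximum line length in a single pass, so no intermediate size file is needed:

```bash
# Single pass over the decompressed stream; keeps only the running maximum.
# The path is taken from the command above.
zcat /srv/jobs/dataset.gz | awk '
  { if (length($0) > max) max = length($0) }
  END { printf "longest line: %d bytes\n", max }'
```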
I did a straight `zcat dataset.gz | ddfs chunk data:dataset -`. I'm now waiting for a `wc -L` to finish, which should take a while. But an earlier test import of `zcat .. | head -n 100000 | ddfs chunk ...` worked, and that subset should have enough "atypical" samples.

Will report back.
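For reference, the `wc -L` check being waited on would look something like this (a sketch; the exact filename is an assumption):

```bash
# wc -L prints the length of the longest input line. If it exceeds
# 1048576 (1 MB), a single record would hit the size limit
# discussed elsewhere in this thread.
zcat dataset.gz | wc -L
```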
I tried to import a variable-width data file via stdin to `ddfs chunk`, which failed with the following message:

The file format is essentially a set of UUIDs separated by commas, with a variable number of columns per record. It's fairly large, so it's hard to pinpoint exactly why this is failing, but I don't think it's due to a line exceeding one MB.

Any ideas?
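To rule the one-MB hypothesis in or out concretely, something like the following would list any offending lines (a sketch; the filename and the exact 1 MB threshold are assumptions based on this thread):

```bash
# Report the line number and byte length of every line longer than 1 MB.
# Records are comma-separated UUIDs with a variable number of columns,
# e.g.: 550e8400-e29b-41d4-a716-446655440000,06b4f9a2-...
zcat dataset.gz | awk 'length($0) > 1048576 { print NR ": " length($0) " bytes" }'
```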