GoogleCodeExporter opened this issue 9 years ago
Hi there,
That is quite a large PSV, and it's possible that alone could be causing
problems - CSV import should batch queries and attempt to keep memory use low,
but that's quite a lot of rows :)
I believe the pipe character should work, and a few simple imports I've just
tested have worked fine.
At what stage do you get the "unexpected error"? What should happen after you
hit the import button on the file selection dialog is that it should read and
parse the file until it has a hundred rows, and then show you the field
mapping/table creation dialog (which is very powerful if not immediately
obvious!). It sounds like it's not even getting that far for you - is that
correct?
It sounds like it might be choking on the file itself. We only support
transparent import of raw text, GZIP-compressed, or BZIP2-compressed files. The
automatic encoding detection could also be running into problems and tripping
up the rest.
What's the exact error message? I can't find anything about an "unexpected
error", I'm afraid.
Original comment by rowanb@gmail.com
on 14 Feb 2013 at 12:50
Hi and thanks for your response.
I realize it's huge and may not work locally, but I don't have access to a
large enough server to handle the request. Please let me know if you think I'm
going about this the wrong way.
Your hunch is correct that the error is occurring immediately. I decompressed
a .tar.gz file which gave me the PSV. Based on your comment of only supporting
transparent import for GZIP, do you think that's worth a shot?
The message did indeed read that processing stopped due to an "unexpected
error".
Original comment by j.capu...@closedloopadvisors.com
on 14 Feb 2013 at 2:35
There shouldn't be any difference between importing the .gz and the
uncompressed file, so if you already have the uncompressed file I don't think
that would improve matters.
I can't find any mention of "Unexpected error" in our codebase, nor in the
encoding detection code. Is that the title or the detail part of the error
message, or is that the whole message? If so, it sounds almost like a
system-level error...
How large is the file, in terms of megs/gigs?
The other thing to try is to grab just the first hundred lines or so (perhaps
using `head -n 100 path/to/file.psv > testfile.psv`) to simplify some of the
issues here :)
Original comment by rowanb@gmail.com
on 14 Feb 2013 at 11:42
I've attached a snapshot of the error I receive, which is a pop-up box.
The file is 5.8GB. I know it's pretty large, but a friend had recommended
attempting to run this on my machine before trying to find server space
somewhere.
I tried to grab the first hundred lines of code as you advised and must've done
something wrong as I received this message: You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version for the
right syntax to use near 'head -n 100
Macintosh_HD/Users/jdcap427/CLA/Offline_work/etsy_transaction/etsy_t' at line 1
The 100 was highlighted in red font in the query space above.
Thanks!
Original comment by j.capu...@closedloopadvisors.com
on 14 Feb 2013 at 1:43
Attachments:
Ohh... an exception! Sorry, yes, that's an actual hard bug you're running
into, not just an import failure. Could you just add "1621" to that problem
report description box and submit one?
I wonder if some of the file routines we're using are <2GB only - we can
address that.
The `head -n 100 path/to/file.psv > testfile.psv` is a command to use in
Terminal (or another command line util) to generate a small file from the start
of your large one :)
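To make that concrete, here is a minimal Terminal session. The sample file and its contents are made up purely so the commands are runnable end to end; in practice you would point `head` at your real 5.8GB PSV instead:

```shell
# Build a tiny sample pipe-separated file (stand-in for the real PSV).
printf 'id|item|price\n1|brooch|12.50\n2|scarf|8.00\n' > file.psv

# Copy just the first 100 lines (here, the whole sample) into a new file.
head -n 100 file.psv > testfile.psv

# testfile.psv is now a small file you can try importing quickly.
```

Note this runs in the shell itself, not inside a MySQL prompt, which is why pasting it into Sequel Pro's query editor produced an SQL syntax error.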
Original comment by rowanb@gmail.com
on 14 Feb 2013 at 1:56
Ok. Just sent the error report with 1621 in the description box.
I listed the altered command because it didn't work when I ran it and perhaps I
made an error when adding the path or something.
Thanks again for your help on this. Let me know what to do after you receive
the report.
Original comment by j.capu...@closedloopadvisors.com
on 14 Feb 2013 at 2:03
From the error you gave it sounded like you were trying to run the altered
command in a MySQL prompt/custom query, whereas I meant it to be run in a
standard unix/OS X shell, sorry.
It looks like the error occurred because it's trying to automatically generate
CREATE TABLE syntax as a result of the file import - because there's no current
table to import into - and choking on some of the data. I'm about to have
lunch, so I might take a look at this then.
Original comment by rowanb@gmail.com
on 14 Feb 2013 at 2:20
Hi there,
I've committed a fix to that particular issue in r4021. I'm attaching a build
which is the 1.0.1 release but with that change added if you'd like to give it
a go.
When you run the import now, you should see the field mapping sheet pop up
after you've selected the file - as it needs to create a table to import the
data into, note the field at the top right where it's prompting you for a table
name. After that you can start the import - I'm intrigued to know how it'll do
for such a large file...
Original comment by rowanb@gmail.com
on 18 Feb 2013 at 1:13
Attachments:
I'll download the new release and run the job overnight.
Part of this is user error due to my unfamiliarity with your program. I added
a table to my DB and attempted an import beginning Friday at 10:30 pm EST.
It's now Sunday at 10 pm and I just cancelled the job after it churned for 2
days. No errors and the index reflected one record showing the primary key.
Original comment by j.capu...@closedloopadvisors.com
on 18 Feb 2013 at 3:10
What should happen is that the rows are pulled out of the CSV file and sent to
the server in batches, so there's hopefully already quite a lot of rows in the
table? Re-importing the file will generate an error for all the rows that
already exist, so the import will be much slower the second time around unless
you truncate the table first (each key error will make it redo the batch
line-by-line to get the full error, so it'll be ~100x slower for the old rows!).
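For reference, clearing the table before a re-import is a single statement. The table name below is hypothetical; use whatever name you gave the table in the field mapping sheet:

```sql
-- Deletes ALL rows in the table, so a re-import starts fresh
-- instead of tripping over duplicate-key errors on old rows.
TRUNCATE TABLE etsy_transactions;
```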
Keep an eye on the memory use of Sequel Pro as it imports - if it balloons, the
speed will drop precipitously as it goes on, which could explain the two day
import. If so we'll need to do more work this side... the alternative is just
that the server is taking a while to save all the data!
(For a file this large, without any complex per-cell data mapping or changes,
you may also want to investigate the LOAD DATA INFILE command on the server -
http://dev.mysql.com/doc/refman/5.1/en/load-data.html - it's less configurable,
but it avoids network latency and query overhead...)
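As a rough sketch of that command: the table name, path, and header assumption below are all made up, the table must already exist with columns in the same order as the file, and depending on your setup you may need the LOCAL keyword plus local_infile enabled on the server:

```sql
-- '|' is the PSV field separator; adjust the line terminator
-- if the file uses Windows line endings ('\r\n').
LOAD DATA LOCAL INFILE '/path/to/file.psv'
INTO TABLE etsy_transactions
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row, if the file has one
```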
Original comment by rowanb@gmail.com
on 18 Feb 2013 at 11:59
As noted in my first response, I'm attempting to run this locally (4 GB of
RAM), based on a friend's suggestion, before renting server space. I'm now
convinced that was bad advice. He seemed to justify the approach... Anyway, it
looks like I need to find server space.
As you might guess, Activity Monitor showed 85% CPU use when I force quit
Sequel Pro last night. Hitting the "cancel" button had no effect even after
more than an hour of waiting.
Once I figure out a server setup, I'll let you know how this works out. Thanks
for the tips on truncating the table and the server command.
Original comment by j.capu...@closedloopadvisors.com
on 18 Feb 2013 at 1:47
It really depends on how much work you'll be doing with the data - will it be
analysis of the entire data set at once, or will you be querying parts of the
data to work on subsets at a time? If you're only working with bits of the
data, and can generate indexes on the appropriate columns you'll be querying
by, 4GB should be fine as it'll only really need to cache the indexes in RAM.
If you're working with the entire data set at once though... :)
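To illustrate the index point with hypothetical column names: if the queries filter on, say, a date column, an index on that column lets the server answer from the (RAM-cached) index rather than scanning the full 5.8GB table:

```sql
-- Hypothetical table/column names. With this index in place,
-- the query below can locate matching rows via the index
-- instead of a full table scan, so 4GB of RAM may be workable.
CREATE INDEX idx_transaction_date
    ON etsy_transactions (transaction_date);

SELECT COUNT(*)
FROM etsy_transactions
WHERE transaction_date BETWEEN '2012-01-01' AND '2012-03-31';
```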
I'll do some more tests with a massive CSV file locally - the 85% could be fine
(running queries!), or it could be bad (out of memory, all churning against
disk).
Original comment by rowanb@gmail.com
on 18 Feb 2013 at 1:59
This is all on the import attempt! I do have access to a machine with 8 GB of
RAM if you think that would work to import this table.
Work involved will include joining columns from another table, followed by
if-then statements that will define new variables, then I'll need to run
summary stats (the latter two steps in R).
Original comment by j.capu...@closedloopadvisors.com
on 18 Feb 2013 at 2:47
Original comment by stuart02
on 25 Oct 2013 at 11:49
Original issue reported on code.google.com by
j.capu...@closedloopadvisors.com
on 13 Feb 2013 at 9:52