Strata-Scratch / csv_to_db_automation

Python script that automates CSV file imports to a postgres database
54 stars 39 forks source link

upload an xlsx #3

Open sant3e opened 2 years ago

sant3e commented 2 years ago

I'm having the following issue: I have to dl some data from a SharePoint list; i can do it as a cvs file or an xlsx file. Unfortunately if i do it as a csv, some of the data gets corrupt, so i'm forced to do it as an xlsx. This option is not without issues as well (I had to add an extra function to clean some of the values within certain columns; which i've done successfully). The automation fails though at the last step (the upload to a db) with the following error:

Opened database successfully
sfs_tool_t1 was created successfully
file opened in memory
---------------------------------------------------------------------------
QueryCanceled                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_30764/2219752249.py in <module>
     30 
     31     # upload to db
---> 32     upload_to_db(host,
     33                  dbname,
     34                  user,

~\AppData\Local\Temp/ipykernel_30764/2190013522.py in upload_to_db(host, dbname, user, password, tbl_name, col_str, file, dataframe, dataframe_columns)
     26     """
     27 
---> 28     cursor.copy_expert(sql=sql_statement % tbl_name, file=my_file)
     29     print('file copied to db')
     30 

QueryCanceled: COPY from stdin failed: error in .read() call: UnicodeDecodeError 'charmap' codec can't decode byte 0x81 in position 7964: character maps to <undefined>
CONTEXT:  COPY sfs_tool_t1, line 1

I had to change all your csv statements into xlsx... up until the upload_to db function. There you have 2 statements: dataframe.to_csv(file, header=dataframe_columns, index=False, encoding='utf-8') my_file = open(file)

and

sql_statement = """ COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ',' """

As I am aware one can't upload an xlsx to a db (at least not through python)(though i might be wrong); so i let those remain as csv. Right now, I'm not sure if those are the culprits or maybe my clean_text function has to address more issues then what i identified... point is, i'm stuck. I've been googling for 3 days, tried different solutions but none works. I would really need your help on this, and this was the only way i could contact you. Can you help me out?

ClaskoTheKnight commented 2 years ago

Try and use this page to open your xlsx into pandas.

You should add in some print commands to check to see what the data looks like at each step.

I think the later functions that you're describing, are actually operating on the "cleaned" temp csv created by the program, so you shouldn't need to modify those.