TriangleCommunications / reorder-table-columns

Create SQL to reorder columns in a PostgreSQL table
13 stars 4 forks source link

Allow parsing from sql file #1

Open ScotterMonk opened 4 years ago

ScotterMonk commented 4 years ago

First: THANK YOU for making this! It will save me a HUGE amount of time!

Question: Could my issue be that I'm using Python 3.8 with pip updated all libraries your code imports? Should I use a specific version of Python?

ERROR REPORT: C:\Python\Python38>reorder.py -d inventory -n public.tbl_items_to_items id id_session id_child t_part_no t_part_no_parent d_modified b_enabled -f c:\python\python38\logColReorder.txt -m Traceback (most recent call last): File "C:\Python\Python38\reorder.py", line 218, in main() File "C:\Python\Python38\lib\site-packages\click\core.py", line 829, in call return self.main(args, kwargs) File "C:\Python\Python38\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "C:\Python\Python38\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "C:\Python\Python38\lib\site-packages\click\core.py", line 610, in invoke return callback(args, *kwargs) File "C:\Python\Python38\reorder.py", line 198, in main cols, extras = get_columns(database, schema, table) File "C:\Python\Python38\reorder.py", line 25, in get_columns sql_text = get_dump_sql(database, schema, table) File "C:\Python\Python38\reorder.py", line 15, in get_dump_sql result = subprocess.run( File "C:\Python\Python38\lib\subprocess.py", line 489, in run with Popen(popenargs, **kwargs) as process: File "C:\Python\Python38\lib\subprocess.py", line 854, in init self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Python\Python38\lib\subprocess.py", line 1307, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] The system cannot find the file specified

GammaGames commented 4 years ago

The script uses pg_dump to get the table's sql. I'm not sure if that command is available for on Windows, you might have to run it on WSL instead.

ScotterMonk commented 4 years ago

Thanks! May I ask what WSL is?

ScotterMonk commented 4 years ago

Ah a google search shows pg_dump works on windows. Maybe it needs different params or something. I'll try to figure it out.

GammaGames commented 4 years ago

Whoops, didn't see your edit questions. I'll respond to the ones in the history too:

How does this script work if it doesn't log into PostgreSQL with credentials?

It is designed to run on the same host as the database, so it doesn't really need to authenticate by default. I could work on making it a bit more flexible, but adding a .pgpass file might be enough to get it to work automatically

Wondering maybe your code's purpose is to write SQL I can execute in PostgreSQL?

Yes, it takes the pg_dump command and adds some commands to reorganize the table. I did it that way because it should be easier to verify and test, you can look at the commands yourself to see what is going to be run.

Could my issue be that I'm using Python 3.8 with pip updated all libraries your code imports? Should I use a specific version of Python?

Python version shouldn't matter as long as it's >3.6. You might need to install the psycopg2-binary module instead, I've had more luck installing that module.

Here's some info on WSL, it's a really great tool: https://docs.microsoft.com/en-us/windows/wsl/about

Do you have postgres installed on windows? pg_dump normally comes bundled with it, but I haven't used the subprocess module on windows so I'm not 100% sure of its compatibilities.

ScotterMonk commented 4 years ago

Thank you for your quick and thoughtful answers! Yes, I am running both Python and my Postgres server on the same Windows 2019 server. psycopg2 is working because I already use it for reads/writes to that same database (inventory) from a Python app I'm building. Oh and I filled in connection string so your app has full admin permissions.

ScotterMonk commented 4 years ago

I didn't share with you the run string I was using because it shows up in the error message. But if you need to see anything, say the word and I'll share here. Muchly appreciate your hard work!!!

ScotterMonk commented 4 years ago

Oh and please don't bother to look at my edit questions because I edit every time I figure something out or eliminate a potential issue. Here's the run string I have been pasting into cmd prompt in the Python 38 folder: reorder.py -d inventory -n public.tbl_items_to_items id id_session id_child t_part_no t_part_no_parent d_modified b_enabled -f c:\python\python38\logColReorder.txt -m Do you think the file not found error may be coming from trying to write that log file? I created an empty one in the appropriate folder, just in case.

ScotterMonk commented 4 years ago

Hope I don't side track you by asking what you think of this: https://stackoverflow.com/questions/60549576/set-environment-variable-pgpassword-before-executing-a-command-pg-dump-on-wi

GammaGames commented 4 years ago

You can fill in the connection settings, they're on lines 16 and 108. Setting the PGPASSWORD environment variable is a reasonable way to set the password, I use that method pretty often with projects. You shouldn't need to share it.

The error looks like it's just failing to find the command. What does the console output when you enter where pg_dump?

ScotterMonk commented 4 years ago

Sorry for delay. Was eating.

C:\Python\Python38>where pg_dump INFO: Could not find files for the given pattern(s).

ScotterMonk commented 4 years ago

I just ran this: OK that looks like I need to (and I just did it): pip install pgdumplib <-- done Added import pgdumplib to top of your code. Re-ran and now this error: C:\Python\Python38>reorder.py -d inventory -n public.tbl_items_to_items id id_session id_child t_part_no t_part_no_parent d_modified b_enabled -f c:\python\python38\logColReorder.txt -m Traceback (most recent call last): File "C:\Python\Python38\reorder.py", line 219, in main() File "C:\Python\Python38\lib\site-packages\click\core.py", line 829, in call return self.main(args, kwargs) File "C:\Python\Python38\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "C:\Python\Python38\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "C:\Python\Python38\lib\site-packages\click\core.py", line 610, in invoke return callback(args, *kwargs) File "C:\Python\Python38\reorder.py", line 199, in main cols, extras = get_columns(database, schema, table) File "C:\Python\Python38\reorder.py", line 26, in get_columns sql_text = get_dump_sql(database, schema, table) File "C:\Python\Python38\reorder.py", line 16, in get_dump_sql result = subprocess.run( File "C:\Python\Python38\lib\subprocess.py", line 489, in run with Popen(popenargs, **kwargs) as process: File "C:\Python\Python38\lib\subprocess.py", line 854, in init self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Python\Python38\lib\subprocess.py", line 1307, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] The system cannot find the file specified

ScotterMonk commented 4 years ago

What are your thoughts on this: https://pypi.org/project/pgdumplib/

ScotterMonk commented 4 years ago

Oh and where in your code should I put this (and of course putting my "postgres" user's pw in there): SET PGPASSWORD=secret_pw&&pw_dump AND is that the exact way to write it? AND, heh, is that all that is needed in terms of setting that variable, as in no reference to PGPASSWORD later?

ScotterMonk commented 4 years ago

Hey if you want, when I get this finished, happy to send to you so you have a "windows version"!

ScotterMonk commented 4 years ago

Oh and I had filled in the database creds at 108 but on 16 I'm unsure how to do so. Seemed obvious at 108 but not on 16.

def get_dump_sql(database: str, schema: str, table: str) -> str:
    """Get SQL that would be returned with `pg_dump`"""
    result = subprocess.run(
        ["pg_dump", "--schema-only", f"--table={schema}.{table}", database],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode()
    with psycopg2.connect(host="184.175.105.38", database=database, user="postgres", password="mysecret") as conn:

Since it is local maybe I should leave out "host=...."?

ScotterMonk commented 4 years ago

Hmm I looked again at pgdumplib and now I"m thinking maybe this is a different, not as well supported pg_dump? Did you look at that page?

GammaGames commented 4 years ago

pg_dump is a utility that is normally included with postgres: https://www.postgresql.org/docs/current/app-pgdump.html

It looks like they might already be included, they might be in the bin folder for your postgresql installation. In that case you'd have to either add the path to your environment, create a symlink, or modify the script to include the executable's path.

ScotterMonk commented 4 years ago

Thanks. Looking. I was just reading https://www.postgresql.org/docs/9.1/app-pgdump.html. Will look at the URL you just shared.

ScotterMonk commented 4 years ago

Dangit. Both pages on pg_dump give command line examples but not Python examples. Will keep looking.

GammaGames commented 4 years ago

As to the environment variable, unless you use some tool to set it you'll have to set every time you open a new command prompt.

The subprocess command is running the command line, the first argument for subprocess.run is just the command broken into an array (["pg_dump", "--schema-only", f"--table={schema}.{table}", database] = pg_dump --schema-only --table=schema.table database)

ScotterMonk commented 4 years ago

Yep! pg_dump.exe is in my bin folder! Also, found examples where I need to:

import sh
from sh import pg_dump
from sh import pg_restore
ScotterMonk commented 4 years ago

Ah that's helpful. I'll look at it with new eyes now :-)

GammaGames commented 4 years ago

I didn't know about the sh module so I didn't use it. At some point I might have to update this to use the feature, it looks fantastic

ScotterMonk commented 4 years ago

Sadly now I'm getting errors that say

C:\Python\Python38>reorder.py -d inventory -n tbl_items_to_items id id_session id_child t_part_no t_part_no_parent d_modified b_enabled -f c:\python\python38\logColReorder.txt -m
Traceback (most recent call last):
  File "C:\Python\Python38\reorder.py", line 13, in <module>
    import array, fcntl, os
ModuleNotFoundError: No module named 'fcntl'

Appears that fcntl is a Unix thing, even though sh is in the windows version and imported fine with pip. I'm starting to wonder if I could ever get this working.

ScotterMonk commented 4 years ago

Here's what a guy on stack overflow said:

The fcntl module is just used for locking the pinning file, so assuming you don't try multiple access, this can be an acceptable workaround. Place this module in your sys.path, and it should just work as the official fcntl module.

Place this module in your sys.path, and it should just work as the official fcntl module: Scott: I'm willing to try this but wondering what I name the py file ("fcntl.py"?).

def fcntl(fd, op, arg=0):
    return 0

def ioctl(fd, op, arg=0, mutable_flag=True):
    if mutable_flag:
        return 0
    else:
        return ""

def flock(fd, op):
    return

def lockf(fd, operation, length=0, start=0, whence=0):
    return
ScotterMonk commented 4 years ago

I think I'm going down the wrong path (rabbit hole) because now it's finding other stuff it wants, like "tty".

ScotterMonk commented 4 years ago

I'll look for other solutions to the "change column order" thing, but focused on windows. I appreciate you offering so much of your time to help me today! I'll subscribe to you, for if/when you decide to build a windows-friendly version! THANKS AGAIN!

ScotterMonk commented 4 years ago

Just thought of this: I'm wondering if - as you developed this python script - you first (or during) wrote the SQL you wanted it to generate? What I'm getting at is if you could share that raw SQL with me and I bet I can figure out where to change out or add table name and column names?

ScotterMonk commented 4 years ago

This looks do-able! Especially given the GUI I use for db management is dBeaver, which will give me SQL for re-creating any of my tables, including foreign keys. A bit more work than your automated tool but at least this will be simple for someone like me, heh!

-- copy data to temporary table
CREATE TEMPORARY TABLE _temp_users (
    id INTEGER PRIMARY KEY,
    user TEXT NOT NULL,
    pass TEXT NOT NULL
);

INSERT INTO _temp_users SELECT
id,
user,
pass
FROM users;

DROP TABLE users;

 -- create new table and copy the data back (make changes here)

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    user TEXT NOT NULL,
    pass TEXT NOT NULL,
    beans INTEGER NOT NULL DEFAULT 0 -- just added this
);

INSERT INTO users SELECT
id,
user,
pass,
0 -- added here too
FROM _temp_users;

DROP TABLE _temp_users;
GammaGames commented 4 years ago

Looks like the output of pg_dump, if I remember correctly DBeaver uses it behind the scenes. I could add a flag to take an input file instead of reading from the command line.

ScotterMonk commented 4 years ago

Ah. Yeah in DBeaver it's called "generate SQL" and you can do it with any table, tables, or whole database. I would imagine a common feature of most modern UIs? I don't know because I haven't played with others except for pgAdmin and ... don't tell... Microsoft Access, heh. I never used one for SQL Server.

GammaGames commented 3 years ago

I added a branch that has the option to read an sql file (either from the output of pg_dump or the generate SQL tool in DBeaver). It doesn't have any way to find foreign keys that reference the table, so it doesn't manage those, but otherwise it worked fine in my testing. I haven't merged it yet.

Edit: Oh neat, it looks like you can do this work in dbeaver (https://github.com/dbeaver/dbeaver/issues/9720)! I'm doing it on a larger scale automatically, but for one-off tables it might not be necessary.