MariaDB / mariadb_kernel

A MariaDB Jupyter kernel
BSD 3-Clause "New" or "Revised" License
30 stars 21 forks source link

add load magic commad, can load CSV file into table #14

Closed a97410985 closed 3 years ago

a97410985 commented 3 years ago

I am first time open a pull request. I solve an issue #https://github.com/MariaDB/mariadb_kernel/issues/11.

goal

  1. add a new magic command load, which can load CSV file for updating table data
  2. magic command occur error can show an error message to the user, errors like
    1. arguments not enough
    2. CSV file not exists
    3. table not exists
    4. CSV file data not match table schema

      demonstration(video)

      https://www.youtube.com/watch?v=JhE9DLwTnIY&ab_channel=%E6%B1%9F%E6%96%B0%E7%9F%A5

a97410985 commented 3 years ago

I fix the formatting issue😐

a97410985 commented 3 years ago

@robertbindar I had read all reviews and changed the codes. Below has a video link to demonstrate the DEMO.

a97410985 commented 3 years ago

@robertbindar I fixed all code that review mentioned demo video: https://www.youtube.com/watch?v=C9XYvjqCfKg&feature=youtu.be&ab_channel=xing-zhijiang

KartikSoneji commented 3 years ago

Hi, not sure if it is my place but just left some suggestions.

robertbindar commented 3 years ago

It is exactly your place, so feel free to comment, suggest improvements, or contribute yourself :-)

robertbindar commented 3 years ago

use_csv_update_table_cmd = f"""LOAD DATA LOCAL INFILE '{self.csv_file_path}' f"select * from {self.table_name} limit 5;"

These make me slightly nervous, because they can lead to (unintentional) SQL injections.

Ideally, the run_statement method should accept a list of substitution parameters like the Python connector

cur.execute("INSERT INTO test.accounts(first_name, last_name, email, amount) VALUES (?, ?, ?, ?)",
      (first_name, last_name, email, amount))

https://github.com/MariaDB/mariadb_kernel/blob/0dddfe5cd8c3c9ad80687fed05401cc5313e6d86/mariadb_kernel/mariadb_client.py#L21-L27

@robertbindar any reason not to use the official Python connector (mariadb)?

I think your SQL injection claim doesn't fit in an environment where you already allow the user to run arbitrary SQL/Python/etc code with almost no security checks when it comes to what users can write in their own notebooks.

Official Python connector didn't make much sense for me, it was much easier to just forward commands to the classical client. I will listen for arguments though and weigh whether it's worth switching to something like official mariadb python connector or community pymysql.

robertbindar commented 3 years ago

@a97410985, one last round trip, sorry it takes so long, once you remove the try-catch which @KartikSoneji pointed as redundant/error-prone, I will merge the PR

a97410985 commented 3 years ago

@a97410985, one last round trip, sorry it takes so long, once you remove the try-catch which @KartikSoneji pointed as redundant/error-prone, I will merge the PR

OK

a97410985 commented 3 years ago

I test for it. And I find sometimes the MariaDBClient.run_statement(...) does not catch the ERROR. like the video below. https://www.youtube.com/watch?v=bJT23YPviCU&feature=youtu.be&ab_channel=xing-zhijiang I add some log info to help understand what happens. like code below

    def run_statement(self, code, timeout=-1):
        if not code:
            return ""

        result = ""
        # TODO: double check exception handling
        try:
            result = self.maria_repl.run_command(code, timeout)
            self.log.info(f"result's length : {len(result)}")
            self.log.info(f"result.startswith('ERROR') : {result.startswith('ERROR')}")
            self.log.info(f"result.find('ERROR')!=-1 : {result.find('ERROR')!=-1}")
            ... skip ...

and find when catching file not found error. It would not start with "ERROR" but contains "ERROR". Weird😐

KartikSoneji commented 3 years ago

I think your SQL injection claim doesn't fit in an environment where you already allow the user to run arbitrary SQL/Python/etc code with almost no security checks when it comes to what users can write in their own notebooks.

Of course, here security is not as much of an issue, but what happens if the file is named ' a.csv? or if the table name is set to ; a.csv?
Trying to escape these edge cases in Python will lead to an imperfect re-implementation of the escaping logic like the original connector. I think it is easier and better to use substitution parameters.

Official Python connector didn't make much sense for me, it was much easier to just forward commands to the classical client. I will listen for arguments though and weigh whether it's worth switching to something like official mariadb python connector or community pymysql.

Listening for the MariaDB [] prompt feels like a fragile workaround, especially when the Python connector is both officialy supported and throughly tested.
As an added benefit, we don't have to worry about implementing parameter substitution.

https://github.com/MariaDB/mariadb_kernel/blob/0dddfe5cd8c3c9ad80687fed05401cc5313e6d86/mariadb_kernel/mariadb_client.py#L50

Also, some queries might trip up the prompt detection, for example:

SELECT "MariaDB []>";
+-------------+
| MariaDB []> |
+-------------+
| MariaDB []> |
+-------------+
1 row in set (0.000 sec)
robertbindar commented 3 years ago

I test for it. And I find sometimes the MariaDBClient.run_statement(...) does not catch the ERROR. like the video below. https://www.youtube.com/watch?v=bJT23YPviCU&feature=youtu.be&ab_channel=xing-zhijiang I add some log info to help understand what happens. like code below

    def run_statement(self, code, timeout=-1):
        if not code:
            return ""

        result = ""
        # TODO: double check exception handling
        try:
            result = self.maria_repl.run_command(code, timeout)
            self.log.info(f"result's length : {len(result)}")
            self.log.info(f"result.startswith('ERROR') : {result.startswith('ERROR')}")
            self.log.info(f"result.find('ERROR')!=-1 : {result.find('ERROR')!=-1}")
            ... skip ...

and find when catching file not found error. It would not start with "ERROR" but contains "ERROR". Weird😐

@a97410985 There was Issue #4 causing your problems. I pushed a commit disabling progress reports, things should work as expected now.

robertbindar commented 3 years ago

I think your SQL injection claim doesn't fit in an environment where you already allow the user to run arbitrary SQL/Python/etc code with almost no security checks when it comes to what users can write in their own notebooks.

Of course, here security is not as much of an issue, but what happens if the file is named ' a.csv? or if the table name is set to ; a.csv? Trying to escape these edge cases in Python will lead to an imperfect re-implementation of the escaping logic like the original connector. I think it is easier and better to use substitution parameters.

Official Python connector didn't make much sense for me, it was much easier to just forward commands to the classical client. I will listen for arguments though and weigh whether it's worth switching to something like official mariadb python connector or community pymysql.

Listening for the MariaDB [] prompt feels like a fragile workaround, especially when the Python connector is both officialy supported and throughly tested. As an added benefit, we don't have to worry about implementing parameter substitution.

https://github.com/MariaDB/mariadb_kernel/blob/0dddfe5cd8c3c9ad80687fed05401cc5313e6d86/mariadb_kernel/mariadb_client.py#L50

Also, some queries might trip up the prompt detection, for example:

SELECT "MariaDB []>";
+-------------+
| MariaDB []> |
+-------------+
| MariaDB []> |
+-------------+
1 row in set (0.000 sec)

I agree with you @KartikSoneji , python connector MariaDB client should arrive in the future. It should be simple to create an interface for MariaDBClient and derive both the current client and the pymysql one from the interface and allow the user to choose one or the other. But for me it is not an urgent priority at the moment, if you'd like to work on it and submit a PR, I'm happy to review it. Also I would appreciate if you could create an issue to track this.