Samagra-Development / Text2SQL

Tests for Text2SQL #33

Open goyalpramod opened 1 year ago

goyalpramod commented 1 year ago

Tests need to be added to the project for various edge cases, for example incorrect spelling, ranking, nested queries, and queries with joins or merges. This is not an exhaustive list; please feel free to add tests that you believe capture other edge cases, as this will lead to better code coverage.

Here is some pseudocode to illustrate the current thinking:

import unittest


class TestTextToSQL(unittest.TestCase):
    def setUp(self):
        # Set up a schema before each test
        self.schema = {
            "tables": {
                "students": {
                    "columns": ["id", "name", "grade"]
                },
                "classes": {
                    "columns": ["class_id", "class_name"]
                }
            }
        }

    def text_to_sql(self, nl_text, schema):
        # Placeholder: write the code that generates the query for the
        # given natural language text and schema
        raise NotImplementedError

    def test_for_incorrect_spelling(self):
        # Execute the test
        text_input = "Number of stnts in class 5?"
        generated_query = self.text_to_sql(text_input, self.schema)

        # Validate the results
        expected_query = "SELECT COUNT(*) FROM students WHERE class_id = 5"
        self.assertEqual(generated_query, expected_query)

    def tearDown(self):
        # Delete the schema once the tests are over
        pass


if __name__ == "__main__":
    # Run tests
    unittest.main()

For reference, see the test setup for related_tables.py.

To get a better understanding of the edge cases, consult the following survey: https://link.springer.com/article/10.1007/s00778-022-00776-8

Feel free to publish your branch so everyone can contribute and add their own tests.

It is also vital to compare the query results rather than the query strings, because for complex natural-language input the same result can be produced by several different queries.
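To make the point about checking results concrete, here is a minimal sketch that runs two differently worded queries against an in-memory SQLite database and compares the rows they return. Everything in it is an assumption for illustration only: the helper names `build_test_db` and `results_match`, the sample rows, and the example queries are hypothetical and not part of the project.

```python
import sqlite3
from collections import Counter


def build_test_db():
    """Create an in-memory SQLite database with a few sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students (id, name, grade) VALUES (?, ?, ?)",
        [(1, "Asha", 5), (2, "Ravi", 5), (3, "Meera", 6)],
    )
    return conn


def results_match(conn, generated_query, expected_query):
    """Return True if both queries yield the same rows, ignoring row order."""
    generated_rows = conn.execute(generated_query).fetchall()
    expected_rows = conn.execute(expected_query).fetchall()
    # Counter compares the rows as multisets, so duplicate rows still count.
    return Counter(generated_rows) == Counter(expected_rows)


conn = build_test_db()
expected = "SELECT COUNT(*) FROM students WHERE grade = 5"
# A differently worded query that a model might plausibly generate.
generated = "SELECT COUNT(id) FROM students WHERE grade IN (5)"
# A query that is syntactically similar but answers a different question.
wrong = "SELECT COUNT(*) FROM students WHERE grade = 6"

same = results_match(conn, generated, expected)
different = results_match(conn, wrong, expected)
```

A string comparison would reject `generated` even though it answers the question correctly, while the result comparison accepts it and still rejects `wrong`. Note that this sketch ignores row order, so queries where `ORDER BY` matters would need a stricter check.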