apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] Add more schema change regression cases to enhance the quality of schema change #35683

Open Lchangliang opened 5 months ago

Lchangliang commented 5 months ago

Description

We need to test schema change along different dimensions to make sure it is correct. The functional tests listed below are open for interested contributors to implement. By implementing them, you can deepen your understanding of Doris and contribute to the Doris community! To pick a case, comment with its label in the format [One-dimensional][table][agg][col][1]. When you finish a case, append the PR link to that comment, for example [One-dimensional][table][agg][col][1] pr-link.

For reference, see PR https://github.com/apache/doris/pull/34717/files.

Solution

One-dimensional test

table dimension

agg

col

  1. Test dropping a column and then adding a column with the same name
  2. Test changing the type of non-key columns (agg_type needs to be specified)

type

  1. Converts the tinyint type to [bigint, largeint, float, double, decimalv3, varchar, string]
  2. Converts the smallint type to [largeint, float, double, decimalv3, varchar, string]
  3. Converts the int type to [float, double, decimalv3, varchar, string]
  4. Converts the bigint type to [double, decimalv3, varchar, string]
  5. Converts the largeint type to [decimalv3, varchar, string]
  6. Converts the double type to [decimalv3, string]
  7. Converts the decimalv3 type to [decimalv3 with greater precision, string]
  8. Converts the date type to [datetime, datev2, datetimev2, string]
  9. Converts the datetime type to [date, datev2, datetimev2, string]
  10. Converts the datev2 type to [datetimev2, string]
  11. Converts the char type to [boolean, tinyint, smallint, int, bigint, largeint, float, double, char with longer length, varchar with longer length, string]
  12. The text (string) type cannot be converted to any other type
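As a rough sketch of how part of the conversion list above could be driven from data, the Groovy snippet below expands a source-type to target-types map into `ALTER TABLE ... MODIFY COLUMN` statements, with the aggregation type appended because non-key columns in an agg table need it. The table name `tbl_agg_sc`, the column name `v1`, the `REPLACE` aggregation type, and the concrete precision/length parameters are all illustrative assumptions, not values from this issue.

```groovy
// Illustrative subset of the conversion matrix (rows 1-4 of the list above).
// The decimalv3/varchar parameters are assumptions chosen for the sketch.
def conversions = [
    'tinyint' : ['bigint', 'largeint', 'float', 'double', 'decimalv3(38, 0)', 'varchar(64)', 'string'],
    'smallint': ['largeint', 'float', 'double', 'decimalv3(38, 0)', 'varchar(64)', 'string'],
    'int'     : ['float', 'double', 'decimalv3(38, 0)', 'varchar(64)', 'string'],
    'bigint'  : ['double', 'decimalv3(38, 0)', 'varchar(64)', 'string'],
]

// Hypothetical helper: build one MODIFY COLUMN statement per target type.
def buildModifyStatements = { String table, String column, String aggType, List<String> targets ->
    targets.collect { target ->
        "ALTER TABLE ${table} MODIFY COLUMN ${column} ${target.toUpperCase()} ${aggType}".toString()
    }
}

// Print the statements for a tinyint value column.
buildModifyStatements('tbl_agg_sc', 'v1', 'REPLACE', conversions['tinyint']).each { println it }
```

In a real regression case each statement would be run against a freshly created table, followed by a wait for the schema change job and a query to check the converted data.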

unique (mor and mow)

col

  1. Test that columns of array type cannot be used as keys
  2. Test that a nullable key column cannot be modified

type

  1. Converts the date type to [datetime, datev2, datetimev2, string]
  2. Converts the datetime type to [date, datev2, datetimev2, string]
  3. Converts the datev2 type to [datetimev2, string]
  4. Converts the char type to [boolean, tinyint, smallint, int, bigint, largeint, float, double, char with longer length, varchar with longer length, string]
  5. The text (string) type cannot be converted to any other type

dup

col

  1. Partition columns cannot be deleted
  2. Partition columns cannot be modified

type

  1. The boolean type cannot be converted to any other type
  2. Converts the tinyint type to [bigint, largeint, float, double, decimalv3, varchar, string]
  3. Converts the smallint type to [largeint, float, double, decimalv3, varchar, string]
  4. Converts the int type to [float, double, decimalv3, varchar, string]
  5. Converts the bigint type to [double, decimalv3, varchar, string]
  6. Converts the largeint type to [decimalv3, varchar, string]
  7. Converts the double type to [decimalv3, string]
  8. Converts the decimalv3 type to [decimalv3 with greater precision, string]
  9. Converts the date type to [datetime, datev2, datetimev2, string]
  10. Converts the datetime type to [date, datev2, datetimev2, string]
  11. Converts the datev2 type to [datetimev2, string]
  12. Converts the char type to [boolean, tinyint, smallint, int, bigint, largeint, float, double, char with longer length, varchar with longer length, string]
  13. The text (string) type cannot be converted to any other type

property dimension

table

  1. Test whether dynamic_partition can be modified
  2. Test whether the properties [distribution_type, batch_delete, function_column.sequence_type, disable_storage_row_cache, driver, light_schema_change, colocate_with, replication_num, storage policy] can be modified
  3. Test adding/dropping indexes [ngram bloom filter, inverted index]
  4. Test modifying tablet/column annotations
  5. Test renaming the table

partition

  1. Auto partition: test correctness when different partitions have different schemas. For example, create a table partitioned by minute, insert some data into the first partition, wait for the second partition to be created, then add/drop a column, insert some data into the second partition, and finally read it back.
  2. Test whether the properties [partition_desc, bucket number, bucket key, distribution_desc, default partition, storage_medium, storage_cooldown_time, replication_num] can be modified

Two-dimensional test

Query

  1. Test schema changes [add/drop key/value, modify column type, modify column length] during a query
  2. Test adding/dropping/modifying indexes during a query
  3. Test modifying table properties during a query
  4. Test modifying partition properties during a query

Analyze

  1. Test auto/manual analyze before and after a schema change

Lchangliang commented 4 months ago

Two-dimensional test template: because schema change is inherently asynchronous, a serial implementation is chosen; see https://github.com/apache/doris/pull/27112 for an example. We can also do this with asynchronous threads. A Groovy example using asynchronous threads is shown below:

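    // queries_list, random, big_query_job, stream_load_job and table_name
    // are expected to be defined earlier in the surrounding suite.

    // Thread 1: wait 10 minutes, then run 20 randomly chosen big queries.
    def thread1 = Thread.start {
        sleep(600000)
        for (int m = 0; m < 20; m++) {
            def big_base_query = queries_list.get(random.nextInt(queries_list.size()))
            big_query_job(big_base_query)
        }
    }

    // Thread 2: run 100 rounds of concurrent stream loads.
    def thread2 = Thread.start {
        def load_times = 100
        def load_threads = 250
        for (int i = 0; i < load_times; i++) {
            def threads = []
            // j starts at 1, so each round starts load_threads - 1 loader threads.
            for (int j = 1; j < load_threads; j++) {
                def idx = i * load_threads + j
                def formattedNumber = String.format("%06d", idx)
                threads.add(Thread.start {
                    stream_load_job(table_name, formattedNumber)
                })
            }
            // Wait for every load in this round before starting the next one.
            threads.each { it.join() }
        }
    }

    // Block until both workloads have finished.
    thread1.join()
    thread2.join()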
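Since the schema change itself runs asynchronously, a concurrent case also needs a way to wait for the job to finish before checking results. The snippet below is a minimal sketch of such a polling helper, not the regression framework's own API. `waitForSchemaChange` and the `fetchState` closure are hypothetical names; in a real suite `fetchState` might wrap a `SHOW ALTER TABLE COLUMN` query and return its state column. The sketch assumes `FINISHED` and `CANCELLED` are the terminal states.

```groovy
// Poll an asynchronous job until it reaches a terminal state or the timeout expires.
// Returns true only if the job reports FINISHED in time.
def waitForSchemaChange = { Closure<String> fetchState, long timeoutMs = 60000, long intervalMs = 1000 ->
    long deadline = System.currentTimeMillis() + timeoutMs
    while (System.currentTimeMillis() < deadline) {
        String state = fetchState()
        if (state == 'FINISHED') {
            return true
        }
        if (state == 'CANCELLED') {
            return false
        }
        Thread.sleep(intervalMs)
    }
    return false
}

// Usage with a stubbed state source standing in for the real query.
def states = ['PENDING', 'RUNNING', 'FINISHED'].iterator()
def finished = waitForSchemaChange({ states.next() }, 5000, 10)
println "schema change finished: ${finished}"
```

Passing the state lookup in as a closure keeps the waiting logic independent of how the state is read, so it can be exercised with a stub as above.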