UBC-MDS / fixml

Checklists and LLM prompts for efficient and effective test creation in data analysis
https://ubc-mds.github.io/fixml

Human Evaluation for Accuracy - Qlib Repo #130

Closed by tonyshumlh 4 weeks ago

tonyshumlh commented 1 month ago

This ticket serves as a Human Expert Evaluation Report on the ML project Qlib (https://github.com/microsoft/qlib). Each comment contains the details.

Below is the Evaluation summary.

Summary

Completeness Score: 5.0/7

Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0.5 | 31 |
| 3.2 | Data in the Expected Format | 1 | 31 |
| 3.5 | Check for Duplicate Records in Data | 0 | 31 |
| 4.2 | Verify Data Split Proportion | 0.5 | 31 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 1 | 31 |
| 6.1 | Verify Evaluation Metrics Implementation | 1 | 31 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 1 | 31 |
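
The overall score of 5.0/7 is consistent with summing the per-item is_Satisfied values in the table above (0.5 + 1 + 0 + 0.5 + 1 + 1 + 1 = 5.0) over the 7 checklist items. The issue does not state the formula, so the sketch below only illustrates that arithmetic under the assumption that the score is a plain sum.

```python
# Per-item is_Satisfied values copied from the summary table.
# Assumption: the overall score is their plain sum over the number of items.
item_scores = {
    "2.1": 0.5,
    "3.2": 1,
    "3.5": 0,
    "4.2": 0.5,
    "5.3": 1,
    "6.1": 1,
    "6.2": 1,
}


def completeness_score(scores):
    """Return (sum of per-item scores, number of checklist items)."""
    return sum(scores.values()), len(scores)


total, n_items = completeness_score(item_scores)
print(f"Completeness Score: {total:.1f}/{n_items}")
```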

Details

2.1 Ensure Data File Loads as Expected

Requirement: Ensure that data-loading functions correctly fetch datasets from predefined sources or online repositories. Additionally, verify that the functions handle errors or edge cases gracefully.
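
For illustration, a test covering both halves of this requirement (a successful load and a graceful failure) might look like the sketch below. `load_rows` is a hypothetical stand-in written for this example, not a Qlib function, and the file contents are made up.

```python
import csv
import os
import tempfile


def load_rows(path):
    """Hypothetical loader: read a CSV file into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def test_loads_expected_rows():
    # Write a tiny known file, then check the loader returns exactly its rows.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prices.csv")
        with open(path, "w", newline="") as f:
            f.write("date,close\n2020-01-02,10.5\n2020-01-03,10.7\n")
        rows = load_rows(path)
        assert len(rows) == 2
        assert rows[0] == {"date": "2020-01-02", "close": "10.5"}


def test_missing_file_raises():
    # Edge case: a missing source should fail loudly, not return empty data.
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "does_not_exist.csv")
        try:
            load_rows(missing)
        except FileNotFoundError:
            return
        raise AssertionError("expected FileNotFoundError")
```

A test file that only exercises the first case would plausibly be scored as partially satisfied under this item.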

Observations: Partially Satisfied

Function References:

3.2 Data in the Expected Format

Requirement: Verify that the data matches the expected format. This involves checking the shape, data types, values, and any other properties.
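
A minimal sketch of such a format check is shown below. The schema, column names, and value rules are assumptions chosen for the example and are not taken from Qlib.

```python
from datetime import date

# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"date": date, "close": float, "volume": int}


def format_problems(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable format violations (empty if none)."""
    problems = []
    for i, row in enumerate(rows):
        # Shape check: every row must have exactly the expected columns.
        if set(row) != set(schema):
            problems.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        # Type check for each column.
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                problems.append(f"row {i}: {column} is not {expected_type.__name__}")
        # Value checks: prices must be positive, volumes non-negative.
        if isinstance(row["close"], float) and row["close"] <= 0:
            problems.append(f"row {i}: close must be positive")
        if isinstance(row["volume"], int) and row["volume"] < 0:
            problems.append(f"row {i}: volume must be non-negative")
    return problems


good = [{"date": date(2020, 1, 2), "close": 10.5, "volume": 1200}]
bad = [{"date": "2020-01-02", "close": -1.0, "volume": 1200}]
```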

Observations: Satisfied

Function References:

3.5 Check for Duplicate Records in Data

Requirement: Verify that there are no duplicate records in the loaded data.
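
Since this item is marked as not satisfied, here is a minimal sketch of what a duplicate check could look like. The record layout and the choice of (instrument, date) as the identifying key are assumptions for the example.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return the sorted key tuples that occur more than once."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


records = [
    {"instrument": "AAA", "date": "2020-01-02", "close": 10.5},
    {"instrument": "AAA", "date": "2020-01-03", "close": 10.7},
    {"instrument": "AAA", "date": "2020-01-02", "close": 10.5},  # repeated row
]

dupes = duplicate_keys(records, ("instrument", "date"))
```

A test would then assert that `duplicate_keys` returns an empty list for the loaded data.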

Observations: Not Satisfied

Function References:

4.2 Verify Data Split Proportion

Requirement: Check that the data is split into training and testing sets in the expected proportion. Verify the split by checking the actual fraction of data points in the training and test sets.
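
One way to verify the actual fraction is sketched below. `split_by_fraction` is a hypothetical splitter used only to produce data for the check, and the 0.01 tolerance is an arbitrary choice.

```python
import math


def split_by_fraction(records, train_fraction):
    """Hypothetical splitter: the first part is training, the rest is test."""
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


def split_matches(train, test, expected_train_fraction, tolerance=0.01):
    """True if the actual training fraction is within tolerance of the expected one."""
    actual = len(train) / (len(train) + len(test))
    return math.isclose(actual, expected_train_fraction, abs_tol=tolerance)


records = list(range(1000))
train, test = split_by_fraction(records, 0.8)
```

Checking only that the two sets are non-empty or non-overlapping, without comparing the fraction, would cover just part of this requirement.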

Observations: Partially Satisfied

Function References:

5.3 Ensure Model Output Shape Aligns with Expectation

Requirement: Ensure that the structure of the model's output matches the expected format based on the task, such as checking the dimensions of the output versus the number of labels in a classification task.
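
As a rough illustration of the classification case mentioned in the requirement, the sketch below checks that the output has one row per sample and one column per label. `predict_proba` here is a hypothetical stand-in model, not a Qlib model.

```python
def predict_proba(features, n_labels):
    """Hypothetical stand-in classifier: uniform probabilities per label."""
    return [[1.0 / n_labels] * n_labels for _ in features]


def output_shape_ok(output, n_samples, n_labels):
    """True if output has n_samples rows and each row has n_labels entries."""
    return len(output) == n_samples and all(len(row) == n_labels for row in output)


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
output = predict_proba(features, n_labels=4)
```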

Observations: Satisfied

Function References:

6.1 Verify Evaluation Metrics Implementation

Requirement: Verify that the evaluation metrics are correctly implemented and appropriate for the model's task. Verify the metric computations with expected values to validate correctness.
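
A sketch of checking a metric against hand-computed values follows. Mean squared error is used purely as an illustrative metric; it is not claimed to be the metric that the Qlib tests verify.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hand-computed case: errors are 0, 0 and 2, so MSE = (0 + 0 + 4) / 3.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
mse = mean_squared_error(y_true, y_pred)
```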

Observations: Satisfied

Function References:

6.2 Evaluate Model's Performance Against Thresholds

Requirement: Compute evaluation metrics for both the training and testing datasets. Verify that these metrics exceed threshold values, indicating acceptable model performance.
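
The threshold check could be sketched as below. The metric (accuracy), the toy labels, and the 0.7 thresholds are all assumptions chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def below_threshold(metrics, thresholds):
    """Return the metrics that fall short of their threshold."""
    return {name: value for name, value in metrics.items() if value < thresholds[name]}


# Toy labels: training predictions are all correct, test has one miss.
train_true, train_pred = [1, 0, 1, 1, 0], [1, 0, 1, 1, 0]
test_true, test_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 0]

metrics = {
    "train_accuracy": accuracy(train_true, train_pred),
    "test_accuracy": accuracy(test_true, test_pred),
}
thresholds = {"train_accuracy": 0.7, "test_accuracy": 0.7}  # hypothetical
failures = below_threshold(metrics, thresholds)
```

A test would assert that `failures` is empty for both the training and testing metrics.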

Observations: Satisfied

Function References:

jinyz8888 commented 1 month ago

tests/test_dump_data.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/test_get_data.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 1 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/dataset_tests/test_datalayer.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/dependency_tests/test_mlflow.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0 | 1 |
| 3.2 | Data in the Expected Format | 0 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

Note: This script verifies that creating an MlflowClient instance is fast, specifically that it takes less than 10 milliseconds. This unit test is outside the scope of the checklist.
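
For context, a timing check of this kind can be sketched as follows. `StandInClient` is a hypothetical placeholder so the example runs without MLflow installed; the actual test constructs an MlflowClient, and taking the best of several runs is an assumption made here to reduce noise.

```python
import time

LIMIT_SECONDS = 0.010  # 10 milliseconds, as described in the note


class StandInClient:
    """Hypothetical placeholder for the client being constructed."""

    def __init__(self):
        self.tracking_uri = "file:./mlruns"


def construction_seconds(factory, repeats=5):
    """Return the fastest observed wall-clock time to call factory()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        factory()
        timings.append(time.perf_counter() - start)
    return min(timings)


elapsed = construction_seconds(StandInClient)
```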

jinyz8888 commented 1 month ago

tests/data_mid_layer_tests/test_dataset.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 1 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 1 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/data_mid_layer_tests/test_handler.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 1 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/data_mid_layer_tests/test_handler_storage.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 1 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

tonyshumlh commented 1 month ago

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0.5/1 | 10 |
| 3.2 | Data in the Expected Format | 1 | 10 |
| 3.5 | Check for Duplicate Records in Data | 0 | 10 |
| 4.2 | Verify Data Split Proportion | 0/0.5 | 10 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 1 | 10 |
| 6.1 | Verify Evaluation Metrics Implementation | 1 | 10 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 1 | 10 |

Details

2.1 Ensure Data File Loads as Expected

Requirement: Ensure that data-loading functions correctly fetch datasets from predefined sources or online repositories. Additionally, verify that the functions handle errors or edge cases gracefully.

Observations:

Function References:

3.2 Data in the Expected Format

Requirement: Verify that the data matches the expected format. This involves checking the shape, data types, values, and any other properties.

Observations:

Function References:

3.5 Check for Duplicate Records in Data

Requirement: Verify that there are no duplicate records in the loaded data.

Observations:

Function References:

4.2 Verify Data Split Proportion

Requirement: Check that the data is split into training and testing sets in the expected proportion. Verify the split by checking the actual fraction of data points in the training and test sets.

Observations:

Function References:

5.3 Ensure Model Output Shape Aligns with Expectation

Requirement: Ensure that the structure of the model's output matches the expected format based on the task, such as checking the dimensions of the output versus the number of labels in a classification task.

Observations:

Function References:

6.1 Verify Evaluation Metrics Implementation

Requirement: Verify that the evaluation metrics are correctly implemented and appropriate for the model's task. Verify the metric computations with expected values to validate correctness.

Observations:

Function References:

6.2 Evaluate Model's Performance Against Thresholds

Requirement: Compute evaluation metrics for both the training and testing datasets. Verify that these metrics exceed threshold values, indicating acceptable model performance.

Observations:

Function References:

jinyz8888 commented 1 month ago

tests/backtest/test_file_strategy.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0.5 | 1 |
| 3.2 | Data in the Expected Format | 0 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 1 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/backtest/test_high_freq_trading.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 0 | 1 |
| 3.2 | Data in the Expected Format | 0 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 1 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds

jinyz8888 commented 1 month ago

tests/data_mid_layer_tests/test_processor.py: Completeness Score per Checklist Item:

| ID | Title | is_Satisfied | n_files_tested |
|----|-------|--------------|----------------|
| 2.1 | Ensure Data File Loads as Expected | 1 | 1 |
| 3.2 | Data in the Expected Format | 1 | 1 |
| 3.5 | Check for Duplicate Records in Data | 0 | 1 |
| 4.2 | Verify Data Split Proportion | 0 | 1 |
| 5.3 | Ensure Model Output Shape Aligns with Expectation | 0 | 1 |
| 6.1 | Verify Evaluation Metrics Implementation | 0 | 1 |
| 6.2 | Evaluate Model's Performance Against Thresholds | 0 | 1 |

Explanation of Checklist Items and Observations

2.1 Ensure Data File Loads as Expected

3.2 Data in the Expected Format

3.5 Check for Duplicate Records in Data

4.2 Verify Data Split Proportion

5.3 Ensure Model Output Shape Aligns with Expectation

6.1 Verify Evaluation Metrics Implementation

6.2 Evaluate Model's Performance Against Thresholds