MingyuJ666 / ProLLM

[COLM'24] We propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as language prompts. It considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins.

How to evaluate the performance? #2

Open HHW-zhou opened 2 months ago

HHW-zhou commented 2 months ago

Thank you for your work. However, as far as I know, STRING-related tasks are typically multi-label, yet the training data in your code appears to be single-label. Additionally, after splitting with the provided script, some data in the test set also appears in the training set. Could you explain how you evaluate the model, and possibly provide the evaluation script?
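For concreteness, an overlap check along these lines can surface the issue; the file names and the `prompt` field below are placeholders for whatever the split script actually writes, not the repository's real layout:

```python
import json

# Hypothetical paths and field name; substitute the actual split outputs.
with open("train.json") as f:
    train = json.load(f)
with open("test.json") as f:
    test = json.load(f)

# Collect the QA text of each record (assuming a "prompt" key).
train_prompts = {rec["prompt"] for rec in train}
test_prompts = {rec["prompt"] for rec in test}

overlap = train_prompts & test_prompts
print(f"{len(overlap)} of {len(test_prompts)} test examples also appear in the training set")
```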

tiuxuxsh76075 commented 2 months ago

Thank you for your feedback! We apologize for not clearly explaining some details on GitHub.

  1. Regarding the dataset labels: We confirm that the dataset is multi-label; however, the code we have uploaded so far handles the single-label task. We chose to release the single-label version first because, in certain biological contexts, predicting a single interaction partner is meaningful on its own: single-label outputs are more straightforward to interpret and to validate experimentally. Single-label prediction can also reduce noise and improve accuracy, since multi-label training may introduce correlations or conflicts between labels. We therefore believe the single-label setup has biological relevance in specific experimental settings.

  2. Regarding the dataset splitting issue: We split the PPI dataset before generating the QA data, and then generated the ProCoT QA pairs on the pre-split dataset to avoid data leakage. Moving forward, we plan to add an automated check for potential leakage (along the lines of the overlap check above) to ensure the integrity of our results.

  3. Future updates: We plan to upload a multi-label version of the code and a complete evaluation script shortly, and we will update the README with detailed instructions on how to use these scripts to evaluate model performance. These updates should help users apply the model to multi-label tasks; in the meantime, a rough sketch of the intended evaluation follows below.
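Until the official script is uploaded, here is a minimal scikit-learn sketch of the two evaluation modes discussed above (single-label accuracy versus multi-label micro-F1); the gene names and label encoding are illustrative placeholders, not the repository's actual data format:

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Single-label mode: exactly one predicted interaction partner per query
# (placeholder gene names, not real predictions).
y_true = ["TP53", "EGFR", "BRCA1"]
y_pred = ["TP53", "EGFR", "MYC"]
print("single-label accuracy:", accuracy_score(y_true, y_pred))

# Multi-label mode: each query may have several true downstream proteins.
true_sets = [{"TP53", "MDM2"}, {"EGFR"}, {"BRCA1", "RAD51"}]
pred_sets = [{"TP53"}, {"EGFR", "KRAS"}, {"BRCA1", "RAD51"}]

# Fit the binarizer on the union of labels so unseen predictions do not warn.
mlb = MultiLabelBinarizer()
mlb.fit(true_sets + pred_sets)
Y_true = mlb.transform(true_sets)
Y_pred = mlb.transform(pred_sets)
print("multi-label micro-F1:", f1_score(Y_true, Y_pred, average="micro"))
```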

Thank you again for your attention and feedback!