Violet969 / PLM_Sol

Protein solubility prediction tools

UESolDS dataset #1

Open zchwang opened 3 weeks ago

zchwang commented 3 weeks ago

Hi,

Nice work. The number of training samples in the embedding_dataset directory seems to be lower than the counts reported in the paper (32,053 Insol vs. 47,291 Sol). Is the full dataset available? Also, what do the two labels A-1 and A-0 refer to?

Best regards,

Violet969 commented 3 weeks ago

Hi, the UESolDS dataset is split into a training set, a validation set, and a test set, so the training set on its own is smaller than the full dataset. We also updated the UESolDS dataset during the paper review process. As for the two labels, A-1 marks soluble proteins and A-0 marks insoluble proteins.
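
For anyone who wants to compare the counts against the paper, here is a minimal sketch of tallying labels per split and overall, using the A-1/A-0 meaning given above. The function name `count_labels`, the split names, and the toy labels are assumptions for illustration only; the actual file layout of embedding_dataset is not described in this thread, so loading the labels is left to the reader.

```python
from collections import Counter

# Label meaning as stated in this thread.
LABEL_NAMES = {"A-1": "soluble", "A-0": "insoluble"}


def count_labels(splits):
    """Count soluble/insoluble labels per split and across all splits.

    `splits` maps a split name to an iterable of label strings
    ("A-1" or "A-0"). Hypothetical helper, not part of PLM_Sol.
    """
    per_split = {}
    total = Counter()
    for name, labels in splits.items():
        counts = Counter(LABEL_NAMES[label] for label in labels)
        per_split[name] = counts
        total.update(counts)
    return per_split, total


# Toy labels for illustration only, not real UESolDS data.
toy = {
    "train": ["A-1", "A-0", "A-1"],
    "valid": ["A-0"],
    "test": ["A-1"],
}
per_split, total = count_labels(toy)
for name, counts in per_split.items():
    print(name, dict(counts))
print("all splits", dict(total))
```

Summing over the training, validation, and test splits gives the figure to compare with the paper, although the totals may still differ because the dataset was updated during review.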