MyungjaeSong / Paired-Library


Strange predicted efficiency from CBE_Efficiency #16

Open zhxiaokang opened 1 year ago

zhxiaokang commented 1 year ago

Hi,

Thank you for the great tools! We recently started working on CBE and came across DeepBaseEditor.

I tried the pre-trained model CBE_Efficiency on HT_CBE_Test (Supplementary Table 2), but the predictions look quite strange: they differ considerably from the measured efficiencies, and some are even negative. Here are some target sequences that I picked at random from HT_CBE_Test:

| Target sequence | Measured CBE efficiency | Predicted efficiency score |
| -- | -- | -- |
| GACTGACCAGGAGAGCTCCCCCGG | 38.02% | 23.71 |
| GCCCTCTCTCATGCAAGTGGGAGG | 30.30% | 26.30 |
| TCATGCACCGAACTGCGGGGACGG | 24.42% | 19.05 |
| CATCTGTCTCTTTCCTCCACAGGG | 20.30% | 20.41 |
| AAAAGGCGCGGGGGTGATGGGAGG | 15.10% | 8.43 |
| AAGAGGACTGTTCACACCAAGTGG | 15.05% | 13.88 |
| GGGACAGAGGGAGGAGGCTCTAGG | 10.53% | 7.65 |
| TGACCCATGAGACCCTGTACTTGG | 9.26% | 14.65 |
| GCAGGCTGGTGGCGATGTTCTTGG | 6.08% | 4.08 |
| CCTTTTCCGCTGGGTGTCACTCGG | 3.47% | 11.90 |
| TTGAGGTGCACTAATAGAGGGTGG | 1.06% | 2.77 |
| CTTGAGAGCTTTCATAAAGCTTGG | 0.08% | -1.35 |
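For anyone who wants to put a number on the mismatch, here is a minimal sketch that computes the Pearson correlation for the twelve rows above and lists any negative predictions. The values are copied from the table; the helper name `pearson` and the variable names are my own and are not part of DeepBaseEditor. Twelve hand-picked rows are a small sample, so the result may not reflect the full HT_CBE_Test set.

```python
import math

# (target sequence, measured CBE efficiency in %, predicted efficiency score)
# Values copied from the table in this post.
rows = [
    ("GACTGACCAGGAGAGCTCCCCCGG", 38.02, 23.71),
    ("GCCCTCTCTCATGCAAGTGGGAGG", 30.30, 26.30),
    ("TCATGCACCGAACTGCGGGGACGG", 24.42, 19.05),
    ("CATCTGTCTCTTTCCTCCACAGGG", 20.30, 20.41),
    ("AAAAGGCGCGGGGGTGATGGGAGG", 15.10, 8.43),
    ("AAGAGGACTGTTCACACCAAGTGG", 15.05, 13.88),
    ("GGGACAGAGGGAGGAGGCTCTAGG", 10.53, 7.65),
    ("TGACCCATGAGACCCTGTACTTGG", 9.26, 14.65),
    ("GCAGGCTGGTGGCGATGTTCTTGG", 6.08, 4.08),
    ("CCTTTTCCGCTGGGTGTCACTCGG", 3.47, 11.90),
    ("TTGAGGTGCACTAATAGAGGGTGG", 1.06, 2.77),
    ("CTTGAGAGCTTTCATAAAGCTTGG", 0.08, -1.35),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


measured = [m for _, m, _ in rows]
predicted = [p for _, _, p in rows]

r = pearson(measured, predicted)
# An efficiency cannot be below zero, so negative scores are worth flagging.
negatives = [seq for seq, _, p in rows if p < 0]

print(f"Pearson r over {len(rows)} targets: {r:.3f}")
print("Targets with a negative predicted score:", negatives)
```

The same function can be run on a full prediction file once the measured and predicted columns are loaded into two lists.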

The target sequences above are 24 nt long instead of 30 nt as in HT_CBE_Test from the Excel file (Supplementary Table 2), because I noticed that the example target sequences in CBE_Efficiency_sample.txt here are 24 nt long. Please let me know if that is wrong.

Thank you in advance!

JohnTChambers commented 1 year ago

I have also noticed some strange predictions from CBE_Efficiency, including a very poor Pearson correlation of 0.05 with our own experimental data. I haven't had time to check the results carefully, but I would be eager to hear anything you find out. I will add an update with the results of my own analysis within the next week.

I have also been formatting my input sequences as 24-mers: 1 upstream nt + guide + PAM.
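In case it helps others compare inputs, here is a rough sketch of how a 24-mer could be checked and split into those three parts. It assumes a 20-nt guide and a 3-nt PAM, which is what 1 + guide + PAM adds up to at 24 nt, and it treats NGG as the expected PAM; both are my assumptions and are not confirmed by the repository. The function names `split_24mer` and `has_ngg_pam` are hypothetical and are not part of DeepBaseEditor.

```python
def split_24mer(seq):
    """Split a 24-nt target into upstream base, guide and PAM.

    Assumed layout (not confirmed by the tool's documentation):
    1 upstream nt + 20-nt guide + 3-nt PAM.
    """
    seq = seq.strip().upper()
    if len(seq) != 24:
        raise ValueError(f"expected 24 nt, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return {"upstream": seq[0], "guide": seq[1:21], "pam": seq[21:]}


def has_ngg_pam(seq):
    """Return True if the last three bases match NGG (assumed PAM)."""
    return split_24mer(seq)["pam"][1:] == "GG"


parts = split_24mer("GACTGACCAGGAGAGCTCCCCCGG")
print(parts)
print("NGG PAM:", has_ngg_pam("GACTGACCAGGAGAGCTCCCCCGG"))
```

Running every input line through a check like this before prediction would at least rule out length or alphabet problems as the cause of odd scores.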