MicrosoftResearch / Azimuth

Machine Learning-Based Predictive Modelling of CRISPR/Cas9 guide efficiency
BSD 3-Clause "New" or "Revised" License

Negative guide scores #21

Closed vineetgopal closed 6 years ago

vineetgopal commented 6 years ago

Hello! We've noticed that occasionally, guides that we score will get a negative predicted score from azimuth. Here's an example:

>>> from azimuth.model_comparison import predict
>>> import numpy
>>> predict(numpy.array(['AACTGATTTCTGGCGTTTTCTTTCTGGCTC']), numpy.array([8905]), numpy.array([96]))
No model file specified, using V3_model_full
array([-0.04603427])

Empirically, negative scores seem more likely when the peptide percentage is close to 100. However, from the Azimuth documentation (and the general understanding of CRISPR on-target scores), scores are expected to be between 0.0 and 1.0. Is this possibly a bug in the scoring system?

If it's helpful, this does not happen if the peptide percentage is set to 95 or less.

Thanks!

jlistgarten commented 6 years ago

Hi Vineet,

Although the training data were in the range 0.0 to 1.0, the final trained regression model we use can make predictions outside of this range. Still, we expect these to be somewhat rare. So no, there is no bug, and what you describe sounds reasonable. If it’s easier for your purposes, you could just set out-of-range values to the nearest bound of the range 0.0 to 1.0 (i.e. negative values to 0.0, and values greater than 1.0 to 1.0).

Jennifer

From: Vineet Gopal [mailto:notifications@github.com] Sent: Wednesday, August 9, 2017 6:56 PM To: MicrosoftResearch/Azimuth Azimuth@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [MicrosoftResearch/Azimuth] Negative guide scores (#21)

jjc2718 commented 6 years ago

The documentation has been updated to reflect Jennifer's response above.
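
For anyone who wants to apply the clamping Jennifer describes, here is a minimal sketch in plain Python. The helper name `clamp_scores` is hypothetical and not part of Azimuth. It assumes the scores can be iterated as floats, which should hold for the array returned by `predict`. Only the first example value comes from this thread; the other two are made up for illustration.

```python
def clamp_scores(scores, low=0.0, high=1.0):
    """Clamp each predicted score to the closed interval [low, high].

    Hypothetical helper, not part of Azimuth: values below `low` become
    `low`, values above `high` become `high`, everything else is unchanged.
    """
    return [min(max(float(s), low), high) for s in scores]


# -0.04603427 is the negative prediction reported in this issue;
# 0.5 and 1.02 are illustrative values, not real Azimuth output.
raw_scores = [-0.04603427, 0.5, 1.02]
print(clamp_scores(raw_scores))
```

If the scores are kept as a NumPy array, `numpy.clip(scores, 0.0, 1.0)` does the same thing in one call. Note that clamping only changes how out-of-range predictions are reported; it does not change how guides rank relative to each other within the range.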