Novel interaction notification

maxkfranz commented 4 years ago

Description

Q: What is the name of the feature?

A: Novel interaction notification

Q: What does this feature enable the user to do?

A: User X gets a notification when a novel interaction has been listed in Biofactoid and that interaction is relevant to User X's research.

Benefits:

- Biofactoid is advertised to prospective users in a relevant (non-spammy) way. This helps to increase mindshare of Biofactoid and it helps to increase the number of users that create factoids.
- We incorporate the idea of notifications (like Emmaa), but in an automatic way: The author does not need to explicitly sign up for the notification -- sign ups create friction. The author also does not need to know exactly what the interaction would be a priori (e.g. using Emmaa, an author has to specify the exact participants, and an author wouldn't know what the participants would be if the interaction is novel).

Q: What information must the user provide to use the feature?

A: A different author, User A, creates a factoid with a novel interaction. User X shouldn't have to provide any information.

Q: What are the applicable constraints, e.g. compatibility or performance?

A:

- We need a way to contact User X, even though they may not have used Biofactoid before.
- We need many users, like User A, to create novel factoids.

Q: How does this feature affect each class of user (persona)?

A:

- Biologist: Learns about new interactions that are relevant to their research. Learns about Biofactoid organically.
- Editor: May get an opportunity to advertise their journal (i.e. the novel interaction came from a paper in their journal).
- Computational biologist: N/A
- Curator: N/A

Specification

Mockup

(Email to User X)

Biofactoid has found a novel interaction that we have determined to be relevant to your research using Biofactoid's advanced AI. Biofactoid is an app that allows authors to create a digital profile of scientific discoveries in an article and connects it to related research.

You can see the new interaction here: https://biofactoid.org/document/{DOCUMENT_ID}

You can connect your own findings with other researchers by adding your articles to Biofactoid. Get started at biofactoid.org!

Details

- When User A submits a new factoid, get the list of related articles (as normal).
- Determine whether the article contains a novel interaction, using Indra (i.e. interaction query yields no results). If the document does not contain any novel interactions, then bail out -- no emails are sent.
- Flag the related articles with a high correlation score (via Semantic Search).
- For each flagged article:
  - Get the corresponding author email (User X) from the Pubmed metadata.
  - Send an email template (see mockup) using Mailjet to the corresponding author.
  - Bonus: Do a search using the Pubmed API to see whether the corresponding author has published any factoidable articles recently. If so, include a nudge in the email template along these lines: We noticed that you've recently published 'Lorem ipsum dolor sit amet' in Some Journal. We've created an article summary for you. Simply click here to start adding your article's interactions!
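
The steps above could be sketched roughly as follows. This is only an illustration in Python, not the Biofactoid implementation: `RelatedArticle`, `select_recipients`, `notify`, and `score_threshold` are hypothetical names, the Indra novelty check is reduced to a boolean that the caller supplies, and the Mailjet call is abstracted as a `send_email` callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class RelatedArticle:
    """Hypothetical record for one related article."""
    pmid: str
    score: float                         # relatedness score from semantic search
    corresponding_email: Optional[str]   # from the article metadata; may be missing


def select_recipients(
    has_novel_interaction: bool,
    related: Iterable[RelatedArticle],
    score_threshold: float,
) -> List[str]:
    """Return the corresponding-author emails to notify, without duplicates."""
    if not has_novel_interaction:
        return []  # bail out: no emails are sent
    recipients: List[str] = []
    for article in related:
        if article.score < score_threshold:
            continue  # only flagged (high-score) articles qualify
        email = article.corresponding_email
        if email and email not in recipients:
            recipients.append(email)
    return recipients


def notify(
    recipients: Iterable[str],
    document_id: str,
    send_email: Callable[[str, str], None],
) -> int:
    """Send the document link to each recipient; return how many were sent."""
    url = f"https://biofactoid.org/document/{document_id}"
    count = 0
    for email in recipients:
        send_email(email, url)  # assumed wrapper around the mail provider
        count += 1
    return count
```

Keeping the selection step separate from the sending step makes it easy to log or inspect who would be emailed before anything goes out.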

maxkfranz commented 4 years ago

@jvwong, we should iterate on the email template mockup a bit.

maxkfranz commented 4 years ago

@cannin, this would be a great motivation to work on creating a more accurate script that can flag articles that are 'factoidable'. See the 'bonus' item under 'Details'.

jvwong commented 4 years ago

> @cannin, this would be a great motivation to work on creating a more accurate script that can flag articles that are 'factoidable'. See the 'bonus' item under 'Details'.

My todo list includes creating a mini test set of PMIDs, a subset of which I have flagged as hits, mainly from Mol Cell and Cell Reports (~200 hits for ? total).

maxkfranz commented 3 years ago

Adding @JohnGiorgi into the loop re. the 'factoidable' article detection

JohnGiorgi commented 3 years ago

@maxkfranz Cool!

I started prototyping something here. It uses AutoML to build/train a classifier for predicting "factoidable"/"not factoidable" based on @jvwong's labelled data.

~~Accuracy is 100% on train set and 84% on the held-out test set, but @jvwong found the precision is actually much lower.~~ The test set data is not balanced so accuracy is not appropriate.
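
To illustrate why accuracy misleads on unbalanced data, here is a small Python sketch. The labels are made up for illustration and are not the project's data; `binary_metrics` is a hypothetical helper. A classifier that always predicts "not factoidable" on a set with 10 positives and 90 negatives scores 0.9 accuracy while its precision and recall are both 0.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels (1 = factoidable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative, unbalanced labels: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
always_negative = [0] * 100
print(binary_metrics(y_true, always_negative))
```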

A very informative experiment would be to train the model on fractions of our available training data (e.g. 25%, 50%, 75%, 100%) and plot the performance on the held-out test set. I suspect we have far too little labelled data to take advantage of AutoML right now. I can try to get to this experiment this week.
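
A rough Python sketch of that experiment, independent of any particular AutoML tool: `learning_curve`, `train_fn`, and `score_fn` are hypothetical names, and the caller supplies the actual training and scoring routines.

```python
import random


def learning_curve(train_set, test_set, train_fn, score_fn,
                   fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Train on growing fractions of the training data and score each model
    on the same held-out test set.

    Returns a list of (fraction, n_examples, score) tuples ready for plotting.
    """
    rng = random.Random(seed)
    shuffled = list(train_set)
    rng.shuffle(shuffled)  # fixed seed, so every run uses the same subsets
    results = []
    for frac in fractions:
        n = max(1, int(round(frac * len(shuffled))))
        model = train_fn(shuffled[:n])   # smaller subsets nest inside larger ones
        results.append((frac, n, score_fn(model, test_set)))
    return results
```

If the score is still climbing at 100%, that is a sign that more labelled data would help.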

JohnGiorgi commented 3 years ago

Okay here is that plot:

So performance on the train and validation set improves as more data is used to train the system. Weird that performance hits 100% on the validation set. I'll chalk this up to the fact that it is tiny (~10 examples).

Either way, this motivates collecting more labelled data if you want to go the AutoML route.

gbader commented 3 years ago

Hi - there is a lot of existing training data that we could use, which I'm sure would be roughly equivalent to the types of papers classified by Jeff. So we shouldn't need to create our own. FYI, we made something like this for a similar purpose, for protein interaction papers, in 2003: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-4-11

JohnGiorgi commented 3 years ago

@gbader I didn't think about using PPI IE data. Does that study have abstracts with binary labels? (contains PPI or doesn't). Is there somewhere I can access it?

maxkfranz commented 3 years ago

@jvwong, would you update the unstable instance's environment variables so that it sends out the emails to the support address?

jvwong commented 3 years ago

> @jvwong, would you update the unstable instance's environment variables so that it sends out the emails to the support address?

done.

jvwong commented 3 years ago

TODOs
- [ ] Allow access to author name parts individually
- [ ] Consider rank and filter (reviews, date)
- [ ] Consider rate-limiting the emails

maxkfranz commented 3 years ago

Added date filtering and rate-limiting to the todos.

maxkfranz commented 3 years ago

Re. rate limiting: https://github.com/sindresorhus/p-throttle

jvwong commented 3 years ago

Re rate limiting, I think that the ranking and filtering does a decent job of cutting down the number of emails, from what I've seen. Also, the limits for MailJet are per hour. So maybe punt until we see it being a problem.
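
If a limit does become necessary, one option is a sliding-window cap on sends per hour. The Python sketch below shows the general idea only. It is not the p-throttle API (that is a JavaScript library), `WindowLimiter` is a hypothetical name, and the limit value is left to the caller because the provider's actual quota is not stated in this thread.

```python
import time
from collections import deque


class WindowLimiter:
    """Allow at most `max_per_window` sends in any `window_seconds` span."""

    def __init__(self, max_per_window, window_seconds=3600.0, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock       # injectable so the limiter can be tested
        self.sent = deque()      # timestamps of sends still inside the window

    def try_acquire(self):
        """Return True and record a send if under the limit, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_per_window:
            self.sent.append(now)
            return True
        return False
```

Emails that are refused could be queued and retried later instead of being dropped.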

maxkfranz commented 3 years ago

Another filter we could apply (if we’re not already doing so):

Intersect the list of papers (from refs) with the related papers (30) shortlist. That’s a simple way to enforce that the email papers have a high score.
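
As a minimal Python sketch of that intersection: `shortlist_intersection` is a hypothetical name, and both inputs are assumed to be lists of PMIDs.

```python
def shortlist_intersection(referenced_pmids, related_shortlist):
    """Keep only shortlist entries that also appear in the reference list.

    The shortlist order (assumed to be ranked by score) is preserved.
    """
    referenced = set(referenced_pmids)
    return [pmid for pmid in related_shortlist if pmid in referenced]
```

Iterating over the shortlist rather than the references keeps the score ranking intact for any later cut-off.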

On Nov 12, 2020, at 11:40, Jeffrey notifications@github.com wrote:

Re Rate limiting, I think that the ranking and filtering does a decent job of cutting down from what I've seen. Also the limits for MailJet are per hour. So maybe punt until we see it being a problem.


maxkfranz commented 3 years ago

Re. rate limiting, I just want to have a concrete plan in case things go sideways. Let’s see how the numbers work in practice before we take action.


maxkfranz commented 3 years ago

Closing. Enhancements such as one-click editor links can be tracked in new issues.

PathwayCommons / factoid