RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

edge probabilities displayed in the UI have too many significant digits #689

Closed saramsey closed 3 years ago

saramsey commented 4 years ago
Screen Shot 2020-03-16 at 12 32 08 PM
edeutsch commented 4 years ago

These are just the numbers as presented in the JSON. I would suggest that the GUI has no idea what precision is appropriate. But rather the code that is computing those numbers has the best chance of understanding what precision is appropriate and should round accordingly. I suspect if that code rounds appropriately and puts that in the JSON, it will automatically be displayed right on the screen.

If you push back, we can certainly do this in the presentation layer, but that doesn't seem like the best place to put it.

Are there times when both 0.94 and 0.9998 might be appropriate?

dkoslicki commented 4 years ago

Yeah, these are just the numbers that @finnagin ML model spits out. IMHO, 5 significant digits is sufficient. Can push a fix if urgent.

saramsey commented 4 years ago

@edeutsch wrote:

I would suggest that the GUI has no idea what precision is appropriate.

I respectfully disagree. In a GUI meant for a human, we (the users) probably want to see no more than three significant digits. Cases where I want to see more digits than that are relatively rare for this kind of statistical summarization/estimation. (Yes I am aware that in other disciplines there are measurements like speed of light where we care about precision to 10 digits). We can have the precision configurable in the UI if someone really wants more (Even better -- just default to three, and have a "click to show all digits".). I think it would be sub-optimal if the reasoning code pre-truncated its significance estimates based on an assumed number of desired significant digits.

I assume we are all on the same page that when I write "significant digits", I mean like this:

I realize that reasonable people can disagree on this. But having the reasoning layer pre-truncate digits is imposing information loss, which goes against a basic principle that we don't needlessly throw away information prematurely. The GUI is not the main consumer of these estimates. If we feel that we must convey precision (and that is a reasonable point of view), I think the appropriate thing to do is to provide a separate edge attribute that numerically conveys precision.

AFAIK, all the various R packages for statistical estimation do not truncate their estimates based on standard error. It's assumed that that will be done in the presentation layer.

edeutsch commented 4 years ago

okay, I'm happy to go along with this. @isbluis want to implement this?

isbluis commented 4 years ago

Sure thing.

Is there a full list of entities for which we should do this? e.g. Jaccard index...

dkoslicki commented 4 years ago

@isbluis edges that have attributes with the following names:

probability_drug_treats
jaccard_index
ngd
paired_concept_freq
observed_expected_ratio
chi_square
isbluis commented 4 years ago

Great, thanks @dkoslicki

saramsey commented 4 years ago

Can we just put the fix in devLM for the time being? I'm scared of something breaking in /beta, this close to the demo. (No disparagement implied; just asserting Murphy's Law).

isbluis commented 4 years ago

Sure thing, though I have a medical appointment in a few minutes and won't be able to do this until tonight.

saramsey commented 4 years ago

From my standpoint this is not urgent. LM, please take care of yourself.

isbluis commented 4 years ago

Ok, this is ready in devLM. ( -reload! )

It only works on the edge_attributes that @dkoslicki listed. Do we also want to do the same for other values, such as weight, confidence, etc?

edeutsch commented 4 years ago

Edge attributes look good to me! https://arax.rtx.ai/devLM/?m=1552

And for other attributes, I would say yes e.g. here: https://arax.rtx.ai/devLM/?m=1553

isbluis commented 4 years ago

Ok, I have added weight and confidence as well.

Let me know if it passes muster, and I'll push the changes (if desired)

edeutsch commented 4 years ago

3 significant figures it is! Looks fancy! image

push it!

isbluis commented 4 years ago

Ok, pushed to demo/

edeutsch commented 4 years ago

tested and rolled out to beta. Nothing like night-before code rollouts. Would be good to test the examples 1,2,3,4 again in the morning. Seems to be in good shape to me!

thanks!

isbluis commented 4 years ago

Hi @dkoslicki , @saramsey ,

@isbluis edges that have attributes with the following names:

probability_drug_treats
jaccard_index
ngd
paired_concept_freq
observed_expected_ratio
chi_square

It looks like ngd has been renamed to normalized_google_distance . Or perhaps a new metric? Either way, I suppose I should also add it to the list of values to truncate, but let me know otherwise. Sorry if I missed this.

Any others?

image

isbluis commented 3 years ago

Closing this unless there are other attributes that require truncation.

nb. normalized_google_distance was added to the list with commit e1251c83d92af16f808cbf391e4b298b2b9641b3