farhat-lab / gentb-site

The genTB project, the Django site, variant calling and prediction pipeline, and mapping pipeline with hooks to two ravens
https://gentb.hms.harvard.edu

binarize predict heatmap based on thresholds #219

Closed mahafarhat closed 3 years ago

mahafarhat commented 3 years ago

@mgro will supply @doctormo with the threshold lists for RF and WDNN separately. The probability will only be given upon hover; otherwise we will add a legend saying R = red, S = blue.

mgro commented 3 years ago

Hi @doctormo - I had shared the thresholds in Slack - do you want me to upload them here, too, for your easier access?

mahafarhat commented 3 years ago

Yes please


mgro commented 3 years ago

GenTB-RF - equal or above the threshold means RESISTANT

0.082_ethambutol_threshold
0.22_isoniazid_threshold
0.001_pyrazinamide_threshold
0.002_rifampicin_threshold
0.6_amikacin_threshold
0.25_capreomycin_threshold
0.42_ciprofloxacin_threshold
0.32_ethionamide_threshold
0.63_kanamycin_threshold
0.41000000000000003_levofloxacin_threshold
0.33_ofloxacin_threshold
0.001_para-aminosalicylic_acid_threshold
0.047_streptomycin_threshold

WDNN - equal or below the threshold means RESISTANT

0.548_ethambutol_threshold
0.608_isoniazid_threshold
0.47600000000000003_pyrazinamide_threshold
0.59_rifampicin_threshold
0.45_amikacin_threshold
0.508_capreomycin_threshold
0.41600000000000004_kanamycin_threshold
0.452_ofloxacin_threshold
0.526_streptomycin_threshold
0.625_ciprofloxacin_threshold
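A minimal sketch of how the two threshold lists above could be applied, given as an illustration rather than the pipeline's actual code. The names `RF_THRESHOLDS`, `WDNN_THRESHOLDS` and `call_resistance` are hypothetical, the long float tails are rounded (e.g. 0.41000000000000003 becomes 0.41), and returning `None` for a drug without a threshold is an assumption.

```python
# Thresholds copied from the lists above; float tails are rounded,
# which is an assumption of this sketch.
RF_THRESHOLDS = {
    "ethambutol": 0.082,
    "isoniazid": 0.22,
    "pyrazinamide": 0.001,
    "rifampicin": 0.002,
    "amikacin": 0.6,
    "capreomycin": 0.25,
    "ciprofloxacin": 0.42,
    "ethionamide": 0.32,
    "kanamycin": 0.63,
    "levofloxacin": 0.41,
    "ofloxacin": 0.33,
    "para-aminosalicylic_acid": 0.001,
    "streptomycin": 0.047,
}

WDNN_THRESHOLDS = {
    "ethambutol": 0.548,
    "isoniazid": 0.608,
    "pyrazinamide": 0.476,
    "rifampicin": 0.59,
    "amikacin": 0.45,
    "capreomycin": 0.508,
    "kanamycin": 0.416,
    "ofloxacin": 0.452,
    "streptomycin": 0.526,
    "ciprofloxacin": 0.625,
}


def call_resistance(model, drug, probability):
    """Return "R", "S", or None when the model has no threshold for the drug.

    GenTB-RF: probability >= threshold means resistant.
    WDNN:     probability <= threshold means resistant.
    """
    if model == "rf":
        threshold = RF_THRESHOLDS.get(drug)
        if threshold is None:
            return None
        return "R" if probability >= threshold else "S"
    if model == "wdnn":
        threshold = WDNN_THRESHOLDS.get(drug)
        if threshold is None:
            return None
        return "R" if probability <= threshold else "S"
    raise ValueError(f"unknown model: {model!r}")
```

Note that the comparison direction is inverted between the two models, and that the WDNN list has no entries for ethionamide, levofloxacin or para-aminosalicylic acid, so a caller has to decide how to display those cells.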

mahafarhat commented 3 years ago

@mgro actually please apply the threshold internally in the json generating step.

@doctormo we are planning to change the JSON output from [ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", "0.33", "4.95", "8.76" ],

to

[ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", "0.33", "R", "4.95", "8.76" ],

where the designation "R" is added because the probability = 0.33 > 0.22 (the GenTB-RF threshold for INH)

doctormo commented 3 years ago

This format change is frighteningly inconsistent. I can make it work, but adding things in the middle can cause issues, especially if you remove items later in a new format. Don't forget these formats aren't versioned or schema'd, so they can't be controlled in code without checking some boundary (like the length of the given list).

mahafarhat commented 3 years ago

@doctormo thinking more about this, can we make this change instead: [[ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", "1", "0.33", "4.95", "8.76" ], ...]

Then you would display DR prediction = 1, DR probability = 0.33, FP = 4.95, FN = 8.76 upon hover?

Alternatively we can change to [[ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", ["1", "0.33"], "4.95", "8.76" ], ...]

A third alternative is [[ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", "1", "4.95", "8.76" ], ...] and include a new JSON list with the probabilities for each drug-strain?

mahafarhat commented 3 years ago

If you are worried about backward compatibility, we can email the users saying we are launching a new version and hence have to purge old predictions.

mahafarhat commented 3 years ago

so we've settled on this 👍 [[ "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1", "inh", "1", "4.95", "8.76", "0.33"], ...]
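A minimal sketch of writing one entry in the layout settled on above, with the binary call in the third position and the probability appended at the end. The helper name `to_settled_row` is hypothetical, and encoding a susceptible call as "0" is an assumption, since the thread only shows "1".

```python
def to_settled_row(path, drug, probability, fp, fn, resistant):
    """Build one drug entry: [path, drug, call, FP, FN, probability].

    All values are emitted as strings, matching the example in the thread.
    The "0" for a susceptible call is an assumption of this sketch.
    """
    call = "1" if resistant else "0"
    return [path, drug, call, str(fp), str(fn), str(probability)]


row = to_settled_row(
    "/n/groups/gentb_www/predictData/tbdata_00000761/Italian1",
    "inh",
    0.33,
    4.95,
    8.76,
    resistant=True,
)
```

Appending the probability at the end keeps the existing FP and FN positions relative to the drug name unchanged compared with inserting a new value between them.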

mgro commented 3 years ago

Hi @doctormo, the *.matrix.json files from the 2.2 and WDNN pipelines are now written in the format agreed above, i.e., including the binary resistance call in the third position of each drug list.

@mahafarhat could you do a last sanity check on the 2.2 pipeline? I think it makes sense to make 2.2 the default at the point when Martin changes the heatmap generation step to take in the new format (unless the heatmap script can handle both old and new drug-list formats?)
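One way a reader could accept both drug-list formats is to check the list length, as suggested earlier in the thread. This is only a sketch under the assumption that old entries always have five items and new entries six; `normalize_row` and `hover_text` are hypothetical names, not functions from the gentb-site code.

```python
def normalize_row(row):
    """Map an old (5-item) or new (6-item) drug entry to a dict.

    Old: [path, drug, probability, FP, FN]
    New: [path, drug, call, FP, FN, probability]
    """
    if len(row) == 6:
        path, drug, call, fp, fn, probability = row
    elif len(row) == 5:
        path, drug, probability, fp, fn = row
        call = None  # old entries carry no binary call
    else:
        raise ValueError(f"unexpected entry length: {len(row)}")
    return {
        "path": path,
        "drug": drug,
        "call": call,
        "probability": probability,
        "fp": fp,
        "fn": fn,
    }


def hover_text(entry):
    """Format the values shown on hover for one normalized entry."""
    return (
        f"DR prediction = {entry['call']}, "
        f"DR probability = {entry['probability']}, "
        f"FP = {entry['fp']}, FN = {entry['fn']}"
    )


old = normalize_row(["/some/path", "inh", "0.33", "4.95", "8.76"])
new = normalize_row(["/some/path", "inh", "1", "4.95", "8.76", "0.33"])
```

Because the format is not versioned, the length check is the only boundary available here; an entry with any other length is rejected rather than guessed at.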