MrTomRod / scoary-2

Calculate associations between genes and traits
MIT License

Fix sensitivity/specificity for genes negatively correlated to trait. #7

Closed njohner closed 8 months ago

njohner commented 8 months ago

Hi Thomas,

I've started using Scoary2 recently and it's great, but I just noticed that the sensitivity and specificity values are wrong for genes that are negatively correlated with a trait. Here is a proposed fix. Sadly, I can't really update the tests, as there are failures all over the place, mainly because of missing results files that the tests seem to compare against (e.g. FileNotFoundError: [Errno 2] No such file or directory: '../data/new_ds/LC.tsv').

Can you provide some help here? Thanks!
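
To make the problem concrete, here is a minimal sketch with made-up counts for a single gene. The keys follow the g+t+ / g+t- / g-t+ / g-t- naming of Scoary2's result columns (gene present/absent, trait present/absent); the numbers themselves are hypothetical. For a gene that is mostly absent from trait-positive isolates, scoring gene presence as the predictor gives poor values, while scoring gene absence gives the values one would expect to see reported.

```python
# Hypothetical counts: 4 trait-positive and 4 trait-negative isolates.
# The gene is absent from most trait-positive isolates (negative correlation).
counts = {"g+t+": 1, "g-t+": 3, "g+t-": 3, "g-t-": 1}

n_pos = counts["g+t+"] + counts["g-t+"]  # isolates with the trait
n_neg = counts["g+t-"] + counts["g-t-"]  # isolates without the trait

# Gene PRESENCE used as the predictor of the trait
pos_sensitivity = counts["g+t+"] / n_pos * 100
pos_specificity = counts["g-t-"] / n_neg * 100

# Gene ABSENCE used as the predictor of the trait
neg_sensitivity = counts["g-t+"] / n_pos * 100
neg_specificity = counts["g+t-"] / n_neg * 100

print(pos_sensitivity, pos_specificity)  # 25.0 25.0
print(neg_sensitivity, neg_specificity)  # 75.0 75.0
```

The proposed fix reports whichever pair has the higher combined value, so this gene would be listed with 75% sensitivity and 75% specificity.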

MrTomRod commented 8 months ago

Thanks for noticing this! Yes, my tests are bad, sorry. Didn't have time to clean that up.

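I had to adapt your code, because pos_sensitivity and pos_specificity end up as plain integers (not Series) when n_pos or n_neg is 0, and an integer has no .where() method. Your code: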

    pos_sensitivity = (result_df['g+t+'] / n_pos * 100) if n_pos else 0
    pos_specificity = (result_df['g-t-'] / n_neg * 100) if n_neg else 0
    pos_average = (pos_sensitivity + pos_specificity) / 2
    neg_sensitivity = (result_df['g-t+'] / n_pos * 100) if n_pos else 0
    neg_specificity = (result_df['g+t-'] / n_neg * 100) if n_neg else 0
    neg_average = (neg_sensitivity + neg_specificity) / 2
    keep_pos = pos_average > neg_average  # Comparing the sums rather than the averages would be enough
    result_df["sensitivity"] = pos_sensitivity.where(keep_pos, neg_sensitivity)  # pos_sensitivity can be an integer (if n_pos is 0)
    result_df["specificity"] = pos_specificity.where(keep_pos, neg_specificity)  # pos_specificity can be an integer (if n_neg is 0)

My code:

    if n_pos:
        pos_sensitivity = (result_df['g+t+'] / n_pos * 100)
        neg_sensitivity = (result_df['g-t+'] / n_pos * 100)
    else:
        pos_sensitivity = neg_sensitivity = pd.Series(0, index=result_df.index)

    if n_neg:
        pos_specificity = (result_df['g-t-'] / n_neg * 100)
        neg_specificity = (result_df['g+t-'] / n_neg * 100)
    else:
        pos_specificity = neg_specificity = pd.Series(0, index=result_df.index)

    keep_pos = (pos_sensitivity + pos_specificity) > (neg_sensitivity + neg_specificity)
    result_df["sensitivity"] = pos_sensitivity.where(keep_pos, neg_sensitivity)
    result_df["specificity"] = pos_specificity.where(keep_pos, neg_specificity)

I tested it, it works. Pushed this as v0.0.15. Thanks again!
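
For anyone who wants to try the logic in isolation, below is a minimal, self-contained sketch of the adapted code wrapped in a function. The function name add_sensitivity_specificity and the toy counts are hypothetical and not part of Scoary2; only the column names and the selection rule come from the snippet above. The sketch assumes result_df holds one row per gene with integer count columns.

```python
import pandas as pd


def add_sensitivity_specificity(result_df: pd.DataFrame, n_pos: int, n_neg: int) -> pd.DataFrame:
    """Hypothetical helper: add 'sensitivity' and 'specificity' columns (in percent).

    For each gene, the values are taken either from gene presence or from
    gene absence as the predictor, whichever has the higher combined score.
    """
    result_df = result_df.copy()
    # Fallback Series, so that .where() is always available even if a group is empty
    zeros = pd.Series(0.0, index=result_df.index)

    if n_pos:
        pos_sensitivity = result_df["g+t+"] / n_pos * 100
        neg_sensitivity = result_df["g-t+"] / n_pos * 100
    else:
        pos_sensitivity = neg_sensitivity = zeros

    if n_neg:
        pos_specificity = result_df["g-t-"] / n_neg * 100
        neg_specificity = result_df["g+t-"] / n_neg * 100
    else:
        pos_specificity = neg_specificity = zeros

    # True where gene presence is the better predictor of the trait
    keep_pos = (pos_sensitivity + pos_specificity) > (neg_sensitivity + neg_specificity)
    result_df["sensitivity"] = pos_sensitivity.where(keep_pos, neg_sensitivity)
    result_df["specificity"] = pos_specificity.where(keep_pos, neg_specificity)
    return result_df


# Toy data: gene_A is positively, gene_B negatively correlated with the trait
toy = pd.DataFrame(
    {"g+t+": [3, 1], "g+t-": [1, 3], "g-t+": [1, 3], "g-t-": [3, 1]},
    index=["gene_A", "gene_B"],
)
out = add_sensitivity_specificity(toy, n_pos=4, n_neg=4)
print(out[["sensitivity", "specificity"]])
```

With these counts both genes should come out at 75% sensitivity and 75% specificity, gene_A via presence and gene_B via absence. Passing n_pos=0 or n_neg=0 no longer raises, because the fallback is a Series rather than an integer.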

njohner commented 8 months ago

Perfect, thanks! Yes, I hadn't really tested it.

njohner commented 8 months ago

Perfect thanks!

MrTomRod commented 4 months ago

Dear @njohner

Could you do me a favour and check if what I wrote here makes sense? My brain hurts from thinking it through again.

njohner commented 4 months ago

Yes looks good!