Caochris / SCOTIA

MIT License

Function 'scotia.post_ot' returns different results #11

Closed lee-t closed 4 months ago

lee-t commented 5 months ago

The function scotia.post_ot returns a different avg likelihood every time it's invoked, which changes the results. I doubt this is the intended behavior, but if it is, we need a different doctest.

DOCTEST REPRODUCTION
CommandLine:
    python -m xdoctest scotia/ot.py post_ot:0
====== </exec> ======
============
Finished doctests
2 / 3 passed

=== Found 1 errors ===
--- Error: 1 / 1 ---
    * REASON: GotWantException
    DOCTEST DEBUG INFO
      XDoc "scotia/ot.py::post_ot:0", line 3 <- wrt doctest
      File "scotia/ot.py", line 175, <- wrt source file
    DOCTEST PART BREAKDOWN
    Passed Parts:
        1 >>> ot_results =  pd.read_csv('./input_files/ot_results.csv',index_col=0)
    Failed Part:
        2 >>> post_ot(ot_results,label='test').head(1)
    DOCTEST TRACEBACK
    Expected:
                                                label  ave_likelihood
            0  Dll4_Notch2|test|Erythroid_Erythroidpro        0.098149
    Got:
                                           label  ave_likelihood
        0  Kitl_Kit|test|Hepatocyte_Erythroidpro        0.144833
    Repr Difference:
        got  = '                                   label  ave_likelihood\n0  Kitl_Kit|test|Hepatocyte_Erythroidpro        0.144833'
        want = '                                        label  ave_likelihood\n    0  Dll4_Notch2|test|Erythroid_Erythroidpro        0.098149'
    DOCTEST REPRODUCTION
    CommandLine:
        python -m xdoctest scotia/ot.py post_ot:0

=== Failed tests ===
python -m xdoctest scotia/ot.py post_ot:0
=== 1 failed, 2 passed in 10.29 seconds ===

Caochris commented 5 months ago

The avg likelihoods are the same, the order of the 'label' may vary.
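
One rough way to check this claim is to compare two runs after sorting both by label. The sketch below is only an illustration: the two rows are copied from the doctest output above, but the frames `run_a` and `run_b` and the helper `same_rows` are hypothetical stand-ins, not real post_ot output or part of SCOTIA.

```python
import pandas as pd

# Toy stand-ins for two post_ot runs (hypothetical data built from the
# two rows shown in the doctest output; not real post_ot results).
run_a = pd.DataFrame({
    "label": [
        "Dll4_Notch2|test|Erythroid_Erythroidpro",
        "Kitl_Kit|test|Hepatocyte_Erythroidpro",
    ],
    "ave_likelihood": [0.098149, 0.144833],
})
# Same rows, reversed order, with a fresh 0..n-1 index.
run_b = run_a.iloc[::-1].reset_index(drop=True)


def same_rows(left, right):
    """Compare two frames while ignoring row order (hypothetical helper)."""
    left_sorted = left.sort_values(by=["label"]).reset_index(drop=True)
    right_sorted = right.sort_values(by=["label"]).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


rows_match = same_rows(run_a, run_b)   # content comparison, order ignored
order_matches = run_a.equals(run_b)    # strict comparison, order matters
```

If `rows_match` is true while `order_matches` is false, the values agree and only the row order differs, which is what makes `.head(1)` in the doctest unstable.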

lee-t commented 5 months ago

> The avg likelihoods are the same, the order of the 'label' may vary.

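The order must be made consistent, perhaps descending by avg likelihood? That would only need one added line: df_sum = df_sum.sort_values(by=['ave_likelihood'], ascending=False). Note that sort_values returns a new DataFrame, so the result has to be assigned back.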

Also, this might be a good time to check for N/A values and decide whether they should go first or last in the list.
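
A minimal sketch of what such a sort could look like, assuming `df_sum` has the `label` and `ave_likelihood` columns shown in the doctest output. The function name `sort_by_likelihood`, the toy data, and the tie-break on `label` are assumptions added for illustration, not the code in scotia/ot.py.

```python
import pandas as pd


def sort_by_likelihood(df_sum):
    """Order rows by descending ave_likelihood (hypothetical helper).

    Ties are broken by label so equal likelihoods always come out in the
    same order, and missing values are placed last explicitly.
    """
    ordered = df_sum.sort_values(
        by=["ave_likelihood", "label"],
        ascending=[False, True],
        na_position="last",
    )
    # Renumber the index so .head(1) shows row 0, as in the doctest.
    return ordered.reset_index(drop=True)


# Toy data (made up): two tied likelihoods and one missing value.
toy = pd.DataFrame({
    "label": ["b", "a", "c", "d"],
    "ave_likelihood": [0.1, 0.1, float("nan"), 0.3],
})
result = sort_by_likelihood(toy)

# Shuffling the input rows should not change the sorted output.
shuffled = toy.sample(frac=1, random_state=0)
result_shuffled = sort_by_likelihood(shuffled)
```

The secondary sort key matters because sorting on `ave_likelihood` alone would leave tied rows in whatever order they arrived, so the doctest could still flip between runs.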

Caochris commented 5 months ago

Thanks for the suggestion. The values won't be N/A.