ESHackathon / CiteSource

http://www.eshackathon.org/CiteSource/

GNU General Public License v3.0

Increase robustness of calculate_phase_count #155

Closed LukasWallrich closed 1 year ago

LukasWallrich commented 1 year ago

Document that the phases should be labelled screened and final
Allow any capitalisation (e.g., tolower(cite_label) == "screened")
Warn if the data does not contain screened and final labels
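
For illustration, the case-insensitive matching and the warning could look roughly like the sketch below. This is not the CiteSource implementation: check_phase_labels is a hypothetical helper name, and the assumption is that cite_label is a plain character vector.

```r
# Hypothetical helper (not part of CiteSource): normalise phase labels and
# warn when an expected label is absent.
check_phase_labels <- function(cite_label, required = c("screened", "final")) {
  # Lower-case and trim so "Screened", "FINAL " etc. all match
  labels <- tolower(trimws(cite_label))

  # Which of the expected labels never occur in the data?
  missing_labels <- setdiff(required, unique(labels))
  if (length(missing_labels) > 0) {
    warning(
      "Data does not contain the expected label(s): ",
      paste(missing_labels, collapse = ", "),
      call. = FALSE
    )
  }

  # Return the normalised labels for downstream counting
  labels
}

# Mixed capitalisation is accepted without a warning
labels <- check_phase_labels(c("Search", "Screened", "FINAL"))
```

Returning the normalised vector means downstream code can compare against lower-case strings only.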

TNRiley commented 1 year ago

completed

LukasWallrich commented 1 year ago

One addition: can the function work when only final records are identified? In that case the screened columns should be dropped. That often seems to be the case, e.g., when comparing search results against a benchmark set, or when only raw and final data are available. Or does this already work?

TNRiley commented 1 year ago

Good call, I can work on stripping the screened info if only the final label is applied. In terms of benchmarking, you shouldn't need a label; you just need to name the source as benchmark.

TNRiley commented 1 year ago

I decided to change only precision_sensitivity_table, rather than both the count function and the table. The table now checks whether the screened column is all 0s and, if so, filters that column out of the table. I checked with data that does and does not include screened data, and everything looks good.
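
A rough sketch of that check, for anyone reading along. It is not the actual precision_sensitivity_table code: drop_empty_screened and the column names (source, distinct, screened, final) are assumptions made for the example.

```r
# Hypothetical helper (not part of CiteSource): drop the screened column
# when every value in it is 0, i.e. no records carried the screened label.
drop_empty_screened <- function(tbl, screened_col = "screened") {
  has_col <- screened_col %in% names(tbl)
  if (has_col && all(tbl[[screened_col]] == 0, na.rm = TRUE)) {
    tbl[[screened_col]] <- NULL  # removes the column from the data frame
  }
  tbl
}

# Example with only final records labelled (made-up counts)
only_final <- data.frame(
  source   = c("A", "B"),
  distinct = c(10, 20),
  screened = c(0, 0),
  final    = c(3, 5)
)
cols_only_final <- names(drop_empty_screened(only_final))

# Example with screened records present: the column is kept
with_screened <- transform(only_final, screened = c(4, 6))
cols_with_screened <- names(drop_empty_screened(with_screened))
```

Doing this in the table step rather than in the count function keeps the counts unchanged and only affects what is displayed.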