OHDSI / Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about a OMOP CDM database
https://ohdsi.github.io/Achilles/
130 stars 121 forks source link

Achilles benchmarking results strip time-unit from execution duration (and mixing units) #701

Closed MaximMoinat closed 1 year ago

MaximMoinat commented 1 year ago

Describe the bug When extracting analysis execution duration, Achilles result benchmark strips the unit and only stores the value in the database. The log has a formatted execution duration, e.g. automatically displaying in seconds, minutes, hours whichever is more applicable. This information is stripped, so there is no way of knowing what the value is. See screenshot below. To confirm: all values are between 0-60, so definitely not all seconds.

image

Note that in older Achilles versions the unit WAS kept in the database. I have narrowed down the issue being introduced in December 2021, in a series of commits ending with: https://github.com/OHDSI/Achilles/commit/d81fae98d83f623ab1e901a56827cb96a67cc75f

Solution direction There are two possible solutions:

MaximMoinat commented 1 year ago

@fdefalco Would you be able to take a look? This is a high priority issue for EHDEN. Happy to provide a solution or test something.

fdefalco commented 1 year ago

I thought that we were standardized to outputting seconds only. I would prefer to keep just a numeric output for a variety of reasons as other downstream use of this data gets complicated if you have to do conversions and strip out text strings. To that end, if we aren't currently standardized to seconds, we should implement that and stay without the text string in the output itself.

AnthonyMolinaro commented 1 year ago

@fdefalco Thanks for clarifying, Frank.

@MaximMoinat Thanks very much for catching this! I believe we have a few analyses that are taking a long time, but we didn't realize it because of the missing unit. A slight modification to what's in the PR gives us the time in seconds without adding the unit as a string. Please have a look. From examining the log of a recent Achilles run, I noticed that 401 was in seconds, 402 in minutes, and 117 in hours!

# sec, min, hours
# 401,402,117
analysisId <- 401
logRow <- logs[logs$analysisId == analysisId, ]
if (nrow(logRow) == 1) {
    runtTimevalue <- round(as.numeric(strsplit(logRow[1, ]$runTime, " ")[[1]][1]), 2)
    runTimeUnit <- strsplit(logRow[1, ]$runTime, " ")[[1]][2]
    if (runTimeUnit == "mins") {
      runtTimevalue <- runtTimevalue * 60
    } else if (runTimeUnit == "hours") {
      runtTimevalue <- runtTimevalue * 60 * 60
    } else if (runTimeUnit == "days") {
      runtTimevalue <- runtTimevalue * 60 * 60 * 24
    }
}
runtTimevalue
runTimeUnit
MaximMoinat commented 1 year ago

Thanks both, I will update the pull request with the given conversion of unit.

AnthonyMolinaro commented 1 year ago

@MaximMoinat Nice work on implementing this fix. I noticed that I posted a small typo: "runtTimevalue", which should be runTimeValue. I pulled your branch, fixed the typo, and then ran it on my end. I just pushed the update to the branch. I noticed that the number 6 is inserted in the count_value field. Does this occur on your end as well? Here's the query I used to verify:

select analysis_id-2000000 analysis_id, stratum_1,count_value
from achilles_results
where analysis_id > 3000;
MaximMoinat commented 1 year ago

Thanks @AnthonyMolinaro. The count_value is set as smallCellCount + 1. This is how it has been before, I did not change the behavior.

AnthonyMolinaro commented 1 year ago

@MaximMoinat Thanks. Yes, for a previous run I must have set the small cell count to 0, which is why I saw 1 in the database I was testing on.

AnthonyMolinaro commented 1 year ago

@MaximMoinat Your fix was merged into the develop branch. Does that work for you or do you need to pull from Main?

MaximMoinat commented 1 year ago

Thanks Anthony. Ideally it would be part of a new release. Our data partners from EHDEN are pulling the latest release, and we can point people to the new release when (re)running Achilles.

MaximMoinat commented 1 year ago

@AnthonyMolinaro When is a new Achilles release planned?

fdefalco commented 1 year ago

I'm planning on releasing as soon as I can confirm functionality required for the SOS OHDSI challenge. Should be within a week.

MaximMoinat commented 1 year ago

@fdefalco Sorry for getting back to this, but any progress on the next Achilles release? This issue is somewhat important for analysing the performance in the EHDEN network, so we would be very much helped if this could be prioritised.

fdefalco commented 1 year ago

Yes, had a few CRAN related hoops to jump through, release went out today.