Closed MaximMoinat closed 1 year ago
@fdefalco Would you be able to take a look? This is a high priority issue for EHDEN. Happy to provide a solution or test something.
I thought that we were standardized to outputting seconds only. I would prefer to keep just a numeric output for a variety of reasons as other downstream use of this data gets complicated if you have to do conversions and strip out text strings. To that end, if we aren't currently standardized to seconds, we should implement that and stay without the text string in the output itself.
@fdefalco Thanks for clarifying, Frank.
@MaximMoinat Thanks very much for catching this! I believe we have a few analyses that are taking a long time, but we didn't realize it because of the missing unit. A slight modification to what's in the PR gives us the time in seconds without adding the unit as a string. Please have a look. From examining the log of a recent Achilles run, I noticed that 401 was in seconds, 402 in minutes, and 117 in hours!
# sec, min, hours
# 401,402,117
analysisId <- 401
logRow <- logs[logs$analysisId == analysisId, ]
if (nrow(logRow) == 1) {
runtTimevalue <- round(as.numeric(strsplit(logRow[1, ]$runTime, " ")[[1]][1]), 2)
runTimeUnit <- strsplit(logRow[1, ]$runTime, " ")[[1]][2]
if (runTimeUnit == "mins") {
runtTimevalue <- runtTimevalue * 60
} else if (runTimeUnit == "hours") {
runtTimevalue <- runtTimevalue * 60 * 60
} else if (runTimeUnit == "days") {
runtTimevalue <- runtTimevalue * 60 * 60 * 24
}
}
runtTimevalue
runTimeUnit
Thanks both, I will update the pull request with the given conversion of unit.
@MaximMoinat Nice work on implementing this fix. I noticed that I posted a small typo: "runtTimevalue", which should be runTimeValue. I pulled your branch, fixed the typo, and then ran it on my end. I just pushed the update to the branch. I noticed that the number 6 is inserted in the count_value field. Does this occur on your end as well? Here's the query I used to verify:
select analysis_id-2000000 analysis_id, stratum_1,count_value
from achilles_results
where analysis_id > 3000;
Thanks @AnthonyMolinaro. The count_value
is set as smallCellCount + 1
. This is how it has been before, I did not change the behavior.
@MaximMoinat Thanks. Yes, for a previous run I must have set the small cell count to 0, which is why I saw 1 in the database I was testing on.
@MaximMoinat Your fix was merged into the develop branch. Does that work for you or do you need to pull from Main?
Thanks Anthony. Ideally it would be part of a new release. Our data partners from EHDEN are pulling the latest release, and we can point people to the new release when (re)running Achilles.
@AnthonyMolinaro When is a new Achilles release planned?
I'm planning on releasing as soon as I can confirm functionality required for the SOS OHDSI challenge. Should be within a week.
@fdefalco Sorry for getting back to this, but any progress on the next Achilles release? This issue is somewhat important for analysing the performance in the EHDEN network, so we would be very much helped if this could be prioritised.
Yes, had a few CRAN related hoops to jump through, release went out today.
Describe the bug When extracting analysis execution duration, Achilles result benchmark strips the unit and only stores the value in the database. The log has a formatted execution duration, e.g. automatically displaying in seconds, minutes, hours whichever is more applicable. This information is stripped, so there is no way of knowing what the value is. See screenshot below. To confirm: all values are between 0-60, so definitely not all seconds.
Note that in older Achilles versions the unit WAS kept in the database. I have narrowed down the issue being introduced in December 2021, in a series of commits ending with: https://github.com/OHDSI/Achilles/commit/d81fae98d83f623ab1e901a56827cb96a67cc75f
Solution direction There are two possible solutions: