Open kcho opened 1 year ago
```python
import pandas as pd


def consecutive_duplicates(df):
    # Boolean mask of 'series_order' consecutive duplicates
    consecutive_duplicates = df['series_order'].eq(df['series_order'].shift())
    # If there are any 'series_order' consecutive duplicates
    if consecutive_duplicates.any():
        # Indicate which 'series_order' have consecutive duplicates in new 'temporary' column
        df['temporary'] = consecutive_duplicates
        # Iterate through rows of dataframe
        for row in df.iloc[1:].itertuples():  # .iloc[1:] required because first row contains float
            if row.temporary:  # If 'series_order' consecutive duplicate indicated in 'temporary'
                # Insert 'Consecutive duplicate detected' cell at index
                df['series_order_target'] = pd.concat([
                    df['series_order_target'].iloc[:row.Index],  # Values up to `row.Index`
                    pd.Series(['Consecutive duplicate detected']),  # New Series object with message
                    df['series_order_target'].iloc[row.Index:]  # Values beyond `row.Index`
                ]).reset_index(drop=True)
        # Drop 'temporary' column
        df.drop('temporary', axis=1, inplace=True)
        # Iterate through rows of dataframe
        for row in df.iloc[1:].itertuples():  # .iloc[1:] required because first row contains float
            df.at[row.Index, 'order_diff'] = row.series_order_target == row.series_order  # Update 'order_diff'
    return df
```
In the above code, if consecutive duplicate series are detected in 'series_order', a message cell will be added to 'series_order_target' at the index of the consecutive duplicate series (realigning series below it to the standard template), and 'order_diff' will be updated.
@kcho Please let me know if you have any thoughts. I will hold off on integrating this patch into the repository for now, as you know best where it should go.
@nickckim Great work. Could you create a new branch and add this function to `qqc/qqc/dicom.py`? Your function could be placed right before line 246 to take in `series_order_df_all` as input and return the updated `series_order_df_all`. I'll test your function in the new branch once you create the PR for this.
Done @kcho
Bug due to modifying df during iteration. For some subjects with several consecutive duplicates, the df gets messy.
I found this revised code from last year that was never pushed. I believe it resolves the issue but it still needs to be tested. If I recall correctly, I left this code in a comment last year, but I cannot find the comment.
```python
import pandas as pd


def consecutive_duplicates(df):
    """
    Ignore scan order fails caused by unexpected consecutive duplicates
    (consecutive duplicate in series_order that is not in series_order_target).
    """
    # Drop summary row
    # df.drop('Summary', inplace=True)
    df.drop(0, inplace=True)
    # Reset index
    df = df.reset_index(drop=True)
    # If df contains unexpected consecutive duplicate,
    # update series_order_target column to fix offset,
    # and recalculate order_diff column
    if (
        (df["order_diff"] == "Fail")
        & (df["series_order"].shift(1) == df["series_order"])
        & (df["series_order"] != df["series_order_target"])
    ).any():
        # List to update series_order_target column
        updated_series_order_target = []
        for index, row in df.iterrows():
            # If row contains unexpected consecutive duplicate,
            # append message and series_order_target to list
            if (
                index > 0  # first row has no previous row; avoids wrapping to the last row
                and row["order_diff"] == "Fail"
                and row["series_order"] == df.iloc[index - 1]["series_order"]
                and row["series_order"] != row["series_order_target"]
            ):
                updated_series_order_target.append(
                    "Unexpected consecutive duplicate"
                )
                updated_series_order_target.append(row["series_order_target"])
            # If row does not contain unexpected consecutive duplicate,
            # append series_order_target to list
            else:
                updated_series_order_target.append(row["series_order_target"])
        # Remove NaNs from list (expected to be the trailing padding)
        updated_series_order_target = [
            x for x in updated_series_order_target if not pd.isna(x)
        ]
        # Update series_order_target column
        df = df.assign(series_order_target=updated_series_order_target)
        # Recalculate order_diff column
        df["order_diff"] = ""
        df.loc[
            df["series_order_target"] == df["series_order"], "order_diff"
        ] = "Pass"
        df.loc[
            df["series_order_target"] == "Unexpected consecutive duplicate",
            "order_diff",
        ] = "Warning"
        df.loc[
            (df["series_order_target"] != df["series_order"])
            & (
                df["series_order_target"] != "Unexpected consecutive duplicate"
            ),
            "order_diff",
        ] = "Fail"
        # Return updated df
        return df
    # If df does not contain unexpected consecutive duplicate, return df
    else:
        return df
```
The `Scan order` table in the QQC report checks the series order against the template dataset. This table is created by `qqc.qqc.dicom.check_order_of_series`. You can try using the function by running the lines below in Python.
However, when a series gets repeated due to an issue in the initial scan, the order of series will be shifted by one and will not match the standard template anymore.
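To make the shift concrete, here is a minimal sketch of a position-by-position comparison. It is not the actual logic of `check_order_of_series`; the helper name `naive_order_check` and the series names are made up for illustration, and only the column names `series_order` and `series_order_target` come from this thread.

```python
from itertools import zip_longest


def naive_order_check(series_order, series_order_target):
    """Hypothetical helper: compare acquired and template series row by row."""
    return [
        "Pass" if acquired == expected else "Fail"
        for acquired, expected in zip_longest(series_order, series_order_target)
    ]


# Hypothetical series names; "T1w" was acquired twice.
series_order_target = ["Localizer", "T1w", "T2w", "rfMRI"]
series_order = ["Localizer", "T1w", "T1w", "T2w", "rfMRI"]

print(naive_order_check(series_order, series_order_target))
# ['Pass', 'Pass', 'Fail', 'Fail', 'Fail']
```

A single repeated series makes every later row fail, even though the remaining series were acquired in the expected order.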
We need a function that

- takes `order_check_df` (`pd.DataFrame`)
- changes `Fail` into `Pass` if just extra series were added to the `series_order` column.

Could we create a mapping from the `series_order_target` column, and check if this mapping applies to each row in the `series_order` column?
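One possible way to approach this is a single pass with two positions: one in the acquired series and one in the template. The sketch below is plain Python and untested against real QQC data. The function name `align_series_order` is hypothetical, and it labels repeated scans as `Warning` with the message used elsewhere in this thread rather than turning them into `Pass`; that choice is an assumption and easy to change.

```python
DUPLICATE_MESSAGE = "Unexpected consecutive duplicate"  # label used in this thread


def align_series_order(series_order, series_order_target):
    """Hypothetical helper: align acquired series against the template order.

    Returns a list of (series_order, series_order_target, order_diff) tuples.
    """
    rows = []
    template_pos = 0  # position in series_order_target
    previous = None   # previously acquired series
    for current in series_order:
        if template_pos < len(series_order_target):
            target = series_order_target[template_pos]
        else:
            target = None  # template exhausted
        if current == target:
            # Matches the template (including duplicates the template expects)
            rows.append((current, target, "Pass"))
            template_pos += 1
        elif current == previous:
            # Repeated scan: report it without consuming a template entry
            rows.append((current, DUPLICATE_MESSAGE, "Warning"))
        else:
            rows.append((current, target, "Fail"))
            template_pos += 1
        previous = current
    return rows


# Hypothetical series names; "T1w" was acquired twice.
rows = align_series_order(
    ["Localizer", "T1w", "T1w", "T2w", "rfMRI"],
    ["Localizer", "T1w", "T2w", "rfMRI"],
)
print([order_diff for _, _, order_diff in rows])
# ['Pass', 'Pass', 'Warning', 'Pass', 'Pass']
```

Because nothing is modified while iterating, several repeated series in one session are handled the same way as one. The result can be turned back into a table with `pd.DataFrame(rows, columns=["series_order", "series_order_target", "order_diff"])`.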