ashgillman opened 1 day ago
**Describe the bug**
`CSVDataset` accepts pandas DataFrames as input for `src`, but it makes assumptions about the DataFrame's index.
This is because `convert_tables_to_dicts` uses `.loc` instead of `.iloc`: it generates ordinal (positional) indices to subset on, but `.loc` treats them as named index labels.
https://github.com/Project-MONAI/MONAI/blob/0bb20a88ec7869f6453aa58890df50ad6b2b6271/monai/data/utils.py#L1494
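To illustrate the difference with pandas alone (no MONAI involved), here is a small frame whose index labels are 0, 5, 10, ... rather than 0, 1, 2, .... The same integer list selects different things depending on the accessor: `.iloc` reads it as positions, while `.loc` reads it as labels and, in recent pandas versions, raises `KeyError` when some of those labels do not exist.

```python
import numpy as np
import pandas as pd

# Index labels are 0, 5, 10, ..., 45, as you get after subsetting a larger frame.
df = pd.DataFrame(np.arange(30).reshape(10, 3), index=np.arange(0, 50, 5))

positions = [0, 1, 2]

# .iloc treats the integers as positions: the first three rows (labels 0, 5, 10).
by_position = df.iloc[positions]

# .loc treats the same integers as labels; labels 1 and 2 do not exist.
try:
    by_label = df.loc[positions]
    loc_failed = False
except KeyError:
    loc_failed = True

print(list(by_position.index))
print(loc_failed)
```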
**To Reproduce**
```python
import numpy
import pandas
import monai

df = pandas.DataFrame(numpy.random.random((50, 3)))
df_subset = df.iloc[numpy.arange(0, 50, 5)]
print(df_subset.shape)  # (10, 3)
ds = monai.data.CSVDataset(df_subset)
print(len(ds))  # 3
```
**Expected behavior**
`print(len(ds))` should return 10. It returns 3 because the rows are looked up with `.loc[slice(10)]`, which selects by label and includes the endpoint, so it only matches the labels 0, 5 and 10 in the subset.
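The off-by-label behaviour can be reproduced in pandas alone. This sketch assumes, as described above, that the default row selection is `slice(n)` where `n` is the number of rows; it compares label-based and positional slicing on the same subset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]  # 10 rows, labels 0, 5, ..., 45

rows = slice(df_subset.shape[0])  # slice(None, 10, None)

# Label-based: keeps every label up to and including 10 -> labels 0, 5, 10.
label_rows = df_subset.loc[rows]
# Position-based: the first 10 rows, endpoint excluded.
positional_rows = df_subset.iloc[rows]

print(len(label_rows))       # 3
print(len(positional_rows))  # 10
```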
**Environment**
Shouldn't be relevant?
**Additional context**
Simple fix: https://github.com/Project-MONAI/MONAI/blob/0bb20a88ec7869f6453aa58890df50ad6b2b6271/monai/data/utils.py#L1494
The first `.loc` should be `.iloc`, and the second should be `.iloc[rows][col_names]` (select rows by position, then columns by name).
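A minimal sketch of the suggested fix, pulled out into a hypothetical standalone helper `select_rows` (not part of MONAI) so it runs with pandas alone. Rows are selected positionally with `.iloc`, and columns are then selected by name, mirroring `.iloc[rows][col_names]`.

```python
from typing import List, Optional, Sequence, Union

import pandas as pd


def select_rows(  # hypothetical helper for illustration, not a MONAI API
    df: pd.DataFrame,
    rows: Union[slice, List[int]],
    col_names: Optional[Sequence[str]] = None,
) -> pd.DataFrame:
    # Positional row selection: independent of whatever labels df's index has.
    subset = df.iloc[rows]
    # Column names are labels; convert to a list so a tuple is not read as one key.
    return subset if col_names is None else subset[list(col_names)]


df = pd.DataFrame({"a": range(50), "b": range(50, 100)})
df_subset = df.iloc[list(range(0, 50, 5))]  # labels 0, 5, ..., 45

all_rows = select_rows(df_subset, slice(df_subset.shape[0]))
first_two_b = select_rows(df_subset, [0, 1], ["b"])
records = all_rows.to_dict(orient="records")
print(len(all_rows), len(records))  # 10 10
```

Converting with `to_dict(orient="records")` gives one dict per row, which is the shape a list-of-dicts dataset expects.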
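A workaround is to always call `.reset_index()` on `src` DataFrames; `.reset_index(drop=True)` additionally avoids adding the old index as an extra `index` column.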
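A sketch of the workaround using pandas only. After `reset_index(drop=True)` the frame has a 0..n-1 index, so label-based and positional slicing select the same rows; plain `reset_index()` keeps the old labels as a new `index` column, which would then show up as a field in every row. Passing the cleaned frame to `CSVDataset` is left out here so the example does not depend on MONAI being installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]  # labels 0, 5, ..., 45

# Plain reset_index() moves the old labels into a new "index" column.
with_old_index = df_subset.reset_index()
# drop=True discards the old labels and gives a fresh 0..n-1 index.
clean = df_subset.reset_index(drop=True)

# With a 0..n-1 index, .loc and .iloc slicing agree.
rows = slice(clean.shape[0])
print(len(clean.loc[rows]), len(clean.iloc[rows]))  # 10 10
```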