Closed WestonAnderson closed 5 months ago
Hi @WestonAnderson, sorry for the late follow-up on this issue. The current final combined file is /public/hvstat_data.csv, and I found no duplicated rows in it. Check these lines:
import pandas as pd
df = pd.read_csv('../public/hvstat_data.csv', index_col=0)
assert df[df.columns[:-1].values].duplicated(keep=False).sum() == 0
Here df.columns[:-1] excludes only the value column, so the check compares every other column.
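To illustrate what this check does, here is a minimal sketch on a toy table. The column names (country, product, year, value) and the numbers are made up for illustration and are not the real schema or contents of hvstat_data.csv; the only assumption carried over is that value is the last column.

```python
import pandas as pd

# Toy data with hypothetical columns; "value" is assumed to be the last column.
toy = pd.DataFrame({
    "country": ["Zambia", "Zambia", "Niger", "Niger"],
    "product": ["Maize", "Maize", "Millet", "Millet"],
    "year": [2001, 2001, 2001, 2002],
    "value": [10.0, 12.0, 5.0, 6.0],
})

# Every column except the last one acts as the record key.
key_cols = list(toy.columns[:-1])

# keep=False flags every member of a duplicated group, not just the later ones.
dup_mask = toy.duplicated(subset=key_cols, keep=False)
dup_rows = toy[dup_mask]
n_dup = int(dup_mask.sum())

print(dup_rows)
print("rows sharing a key:", n_dup)
```

In this toy table the two Zambia rows share the same key but carry different values, so both are flagged. On the real file the assert above passes only when that count is zero.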
I believe this has been resolved, so I'm closing this out.
There are duplicate values for some years in our final dataframe for four countries when we use the following command: df[df[df.columns[1:-2].values].duplicated(keep=False)]
The countries are Afghanistan, Niger, Somalia, and Zambia. I have found the issue for Zambia, which is that there are multiple names for Maize in just one or two years. I will follow up on Niger as well.
Can you find the issue with the Afghanistan and Somalia data? Presumably there should be no duplicate values in this final public dataframe.
Best, Weston
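For narrowing down which countries and years are affected, a rough sketch along these lines may help. The column names and rows below are hypothetical stand-ins, not the real dataframe, and the key here is simply "every column except value", which may differ from the df.columns[1:-2] slice used in the command above.

```python
import pandas as pd

# Hypothetical stand-in for the final dataframe (made-up rows and columns).
toy = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Zambia", "Zambia", "Zambia", "Niger"],
    "product": ["Wheat", "Wheat", "Maize", "Maize", "Maize", "Millet"],
    "year": [2010, 2010, 2005, 2005, 2005, 2005],
    "value": [1.0, 1.5, 2.0, 2.5, 3.0, 4.0],
})

# Treat every column except "value" as the record key (an assumption).
key_cols = [c for c in toy.columns if c != "value"]

# Keep all rows whose key appears more than once.
dups = toy[toy.duplicated(subset=key_cols, keep=False)]

# Number of duplicated rows per country, and per country and year.
per_country = dups.groupby("country").size()
per_country_year = dups.groupby(["country", "year"]).size()

print(per_country)
print(per_country_year)
```

Grouping the flagged rows this way gives a short list of country and year pairs to inspect by hand, which is where a naming problem like the one found for Maize in Zambia would show up.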