Review of crop statistics for Mali - Githubissues

BigDataWUR / AgML-CY-Bench

CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark to forecast crop yields at the subnational level. CY-Bench standardizes selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.

https://cybench.agml.org/


Review of crop statistics for Mali #174

Closed gnodnooh closed 1 month ago

gnodnooh commented 1 month ago

I'm raising the following points after quick-checking the Mali data:

- This crop statistics report covers the southern parts of Mali, which are the major producing regions, rather than the entire country.
- It might be helpful to explain the contents of each CSV file. The “maize_aggregated_yield_stats.csv” file contains general statistics about the data, not the data itself.
- While the crop statistics data has an "adm3_pcode" column, the shapefile does not have any matching ID column. It only includes “CMDTsector” and “CMDYsecto2”.
- For the file “maize_production_area_aggre_yield_data.csv”:
  - How have production and area been measured? Can we say these values represent the total production and area within a unit, or are they the total “surveyed” amounts?
  - Some rows have zero or missing values for all indicators (production, area, and yield).
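The last point, rows where every indicator is zero or missing, can be checked with a short script. The following is a minimal sketch using only the Python standard library. The column names (`adm_id`, `year`, `production`, `planted_area`, `yield`) and the sample rows are assumptions for illustration, not the actual header or contents of the Mali CSV.

```python
import csv
import io

# Hypothetical sample; the column names and values are assumptions,
# not the real header or contents of the Mali CSV file.
SAMPLE = """adm_id,year,production,planted_area,yield
ML-001,1990,1200,800,1.5
ML-001,1991,,,
ML-002,1990,0,0,0
ML-002,1991,,,1.2
"""

INDICATORS = ("production", "planted_area", "yield")


def is_empty(value):
    """Treat blank, NA-like, and zero entries as 'no recorded value'."""
    text = (value or "").strip()
    if text == "" or text.upper() in {"NA", "NAN"}:
        return True
    try:
        return float(text) == 0.0
    except ValueError:
        # Non-numeric text is left for manual inspection, not flagged.
        return False


def rows_without_data(handle):
    """Return (adm_id, year) for rows where every indicator is empty or zero."""
    reader = csv.DictReader(handle)
    return [
        (row["adm_id"], row["year"])
        for row in reader
        if all(is_empty(row[name]) for name in INDICATORS)
    ]


flagged = rows_without_data(io.StringIO(SAMPLE))
print(flagged)
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` sample and adjust `INDICATORS` to the actual column names.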

krsnapaudel commented 1 month ago

Thanks @gnodnooh. @janet6868 Please address the above comments in the data card or the data preparation notebook. Thanks.

janet6868 commented 1 month ago

@gnodnooh

- I have edited the boundary shapefile to include the adm_id.
- Production is in tons, yield in mt/ha, and planted area in hectares.
- The derived yield statistics are calculated from the original yield data and can be excluded.
- The missing values (NA) mean that no values were recorded. One possible reason is that maize production was marginal in that area and year, so CMDT may not have wanted to spend money collecting data for that crop. PS: for production and planted area, we only have data from 1974-1999, so there are no values for 2000-2017.

janet6868 commented 1 month ago

@krsnapaudel I have included my comments above.

janet6868 commented 1 month ago

@gnodnooh please check again, the shapefile is okay from my side. However, I have just uploaded it again to be sure.

gnodnooh commented 1 month ago

@janet6868 Thanks. Are we looking at the same folder, Mali Data (Africa)? It does not show any files when I am signed in to Google. In incognito mode, I can see the shapefile, which has the characteristics below:

Not sure what the problem is here.

janet6868 commented 1 month ago

@gnodnooh

756 represents the sectors in the one large polygon (MultiPolygon)

janet6868 commented 1 month ago

when you check the attribute table:

janet6868 commented 1 month ago

@gnodnooh please let me know if you would like me to share it as geojson.

gnodnooh commented 1 month ago

Thanks, @janet6868. Your comments are very helpful. My point is that we may only need a single feature or polygon (row) for each adm_id for the later step of aggregating climate data. I see that each adm_id has 28 duplicated polygons (rows). It’s not a significant issue, just a preference 👌
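A quick way to see how many rows share each adm_id, and what a one-row-per-ID table would look like, is sketched below. The attribute rows are made up for illustration. Only the `adm_id` field name comes from this thread. The `year` field and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical attribute-table rows; only the "adm_id" field name comes
# from the thread, the values and the "year" field are made up.
features = [
    {"adm_id": "ML-001", "year": 1990},
    {"adm_id": "ML-001", "year": 1991},
    {"adm_id": "ML-002", "year": 1990},
    {"adm_id": "ML-002", "year": 1991},
    {"adm_id": "ML-002", "year": 1992},
]


def rows_per_id(rows, key="adm_id"):
    """Count how many features share each identifier."""
    return Counter(row[key] for row in rows)


def first_per_id(rows, key="adm_id"):
    """Keep the first feature seen for each identifier, preserving order."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())


counts = rows_per_id(features)
unique = first_per_id(features)
print(counts)
print([row["adm_id"] for row in unique])
```

Keeping only the first row is safe only when all rows for an adm_id carry the same geometry, so the counts are worth inspecting before dropping anything.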

janet6868 commented 1 month ago

Hello @gnodnooh I'm so sorry, I understand your point now. If you look at the data, we have 28 years for each adm_id (1990-2017), making 28 rows for each adm_id. Let me know if this info helps. Thanks

gnodnooh commented 1 month ago

Hi @janet6868, if it is convenient for you, it would be beneficial to have a single feature per adm_id. This approach aligns with our separate data file and ensures consistency with other shapefiles. Thanks!

janet6868 commented 1 month ago

@gnodnooh Here is one challenge with using a single feature per adm_id. I am working with third-level administrative boundaries, where each unit can contain two or more sectors. Removing duplicates could result in losing important features, as illustrated by the purple-colored features that would be dropped. To address this, I will append the sector's unique ID to the administrative ID, ensuring each of the 27 features remains unique. I will update you once this is completed.
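The idea of appending the sector ID to the administrative ID can be sketched as follows. This is an illustration under assumptions, not the code used for the actual shapefile: the `sector_id` field, the `_` separator, and the sample values are hypothetical.

```python
# Hypothetical features: several sectors can fall under one adm_id, and
# each (adm_id, sector) pair is repeated once per year.
features = [
    {"adm_id": "ML-001", "sector_id": "S1", "year": 1990},
    {"adm_id": "ML-001", "sector_id": "S1", "year": 1991},
    {"adm_id": "ML-001", "sector_id": "S2", "year": 1990},
    {"adm_id": "ML-002", "sector_id": "S1", "year": 1990},
]


def composite_id(row, sep="_"):
    """Join the administrative ID and the sector ID into one unique key."""
    return f"{row['adm_id']}{sep}{row['sector_id']}"


def unique_features(rows):
    """Keep one feature per (adm_id, sector) pair, dropping per-year repeats."""
    unique = {}
    for row in rows:
        key = composite_id(row)
        if key not in unique:
            unique[key] = {
                "feature_id": key,
                "adm_id": row["adm_id"],
                "sector_id": row["sector_id"],
            }
    return list(unique.values())


result = unique_features(features)
print([feature["feature_id"] for feature in result])
```

With a composite key, the second sector under `ML-001` survives deduplication, whereas deduplicating on `adm_id` alone would drop it.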

janet6868 commented 1 month ago

@gnodnooh it's done. Kindly check it out.

gnodnooh commented 1 month ago

Looks good to me. Thanks for your work!