BigDataWUR / AgML-CY-Bench

CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark for forecasting crop yields at the subnational level. CY-Bench standardizes the selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.
https://cybench.agml.org/

Review of crop statistics for Mali #174

gnodnooh closed this issue 1 month ago

gnodnooh commented 1 month ago

I'm raising the following points after quick-checking the Mali data:

krsnapaudel commented 1 month ago

Thanks @gnodnooh. @janet6868, please address the above comments in the data card or the data preparation notebook.

janet6868 commented 1 month ago

@gnodnooh

  1. I have edited the boundary shapefile to include the adm_id.
  2. The production is in tons, yield in mt/ha, planted area in hectares.
  3. The derived yield statistics are calculated from the original yield data, so they can be excluded.
  4. The missing values (NA) mean that there are no recorded values. One possible reason is that maize production was marginal in that area and year, so CMDT may not have wanted to spend money collecting data for that crop. PS: for production and planted area, we only have data for 1974-1999, so there are no values for 2000-2017.
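
As a rough illustration of the units in point 2 and the missing values in point 4, the sketch below derives yield from production and planted area and keeps gaps as `None`. It assumes that derived yield is production divided by planted area, which this thread does not confirm. The field names and the sample `adm_id` are hypothetical.

```python
from typing import Optional


def derive_yield(production_t: Optional[float],
                 planted_area_ha: Optional[float]) -> Optional[float]:
    """Return yield in t/ha, or None if an input is missing or the area is not positive."""
    if production_t is None or planted_area_ha is None or planted_area_ha <= 0:
        return None
    return production_t / planted_area_ha


# Hypothetical records: one year with data, one year without recorded values.
records = [
    {"adm_id": "ML-EXAMPLE", "year": 1995, "production": 1200.0, "planted_area": 800.0},
    {"adm_id": "ML-EXAMPLE", "year": 2005, "production": None, "planted_area": None},
]

for record in records:
    record["derived_yield"] = derive_yield(record["production"], record["planted_area"])
```

Keeping the gap as `None` rather than 0 avoids treating "not recorded" as "no production".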

janet6868 commented 1 month ago

@krsnapaudel I have included my comments above.

janet6868 commented 1 month ago

@gnodnooh please check again; the shapefile looks fine on my side. I have just uploaded it again to be sure.

gnodnooh commented 1 month ago

@janet6868 Thanks. Are we looking at the same folder, Mali Data (Africa)? It does not show any files when I am signed in to Google. In incognito mode, I can see the shapefile, which has the characteristics below:

image

Not sure what the problem is here.

janet6868 commented 1 month ago

@gnodnooh image

756 represents the sectors in the one large polygon (MultiPolygon)

janet6868 commented 1 month ago

When you check the attribute table: image

janet6868 commented 1 month ago

@gnodnooh please let me know if you would like me to share it as GeoJSON.

gnodnooh commented 1 month ago

Thanks, @janet6868. Your comments are very helpful. My point is that we may only need a single feature or polygon (row) per adm_id for the later step of aggregating climate data. I see that each adm_id has 28 duplicated polygons (rows). It’s not a significant issue, just a preference 👌

janet6868 commented 1 month ago

Hello @gnodnooh, I'm so sorry; I understand your point now. The data cover 28 years for each adm_id (1990-2017), which gives 28 rows per adm_id. Let me know if this helps. Thanks
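
One way to confirm that the repeated rows come from the yearly records is to collect the distinct years per adm_id. This is a minimal sketch with plain dictionaries standing in for the attribute table; the `adm_id` values are hypothetical, and only the 1990-2017 range comes from this thread.

```python
from collections import defaultdict


def years_per_adm_id(rows):
    """Map each adm_id to the set of years that appear in its rows."""
    years = defaultdict(set)
    for row in rows:
        years[row["adm_id"]].add(row["year"])
    return dict(years)


# Hypothetical attribute table: two adm_ids, one row per year from 1990 to 2017.
rows = [
    {"adm_id": adm_id, "year": year}
    for adm_id in ("ADM-A", "ADM-B")
    for year in range(1990, 2018)
]

summary = years_per_adm_id(rows)
for adm_id, years in summary.items():
    print(adm_id, len(years), min(years), max(years))
```

If every adm_id reports 28 distinct years, the duplicated polygons are one copy per year rather than distinct geometries.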

gnodnooh commented 1 month ago

Hi @janet6868, if it is convenient for you, it would be beneficial to have a single feature per adm_id. This approach aligns with our separate data file and ensures consistency with the other shapefiles. Thanks!

janet6868 commented 1 month ago

@gnodnooh image Here is one challenge with using a single feature per adm_id. I am working with third-level administrative boundaries, where each unit can contain two or more sectors. Removing duplicates by adm_id alone could drop important features, as illustrated by the purple features in the image, which would be lost. To address this, I will append the sector's unique ID to the administrative ID, so that each of the 27 features remains unique. I will update you once this is completed.
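
The composite-ID idea can be sketched as follows. This is not the code used for the shapefile; the `sector_id` field, the `_` separator, and the sample values are assumptions made for illustration, with strings standing in for geometries.

```python
def make_feature_id(adm_id: str, sector_id: str) -> str:
    """Combine the administrative ID and the sector ID into one key."""
    return f"{adm_id}_{sector_id}"


def unique_features(rows):
    """Keep the first row seen for each (adm_id, sector_id) pair, dropping the year."""
    features = {}
    for row in rows:
        feature_id = make_feature_id(row["adm_id"], row["sector_id"])
        if feature_id not in features:
            features[feature_id] = {
                "feature_id": feature_id,
                "adm_id": row["adm_id"],
                "geometry": row["geometry"],
            }
    return list(features.values())


# Hypothetical table: ADM-A has two sectors, ADM-B has one, each repeated for 28 years.
sectors = [("ADM-A", "S1"), ("ADM-A", "S2"), ("ADM-B", "S1")]
rows = [
    {"adm_id": adm, "sector_id": sec, "year": year, "geometry": f"polygon {adm}/{sec}"}
    for adm, sec in sectors
    for year in range(1990, 2018)
]

features = unique_features(rows)
adm_ids_only = {row["adm_id"] for row in rows}
print(len(rows), len(features), len(adm_ids_only))
```

Deduplicating on adm_id alone would keep two features here and lose one sector of ADM-A, while the composite key keeps all three. With 27 sectors and 28 years, the same collapse would take 756 rows down to 27 features.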

janet6868 commented 1 month ago

image @gnodnooh it's done. Kindly check it out.

gnodnooh commented 1 month ago

Looks good to me. Thanks for your work!