Hello,
As I try to understand the dataset more, I am not clear on how to get the nutrients for a specific branded food. I will post my understanding below and hope that you will be able to spot the gap and correct me.
Also, I am using the latest dataset, from Oct 2022.
Here it goes
A branded food is uniquely identified by its gtin_upc field.
branded_food.csv has one or more records for a given gtin_upc, depending on when the product became available and when it was modified or discontinued.
To find the latest record for a branded food, I sort its rows in descending order, first by available_date, then by modified_date. Then I pick the first row and take its fdc_id.
With this fdc_id, I search the food_nutrient.csv file. My understanding is that this query should fetch all the nutrients for the branded food product.
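To make the steps above concrete, here is a rough Python sketch of what I am doing. It is only an illustration: the sample rows are made up, the function names are my own, and I am assuming the dates are ISO formatted (YYYY-MM-DD) and that the rows were read as strings, for example with csv.DictReader.

```python
from datetime import date


def parse_date(value):
    # Assumption: dates are ISO strings (YYYY-MM-DD); empty values sort last.
    return date.fromisoformat(value) if value else date.min


def latest_fdc_id(branded_rows, gtin_upc):
    """Return the fdc_id of the newest branded_food row for a gtin_upc."""
    matches = [row for row in branded_rows if row["gtin_upc"] == gtin_upc]
    if not matches:
        return None
    # Descending sort: first by available_date, then by modified_date.
    matches.sort(
        key=lambda row: (
            parse_date(row["available_date"]),
            parse_date(row["modified_date"]),
        ),
        reverse=True,
    )
    return matches[0]["fdc_id"]


def nutrients_for(nutrient_rows, fdc_id):
    """Collect nutrient_id -> amount for one fdc_id from food_nutrient rows."""
    return {
        row["nutrient_id"]: row["amount"]
        for row in nutrient_rows
        if row["fdc_id"] == fdc_id
    }


# Made-up sample rows standing in for branded_food.csv and food_nutrient.csv.
branded = [
    {"fdc_id": "1", "gtin_upc": "000000000001",
     "available_date": "2021-01-01", "modified_date": "2021-01-01"},
    {"fdc_id": "2", "gtin_upc": "000000000001",
     "available_date": "2022-06-01", "modified_date": "2022-05-01"},
]
nutrients = [
    {"fdc_id": "1", "nutrient_id": "1003", "amount": "5.0"},
]

latest = latest_fdc_id(branded, "000000000001")
result = nutrients_for(nutrients, latest)
```

In this made-up sample the newest fdc_id has no nutrient rows at all, so result comes back empty.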
Unfortunately, I believe my understanding is not correct. For example, let's take the following product:
gtin_upc is 810041590039
The latest fdc_id available (as per the Oct 2022 dataset; the website has one more record) is 2330624.
With this fdc_id, when I query the food_nutrient table, I get no results. That means the latest fdc_id for a given gtin_upc is not enough to retrieve the nutrients for the product.
So what shall I do to get the latest nutrients for a branded food product? One way is to get all fdc_ids associated with a gtin_upc. Then, starting from the oldest fdc_id, create a map of nutrients. If a new or updated nutrient is found, update the map. By the end of this exercise, there will be a map that contains all nutrients for the product. However, how do I deal with situations where a nutrient is not added or updated, but rather deleted?
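Here is a rough sketch of that merging idea in Python, again with made-up rows and a function name of my own. It assumes the fdc_ids for the gtin_upc have already been ordered from oldest to newest.

```python
def merged_nutrients(fdc_ids_oldest_first, nutrient_rows):
    """Fold nutrient rows into one map, letting newer fdc_ids overwrite older ones."""
    merged = {}
    for fdc_id in fdc_ids_oldest_first:
        for row in nutrient_rows:
            if row["fdc_id"] == fdc_id:
                # A newer record adds the nutrient or replaces its amount.
                merged[row["nutrient_id"]] = row["amount"]
    return merged


# Made-up rows: the older record has two nutrients, the newer record only one.
sample_rows = [
    {"fdc_id": "1", "nutrient_id": "1003", "amount": "5.0"},
    {"fdc_id": "1", "nutrient_id": "1004", "amount": "2.0"},
    {"fdc_id": "2", "nutrient_id": "1003", "amount": "6.0"},
]

merged = merged_nutrients(["1", "2"], sample_rows)
```

Nutrient 1004 stays in the map even though the newer record does not list it, which is exactly the deleted-nutrient case I do not know how to handle.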
Please guide me so that I can get the latest nutrients for a product uniquely identified by its gtin_upc.
Thank you