psftc opened 4 weeks ago
Elements Data:
number of columns: 4, number of rows: 89089, types of data in the different columns: element id / part number / color id / design id
In Lego, an element refers to a specific part in a specific color. For example, element id 4566309 refers to a black boat anchor, with the data 2564/0/2564 (part number / color id / design id). Initially, I was unsure what design id meant, but you explained it as different variations/versions of the same part. Some design ids are missing, which suggests that some parts never received another variation/version.
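To check how many elements lack a design id, the file can be scanned for empty design_id fields. The sketch below assumes the header names element_id, part_num, color_id and design_id. The second sample row is a hypothetical element added only to show a missing value.

```python
import csv
import io

# Assumed header of the elements file; the second data row is a hypothetical
# element included only to demonstrate an empty design_id.
SAMPLE_ELEMENTS = """element_id,part_num,color_id,design_id
4566309,2564,0,2564
9999999,3001,4,
"""

def elements_missing_design_id(rows):
    """Return the element ids whose design_id field is empty."""
    return [row["element_id"] for row in rows if not (row["design_id"] or "").strip()]

rows = list(csv.DictReader(io.StringIO(SAMPLE_ELEMENTS)))
missing = elements_missing_design_id(rows)
print(f"{len(missing)} of {len(rows)} elements have no design id: {missing}")
```

To run it on the real download, replace io.StringIO(...) with gzip.open("elements.csv.gz", "rt", newline=""). The file name there is an assumption.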
Themes:
number of columns: 3, number of rows: 461, types of data in the different columns: theme id / name / parent id
Themes are used to categorize sets. The id and name are self-explanatory. The parent id is used when a broader existing theme already covers the theme. For example, Racers and Ferrari are two separate themes, but Ferrari has Racers as its parent id because Ferrari sets could also be listed under Racers. Racers has no parent id because no broader theme exists that it could be listed under.
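The parent id makes themes a tree, so the full category path of a theme can be recovered by following parent ids upward until one is empty. The ids below are hypothetical stand-ins for Racers and Ferrari, and the header (id, name, parent_id) is assumed.

```python
import csv
import io

# Hypothetical ids; the real Racers/Ferrari ids in the themes file may differ.
SAMPLE_THEMES = """id,name,parent_id
100,Racers,
101,Ferrari,100
"""

def theme_path(theme_id, themes):
    """Follow parent_id links upward and return names from broadest to narrowest."""
    path = []
    current = theme_id
    while current:  # an empty parent_id marks a top-level theme
        row = themes[current]
        path.append(row["name"])
        current = row["parent_id"]
    return list(reversed(path))

themes = {row["id"]: row for row in csv.DictReader(io.StringIO(SAMPLE_THEMES))}
print(" > ".join(theme_path("101", themes)))  # Racers > Ferrari
```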
Color:
number of columns: 4, number of rows: 268, types of data in the different columns: color id / color name / RGB color / translucent
In Lego, each color is identified by an id. For black, the data looks like 0/Black/05131D/false, where 05131D is the color's hexadecimal RGB value and false means the color is not translucent.
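The RGB field is a six-digit hex string, so it can be split into red, green and blue bytes. This sketch converts one color record into typed values. It assumes translucency is written as true/false or t/f, since the exact spelling in the file is not confirmed here.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Color:
    id: int
    name: str
    rgb: Tuple[int, int, int]  # red, green, blue, each 0-255
    is_trans: bool

def parse_color(color_id, name, rgb_hex, is_trans):
    """Convert the four text fields of one color record into typed values."""
    rgb_hex = rgb_hex.strip().lstrip("#")
    # Each pair of hex digits is one channel: 05 -> 5, 13 -> 19, 1D -> 29.
    rgb = tuple(int(rgb_hex[i:i + 2], 16) for i in (0, 2, 4))
    # Accept both spellings, since the exact true/false encoding is an assumption.
    translucent = is_trans.strip().lower() in ("true", "t")
    return Color(int(color_id), name.strip(), rgb, translucent)

black = parse_color("0", "Black", "05131D", "false")
print(black)  # Color(id=0, name='Black', rgb=(5, 19, 29), is_trans=False)
```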
Parts categories:
number of columns: 2, number of rows: 69, types of data in the different columns: id / name
Part categories group parts by their properties. For example, all bricks fall under category 11 regardless of size or color, so the data in this case would be 11/brick.
Parts:
number of columns: 4, number of rows: 54498, types of data in the different columns: part number / name / part category / part material
As with elements, each part has its own unique data. For example, a 1 x 2 brick has the data 11211pr0001/1 x 2 brick/11/plastic.
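Because the part category column holds only a number, reading a part in human terms means looking that number up in the part categories file. This sketch joins the two using the examples above. The header names (id, name and part_num, name, part_cat_id, part_material) are assumptions.

```python
import csv
import io

# Assumed headers for the part categories and parts files; rows mirror the examples above.
SAMPLE_CATEGORIES = """id,name
11,Brick
"""
SAMPLE_PARTS = """part_num,name,part_cat_id,part_material
11211pr0001,1 x 2 brick,11,plastic
"""

# Build a lookup table once: category id -> category name.
categories = {row["id"]: row["name"] for row in csv.DictReader(io.StringIO(SAMPLE_CATEGORIES))}

def describe_part(part):
    """Replace the numeric category id with its human-readable name."""
    category = categories.get(part["part_cat_id"], "unknown category")
    return f'{part["part_num"]}: {part["name"]} ({category}, {part["part_material"]})'

parts = list(csv.DictReader(io.StringIO(SAMPLE_PARTS)))
for part in parts:
    print(describe_part(part))  # 11211pr0001: 1 x 2 brick (Brick, plastic)
```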
Parts relationships:
number of columns: 3, number of rows: 30989, types of data in the different columns: relationship type / child part / parent part
Parts are grouped together here based on relationships and similarities that are not close enough for them to be in the same category. For example, in R/98653pr0003/98086pr0003, the parent part is the head of a pterodactyl and the child part is the body. The two parts belong to different categories but are clearly related. However, I was unable to determine how the relationship type is chosen.
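One way to investigate how relationship types are chosen is to group the (child, parent) pairs by their type code and compare examples within each group. This sketch does that. The header names (rel_type, child_part_num, parent_part_num) are assumed, and the second sample row is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed header of the part relationships file; the "P" row is hypothetical.
SAMPLE_RELATIONSHIPS = """rel_type,child_part_num,parent_part_num
R,98653pr0003,98086pr0003
P,3001pr0001,3001
"""

def relationships_by_type(rows):
    """Group (child, parent) pairs under their rel_type code."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["rel_type"]].append((row["child_part_num"], row["parent_part_num"]))
    return dict(grouped)

groups = relationships_by_type(csv.DictReader(io.StringIO(SAMPLE_RELATIONSHIPS)))
for rel_type, pairs in sorted(groups.items()):
    # Show the count and a few example pairs for each relationship type.
    print(rel_type, len(pairs), pairs[:3])
```

Looking at a handful of pairs per code on the full file should show whether each code corresponds to a pattern such as printed variants or matching halves.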
Sets:
number of columns: 6, number of rows: 22680, types of data in the different columns: set number / name / year / theme / number of parts / image URL
Minifigs:
number of columns: 4, number of rows: 14356, types of data in the different columns: fig number / name / number of parts / image URL
Inventories:
number of columns: 3, number of rows: 38738, types of data in the different columns: id / version / set number
Each inventory is one version of a Lego set's contents, so a single set number can appear with several versions.
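Since a set can have several inventory versions, a common step is to pick one version per set, for example the highest. This sketch assumes the header id, version, set_num. The ids and set numbers are hypothetical.

```python
import csv
import io

# Assumed header of the inventories file; ids and set numbers are hypothetical.
SAMPLE_INVENTORIES = """id,version,set_num
1,1,1000-1
2,2,1000-1
3,1,2000-1
"""

def latest_inventory_per_set(rows):
    """Map each set_num to the inventory id with the highest version number."""
    best = {}  # set_num -> (version, inventory id)
    for row in rows:
        version = int(row["version"])
        current = best.get(row["set_num"])
        if current is None or version > current[0]:
            best[row["set_num"]] = (version, row["id"])
    return {set_num: inv_id for set_num, (_, inv_id) in best.items()}

latest = latest_inventory_per_set(csv.DictReader(io.StringIO(SAMPLE_INVENTORIES)))
print(latest)  # {'1000-1': '2', '2000-1': '3'}
```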
Inventory minifigs:
number of columns: 4, number of rows: 21804, types of data in the different columns: inventory / set by inventory / fig number / quantity of figs
Inventory minifigs are used to determine the number of minifigures in a set.
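The number of minifigures in a set is the sum of the quantity column over its inventory, not the number of rows, because one figure can appear more than once. This sketch assumes the header inventory_id, fig_num, quantity. DictReader ignores any extra column, and all values are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed header of the inventory minifigs file; values are hypothetical.
SAMPLE_INVENTORY_MINIFIGS = """inventory_id,fig_num,quantity
1,fig-000001,2
1,fig-000002,1
3,fig-000001,1
"""

def minifig_count_per_inventory(rows):
    """Sum quantities so that a figure appearing twice is counted twice."""
    totals = Counter()
    for row in rows:
        totals[row["inventory_id"]] += int(row["quantity"])
    return dict(totals)

counts = minifig_count_per_inventory(csv.DictReader(io.StringIO(SAMPLE_INVENTORY_MINIFIGS)))
print(counts)  # {'1': 3, '3': 1}
```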
Inventory parts:
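number of columns: 6, number of rows: 1048576, types of data in the different columns: inventory / part number / color id / quantity / spare / URL
The row count of 1048576 is exactly the maximum number of rows an Excel worksheet can hold. If the file was opened in a spreadsheet, it was probably truncated, and the true row count may be higher.
The inventory parts file gives a detailed overview of each piece in an inventory. A few image URLs are missing, which I find odd because no other data sheet has missing URLs.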
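Counting rows and empty URLs in a script, rather than in a spreadsheet, avoids any row limit and gives exact figures. This sketch streams the file once. It assumes the header includes a column named img_url, and the file name passed to scan_file is up to the reader.

```python
import csv
import gzip
import io

def scan_inventory_parts(text_stream):
    """Stream the rows once, counting all rows and those whose img_url is empty."""
    total = missing_url = 0
    for row in csv.DictReader(text_stream):
        total += 1
        if not (row.get("img_url") or "").strip():
            missing_url += 1
    return total, missing_url

def scan_file(path):
    """Run the scan directly on a downloaded .gz file without expanding it."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return scan_inventory_parts(f)

# Tiny in-memory stand-in with hypothetical values.
sample = io.StringIO(
    "inventory_id,part_num,color_id,quantity,is_spare,img_url\n"
    "1,3001,4,2,False,https://example.com/3001.jpg\n"
    "1,3004,0,1,False,\n"
)
result = scan_inventory_parts(sample)
print(result)  # (2, 1)
```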
Inventory sets:
number of columns: 3, number of rows: 4432, types of data in the different columns: inventory / set number / quantity
Inventory sets show the total number of versions each set has.
Reviewing the model:
After analyzing the data and reviewing the model/diagram, I realized that it resembles a node graph in which the data is interconnected: each node uses data from other nodes. For example, the set node uses data from themes, while the inventory and inventory set nodes both use data from the set node. Parts and inventories are also connected through inventory parts, which acts as a middleman by referencing both inventories and parts. The diagram also seems slightly outdated, because its Element node has no field for design id, unlike the downloaded data. Finally, I noticed that some nodes have their own id (Inventories, Minifigs, Sets, Parts, Parts categories, Color, Elements, Themes), while others do not (Inventory sets, Inventory parts, Inventory minifigs, Parts relationships). The pattern is that nodes with their own ids are referenced as data by other nodes, with the exception of Elements, while nodes without ids stand alone and are not referenced by any other node.
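Connecting theme, set, inventory and parts can be done by following those ids step by step, with inventory parts as the middleman between inventories and parts. This sketch uses hypothetical miniature versions of four files, and the header names are assumptions. It includes every inventory version of each set, and the resulting part numbers are the keys into the parts file.

```python
import csv
import io

def read(text):
    """Parse a small CSV string into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical miniature versions of four files; real headers are assumed.
themes = read("id,name,parent_id\n100,Racers,\n")
sets = read("set_num,name,year,theme_id,num_parts,img_url\n1000-1,Example Racer,2000,100,3,\n")
inventories = read("id,version,set_num\n1,1,1000-1\n")
inventory_parts = read(
    "inventory_id,part_num,color_id,quantity,is_spare,img_url\n"
    "1,3001,4,2,False,\n"
    "1,3004,0,1,False,\n"
)

def parts_for_theme(theme_name):
    """Walk theme -> set -> inventory -> inventory part and total each part's quantity."""
    theme_ids = {t["id"] for t in themes if t["name"] == theme_name}
    set_nums = {s["set_num"] for s in sets if s["theme_id"] in theme_ids}
    inventory_ids = {i["id"] for i in inventories if i["set_num"] in set_nums}
    totals = {}
    for row in inventory_parts:
        if row["inventory_id"] in inventory_ids:
            totals[row["part_num"]] = totals.get(row["part_num"], 0) + int(row["quantity"])
    return totals

print(parts_for_theme("Racers"))  # {'3001': 2, '3004': 1}
```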
The data set comes from the Rebrickable website (Rebrickable Downloads). This issue has several steps to complete:
1. Go to the downloads page above and identify the data files. We are interested only in the relational data, not the images. There should be 12 files, each ending with the file extension '.gz'. Note the date of the files.
2. Download each of these files manually to your local machine.
3. Expand each file and review the data. Look for the following: header, number of columns, number of rows, types of data in the different columns, and which columns are missing data.
4. The download page shows a model in a diagram. Review it against the data you have observed. When reviewing, consider mentally connecting theme, set, inventory, and parts.
5. In this ticket, add several paragraphs about your observations, which may include additional questions.
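The review in step 3 (header, column count, row count, missing data) can be automated so that every file is checked the same way. This sketch reads each .gz file directly without expanding it to disk. The folder passed to profile_downloads is hypothetical, and the sample is only shaped like a themes file for illustration.

```python
import csv
import gzip
import io
from pathlib import Path

def profile_csv(text_stream):
    """Return the header, column count, row count and empty cells per column."""
    reader = csv.reader(text_stream)
    header = next(reader)
    empty = {name: 0 for name in header}
    rows = 0
    for record in reader:
        rows += 1
        for i, name in enumerate(header):
            # A short record is treated as having empty trailing fields.
            value = record[i] if i < len(record) else ""
            if not value.strip():
                empty[name] += 1
    return {"header": header, "columns": len(header), "rows": rows, "empty": empty}

def profile_downloads(folder):
    """Profile every .gz file in a folder without expanding it to disk."""
    results = {}
    for path in sorted(Path(folder).glob("*.gz")):
        with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
            results[path.name] = profile_csv(f)
    return results

# Small illustrative sample shaped like a themes file.
sample = io.StringIO("id,name,parent_id\n1,Technic,\n2,Arctic Technic,1\n")
report = profile_csv(sample)
print(report["columns"], report["rows"], report["empty"])  # 3 2 {'id': 0, 'name': 0, 'parent_id': 1}
```

For the real files, something like `for name, info in profile_downloads("rebrickable_downloads").items(): print(name, info["columns"], info["rows"])` would list all 12 files at once. The folder name in that call is hypothetical.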