https://crates.io/crates/cosmetics_parser
https://docs.rs/cosmetics_parser/0.1.0/cosmetics_parser/
cosmetics_parser is a Rust-based parser that extracts information from a cosmetics catalog written in a human-readable, Markdown-like format. Each product description includes the product name, skin type, ingredients, rating, price, user ratings, recommendations, reviews, and availability.
The parser reads these product descriptions and converts them into a structured data format, which can be used for further processing, analysis, or presentation in an online cosmetics store application.
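The input is a Markdown-like format in which each product is a block of labeled lines. Each product contains the following fields, in this order: Product N (the product name), Skin Type, Ingredients, Rating, Price, User Ratings, Recommendations, Reviews, and Availability.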
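A product record with these fields might be modeled as a plain Rust struct. The sketch below is not taken from the crate: the struct name, field names, and types are assumptions inferred from the example JSON output, and `average_user_rating` is a hypothetical helper added only for illustration.

```rust
/// One catalog entry.
/// ASSUMPTION: names and types are inferred from the example JSON output;
/// the crate's real definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Product {
    pub product_name: String,
    pub skin_type: String,
    pub ingredients: String,
    pub rating: f64,
    pub price: f64,
    pub user_ratings: Vec<f64>,
    pub recommendations: String,
    pub reviews: Vec<String>,
    pub availability: bool,
}

impl Product {
    /// Hypothetical helper: mean of the individual user ratings,
    /// or `None` when the list is empty.
    pub fn average_user_rating(&self) -> Option<f64> {
        if self.user_ratings.is_empty() {
            return None;
        }
        let total: f64 = self.user_ratings.iter().sum();
        Some(total / self.user_ratings.len() as f64)
    }
}
```

Keeping the ingredients as a single string mirrors the JSON output, where they are not split into a list.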
The parser uses the Pest library to process the input format. The grammar defined in grammar.pest describes a product description and its fields: numbers, free text, lists (e.g., user ratings), and booleans.
WHITESPACE = { " " | "\t" }
SPACE = { WHITESPACE+ }
products = { (product ~ (NEWLINE | SPACE))* }
product = { product_name ~ skin_type ~ ingredients ~ rating ~ price ~ user_ratings ~ recommendations ~ reviews ~ availability }
rating = { "*Rating*:" ~ SPACE? ~ number ~ NEWLINE }
availability = { "*Availability*:" ~ SPACE? ~ boolean ~ NEWLINE }
price = { "*Price*:" ~ SPACE? ~ number ~ SPACE? ~ currency? ~ NEWLINE }
user_ratings = { "*User Ratings*:" ~ SPACE? ~ number_list ~ NEWLINE }
product_name = { "*Product " ~ (ASCII_DIGIT+) ~ "*:" ~ any_text }
recommendations = { "*Recommendations*:" ~ any_text }
ingredients = { "*Ingredients*:" ~ any_text }
reviews = { "*Reviews*:" ~ SPACE? ~ (review)* }
skin_type = { "*Skin Type*:" ~ any_text }
number = { ("-"? ~ ASCII_DIGIT+) ~ (("." ~ ASCII_DIGIT+)?) }
number_list = { "[" ~ number ~ ("," ~ SPACE? ~ number)* ~ "]" }
review = { NEWLINE? ~ number ~ "." ~ any_text }
currency = { "UAH" | "EUR" | "USD" }
any_text = { SPACE? ~ (!NEWLINE ~ ANY)+ ~ NEWLINE }
boolean = { ("true" | "false") }
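To make the shape of the `number`, `number_list`, and `price` rules concrete, here is a hand-written, standard-library-only approximation of what they accept. It is only a sketch: the crate itself relies on Pest, the function names are hypothetical, and this version is stricter than the grammar about whitespace inside a number.

```rust
/// True when `s` is a non-empty run of ASCII digits.
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Approximates the `number` rule: optional '-', digits, optional ".digits".
/// Hypothetical helper, not part of the crate.
pub fn parse_number(text: &str) -> Option<f64> {
    let unsigned = text.strip_prefix('-').unwrap_or(text);
    let (int_part, frac_part) = match unsigned.split_once('.') {
        Some((int_part, frac_part)) => (int_part, Some(frac_part)),
        None => (unsigned, None),
    };
    if !all_digits(int_part) || !frac_part.map_or(true, all_digits) {
        return None;
    }
    text.parse().ok()
}

/// Approximates the `number_list` rule, e.g. "[5, 4, 5, 3, 4]".
/// An empty list is rejected, as the rule requires at least one number.
pub fn parse_number_list(text: &str) -> Option<Vec<f64>> {
    let inner = text.trim().strip_prefix('[')?.strip_suffix(']')?;
    inner
        .split(',')
        .map(|item| parse_number(item.trim()))
        .collect()
}

/// Approximates the `price` rule: a number followed by an optional
/// currency code (UAH, EUR, or USD).
pub fn parse_price(line: &str) -> Option<(f64, Option<&str>)> {
    let rest = line.trim_end().strip_prefix("*Price*:")?.trim_start();
    // The number ends at the first character that cannot belong to it.
    let split = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.' || c == '-'))
        .unwrap_or(rest.len());
    let (number, tail) = rest.split_at(split);
    let amount = parse_number(number)?;
    let currency = match tail.trim() {
        "" => None,
        code @ ("UAH" | "EUR" | "USD") => Some(code),
        _ => return None,
    };
    Some((amount, currency))
}
```

With these helpers, `parse_price("*Price*: 299.99 UAH")` returns `Some((299.99, Some("UAH")))`, while an unknown currency code such as `GBP` is rejected.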
The input is processed line by line, and the parser extracts the relevant data from each field. After parsing, a CosmeticsCatalog object is created to hold the parsed products. This catalog can then be used for further processing or display in a frontend application.
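As a rough illustration of line-by-line processing, the sketch below groups labeled lines into one list of (field, value) pairs per product, using only the standard library. It does not reproduce the crate's Pest-based implementation or its CosmeticsCatalog type; `split_field` and `group_products` are hypothetical names, and values are kept as raw strings.

```rust
/// Splits a labeled line such as "*Skin Type*: Dry Skin"
/// into ("Skin Type", "Dry Skin"). Returns None for unlabeled lines.
pub fn split_field(line: &str) -> Option<(&str, &str)> {
    let rest = line.strip_prefix('*')?;
    let (key, value) = rest.split_once("*:")?;
    Some((key, value.trim()))
}

/// Groups the input into products. A "*Product N*:" line starts a new
/// product; unlabeled lines (e.g. numbered reviews) are appended to the
/// value of the most recent field.
pub fn group_products(input: &str) -> Vec<Vec<(String, String)>> {
    let mut products: Vec<Vec<(String, String)>> = Vec::new();
    for line in input.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match split_field(line) {
            Some((key, value)) => {
                if key.starts_with("Product ") {
                    products.push(Vec::new());
                }
                // Fields seen before any product line are ignored.
                if let Some(current) = products.last_mut() {
                    current.push((key.to_string(), value.to_string()));
                }
            }
            None => {
                // Continuation line: attach it to the previous field.
                if let Some((_, value)) =
                    products.last_mut().and_then(|p| p.last_mut())
                {
                    if !value.is_empty() {
                        value.push('\n');
                    }
                    value.push_str(line);
                }
            }
        }
    }
    products
}
```

A typed catalog could then be built by converting each pair list into a product record and validating the individual values.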
*Product 1*: Face Cream "Moisturizing"
*Skin Type*: Dry Skin
*Ingredients*: Water, Glycerin, Hyaluronic Acid, Jojoba Oil
*Rating*: 4.5
*Price*: 299.99 UAH
*User Ratings*: [5, 4, 5, 3, 4]
*Recommendations*: Use in the morning and evening after cleansing the skin. Suitable for sensitive skin.
*Reviews*:
1. "This cream perfectly moisturizes my skin. It absorbs easily!"
*Availability*: true
{
  "product_name": "Face Cream \"Moisturizing\"",
  "skin_type": "Dry Skin",
  "ingredients": "Water, Glycerin, Hyaluronic Acid, Jojoba Oil",
  "rating": 4.5,
  "price": 299.99,
  "user_ratings": [
    5.0,
    4.0,
    5.0,
    3.0,
    4.0
  ],
  "recommendations": "Use in the morning and evening after cleansing the skin. Suitable for sensitive skin.",
  "reviews": [
    "1. \"This cream perfectly moisturizes my skin. It absorbs easily!\""
  ],
  "availability": true
}