AnugyaSahu / Combinatorial-problems-with-Transformers

Solving complex real-world COPs with limited data / information and deep learning
Apache License 2.0

Take a look at PROCAT dataset #4

Closed riridi closed 1 year ago

riridi commented 1 year ago

As it is the only real-world dataset I currently know of, please take a look at the PROCAT dataset. It is split into real-world data and synthetic data. Maybe you can summarize the information in a short overview including its properties, dataset size, the problem/lack of solutions, ...
https://github.com/mateuszjurewicz/procat

AnugyaSahu commented 1 year ago

• 10000 product catalogues, 1.5 million individual product offers
• Data collected within the 4-year period between 2015 and 2019
• Text features of offers, grouped into sections
• Two types: the main PROCAT dataset and one synthetically generated set of simplified catalogue structures
• Easy to use
• Variable sequence lengths and substructures
• The synthetic datasets also allow for predicting multiple valid catalogue structures from the same underlying input set (not the case with the main dataset, where only one target permutation is available)
• Implicit clustering task with a varying number of clusters
• PROCAT MINI: a small subset of the data with the same features, as csv files
• Three .csv files:
  • offer_features: contains offer_id, section_id, priority rating, heading, and the offer as a vector (1613686 instances)
  • section_features: maps section ids to their corresponding catalogue id and other features (238256 instances)
  • catalog_features: each row is a single catalogue, with all information. This is the main training data: x is a random permutation of the offer vectors of each catalogue, and y is the target that restores the original order of the offers in the catalogue (11063 instances)
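
To make the x / y relationship concrete, here is a minimal sketch of how such a training pair could be built for one catalogue. It is only an illustration: the function names, the toy offer strings, and the exact encoding of y as a list of indices into x are my assumptions, not necessarily how PROCAT stores its targets.

```python
import random


def make_permutation_example(offers, seed=0):
    """Build a hypothetical (x, y) pair from one ordered catalogue.

    x is a random permutation of the offers; y[k] is the index in x of the
    offer that belongs at position k of the original catalogue.
    """
    rng = random.Random(seed)
    order = list(range(len(offers)))
    rng.shuffle(order)

    # x[k] is the offer that originally sat at position order[k].
    x = [offers[i] for i in order]

    # Invert the shuffle so that reading x in the order given by y
    # reproduces the original catalogue.
    y = [0] * len(offers)
    for pos_in_x, original_pos in enumerate(order):
        y[original_pos] = pos_in_x
    return x, y


def restore(x, y):
    """Apply the target permutation y to the shuffled input x."""
    return [x[i] for i in y]


# Toy catalogue; real offers would be text-feature vectors.
catalogue = ["offer_a", "offer_b", "offer_c", "offer_d"]
x, y = make_permutation_example(catalogue, seed=42)
restored = restore(x, y)
```

A model for this task would take x as input and be trained to predict y, which is why the main dataset offers only one valid target per input set.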