amzn / pecos

PECOS - Prediction for Enormous and Correlated Spaces
https://libpecos.org/
Apache License 2.0
514 stars, 105 forks

Is there at least one example showing how to use Pecos from a plain text dataset? #157

Closed celsofranssa closed 2 years ago

celsofranssa commented 2 years ago

It has been difficult to work out how to use PECOS properly. The usage instructions are split across several README.md files and scattered through the issues.

Could you provide a toy end-to-end example (using XR-Transformer, for instance)?

Consider the following scenario: we have the training and testing samples in plain text:

#train samples:
    text: raw_text_1, labels: [L1, L7, ..., L3]
    text: raw_text_2, labels: [L8, L9]
    ...
    text: raw_text_N, labels: [L1, L7, ..., L4]

#test samples:
    text: test_raw_text_1
    text: test_raw_text_2
    ...
    text: test_raw_text_M

and someone has to:

  1. prepare the data to the accepted format;
  2. train the model;
  3. predict the top k labels.

hallogameboy commented 2 years ago

Hi,

Thank you very much for the suggestion. We will keep improving documentation for better experiences.

For now, we have released the materials for our hands-on tutorial at the upcoming KDD 2022. You can follow the steps in the part "Customized PECOS Model" in Session 2, which walks through extreme multi-label text classification from scratch. It covers every step from data preparation to model training, inference, and evaluation.

Please let us know if you have any further questions on PECOS usage. Thanks.
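As a rough illustration of the three steps asked about above (prepare, train, predict top k), here is a minimal sketch. It makes several assumptions: it uses the linear `XLinearModel` rather than XR-Transformer, it uses scikit-learn's `TfidfVectorizer` and `MultiLabelBinarizer` for the data preparation instead of PECOS's own tooling, and the helper names `top_k` and `train_and_predict` are made up for this example. The PECOS calls mirror the usage shown in the project README; the sketch has not been checked against a toy-sized dataset.

```python
def top_k(label_ids, scores, k):
    """Return the k label ids with the highest scores, best first."""
    ranked = sorted(zip(scores, label_ids), key=lambda pair: pair[0], reverse=True)
    return [label_id for _, label_id in ranked[:k]]


def train_and_predict(train_texts, train_labels, test_texts, k=5):
    """Hypothetical helper: raw texts and label lists in, top-k label names out."""
    # Third-party imports are local so that top_k above works without them.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import MultiLabelBinarizer
    from pecos.xmc import Indexer, LabelEmbeddingFactory
    from pecos.xmc.xlinear.model import XLinearModel

    # 1. Prepare the data: float32 CSR matrices for features (X) and labels (Y).
    vectorizer = TfidfVectorizer()
    X_trn = vectorizer.fit_transform(train_texts).astype(np.float32).tocsr()
    X_tst = vectorizer.transform(test_texts).astype(np.float32).tocsr()
    binarizer = MultiLabelBinarizer(sparse_output=True)
    Y_trn = binarizer.fit_transform(train_labels).astype(np.float32).tocsr()

    # 2. Train: build a label hierarchy, then fit the linear rankers on it.
    label_feat = LabelEmbeddingFactory.create(Y_trn, X_trn)
    cluster_chain = Indexer.gen(label_feat, indexer_type="hierarchicalkmeans")
    model = XLinearModel.train(X_trn, Y_trn, C=cluster_chain)

    # 3. Predict: Y_pred holds one sparse row of label scores per test text.
    Y_pred = model.predict(X_tst, beam_size=10, only_topk=k)
    predictions = []
    for i in range(Y_pred.shape[0]):
        row = Y_pred[i]
        ids = top_k(row.indices.tolist(), row.data.tolist(), k)
        # Map column ids back to the original label names.
        predictions.append([binarizer.classes_[j] for j in ids])
    return predictions
```

Calling `train_and_predict(texts, labels, test_texts, k=3)` with `labels` given as a list of label lists should return one list of label names per test text, assuming `libpecos`, `scikit-learn`, and `numpy` are installed.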

celsofranssa commented 2 years ago

Thank you very much. Unfortunately, the linked notebooks are more a presentation of PECOS's features than a guide to predicting labels from plain text and labels. An end-to-end pipeline would be very helpful, one where you pass (text, labels) pairs at the training stage and texts at the prediction stage.

soummyaah commented 2 years ago

Hi,

I am facing a similar problem. I am working with a custom dataset, but understanding how to generate the CSR matrix and the other files in the format required by PECOS or other XMC repositories is proving difficult. Any pointers? How do I convert raw text and labels to the required formats?
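For the label side of that conversion, a small sketch may help show what a CSR (compressed sparse row) matrix actually stores. It uses only the Python standard library; the function name `build_label_csr` and the sample labels are made up for illustration, and nothing here is PECOS-specific.

```python
def build_label_csr(label_lists):
    """Encode per-sample label lists as the three arrays of a binary CSR matrix.

    Returns (data, indices, indptr, vocab), where vocab maps each label
    name to its column id in first-seen order.
    """
    vocab = {}
    data, indices, indptr = [], [], [0]
    for labels in label_lists:
        # Assign a new column id to unseen labels; deduplicate and sort per row.
        cols = sorted({vocab.setdefault(label, len(vocab)) for label in labels})
        indices.extend(cols)            # column ids of the non-zero entries
        data.extend([1.0] * len(cols))  # every stored entry is a 1
        indptr.append(len(indices))     # where the next row starts
    return data, indices, indptr, vocab


# Toy label lists in the shape described in the question above.
train_labels = [["L1", "L7", "L3"], ["L8", "L9"], ["L1", "L4"]]
data, indices, indptr, vocab = build_label_csr(train_labels)
print(indices)  # [0, 1, 2, 3, 4, 0, 5]
print(indptr)   # [0, 3, 5, 7]
```

The three arrays can be handed to SciPy as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(train_labels), len(vocab)), dtype=numpy.float32)`. The text side needs a vectorizer (TF-IDF, for example) that produces a sparse feature matrix with one row per sample.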

runningabcd commented 1 year ago

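Hello,

I am facing a similar problem. I am working with a custom dataset, but understanding how to generate the CSR matrix and the other files in the format required by PECOS or other XMC repositories is proving difficult. Any pointers? How do I convert raw text and labels to the required formats?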

Hi, I think you need this: https://github.com/amzn/pecos/blob/b9478e61a3dd882858b35743000b3565e0847785/tutorials/kdd22/Session%205%20eXtreme%20Multi-label%20Classification%20with%20XR-Transformer.ipynb
