Open celsofranssa opened 1 year ago
The XR-Transformer model consists of two parts: a text encoder and an XMC ranker. The XMC ranker has space complexity linear in the number of output labels and in the dimension/sparsity of the input features. Therefore, generally speaking, when the output label space is large and the input features (TFIDF + dense embeddings) are dense, the memory cost will be higher.
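A rough back-of-envelope calculation illustrates why this matters for Amazon-670k. The feature dimension and float width below are illustrative assumptions, not numbers from the paper:

```python
# Hypothetical sizing of a dense XMC ranker weight matrix (L labels x d features).
L = 670_091            # labels in Amazon-670k
d = 500_000            # assumed TFIDF + dense embedding dimension (illustrative)
bytes_per_param = 4    # float32

dense_gb = L * d * bytes_per_param / 1e9
print(f"dense ranker: ~{dense_gb:,.0f} GB")   # far beyond 128 GB RAM

# If only a fraction s of weights survive sparsification, a CSR-style layout
# needs roughly 8 bytes per nonzero (4-byte value + 4-byte column index).
s = 0.001
sparse_gb = L * d * s * 8 / 1e9
print(f"sparse ranker at 0.1% density: ~{sparse_gb:,.1f} GB")
```

Even under these made-up dimensions, a fully dense ranker is on the order of a terabyte, which is why sparsifying the ranker matrix is the main lever for fitting large label spaces in memory.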
To reduce the memory cost, you can adjust the threshold used to sparsify the XMC ranker (link): parameters with magnitude below that value will be set to 0.
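The effect of the threshold can be sketched as magnitude pruning of the ranker's weight matrix. This is only an illustration of the idea, not PECOS's actual implementation; the matrix and threshold values are arbitrary:

```python
import numpy as np

def sparsify(W, threshold):
    """Zero out weights whose magnitude falls below `threshold`
    (magnitude pruning; a sketch of what the sparsification knob does)."""
    return np.where(np.abs(W) >= threshold, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(1000, 64))  # toy stand-in for a ranker matrix

for thr in (0.0, 0.1, 0.2):
    density = np.count_nonzero(sparsify(W, thr)) / W.size
    print(f"threshold={thr:.1f}  density={density:.3f}")
```

Raising the threshold shrinks the fraction of nonzero parameters (and hence memory, once the matrix is stored in a sparse format) at the cost of some ranking accuracy, so it is worth tuning rather than picking one universal value.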
Thank you.
Could you provide the threshold that allows training XR-Transformer on the Amazon-670k dataset in a computational environment with 128GB RAM?
Hello,
Reading the XR-Transformer paper, I would like to know the time complexity of the multi-resolution learning step in terms of the number of text instances (N) and the number of labels (L). Is the multi-resolution learning step what causes the large amount of RAM required to apply XR-Transformer to Amazon-670k?