Open celsofranssa opened 1 year ago
The XR-Transformer model consists of two parts: a text encoder and an XMC ranker. The XMC ranker has space complexity linear in the number of output labels and in the dimension/sparsity of the input features. Therefore, generally speaking, when the output label space is large and the input features (TFIDF + dense embeddings) are dense, the memory cost will be higher.
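A rough back-of-envelope calculation illustrates why this matters for Amazon-670k. The feature dimension and float width below are illustrative assumptions, not numbers from the paper:

```python
# Hypothetical sizing of a dense XMC ranker weight matrix (L labels x d features).
L = 670_091            # labels in Amazon-670k
d = 500_000            # assumed TFIDF + dense embedding dimension (illustrative)
bytes_per_param = 4    # float32

dense_gb = L * d * bytes_per_param / 1e9
print(f"dense ranker: ~{dense_gb:,.0f} GB")   # far beyond 128 GB RAM

# If only a fraction s of weights survive sparsification, a CSR-style layout
# needs roughly 8 bytes per nonzero (4-byte value + 4-byte column index).
s = 0.001
sparse_gb = L * d * s * 8 / 1e9
print(f"sparse ranker at 0.1% density: ~{sparse_gb:,.1f} GB")
```

Even under these made-up dimensions, a fully dense ranker is on the order of a terabyte, which is why sparsifying the ranker matrix is the main lever for fitting large label spaces in memory.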
To reduce the memory cost, you can adjust the threshold used to sparsify the XMC ranker (link): parameters with magnitude below that value will be set to 0.
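The effect of the threshold can be sketched as magnitude pruning of the ranker's weight matrix. This is only an illustration of the idea, not PECOS's actual implementation; the matrix and threshold values are arbitrary:

```python
import numpy as np

def sparsify(W, threshold):
    """Zero out weights whose magnitude falls below `threshold`
    (magnitude pruning; a sketch of what the sparsification knob does)."""
    return np.where(np.abs(W) >= threshold, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(1000, 64))  # toy stand-in for a ranker matrix

for thr in (0.0, 0.1, 0.2):
    density = np.count_nonzero(sparsify(W, thr)) / W.size
    print(f"threshold={thr:.1f}  density={density:.3f}")
```

Raising the threshold shrinks the fraction of nonzero parameters (and hence memory, once the matrix is stored in a sparse format) at the cost of some ranking accuracy, so it is worth tuning rather than picking one universal value.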
Thank you.
Could you provide the threshold that allows training XR-Transformer on the Amazon-670k dataset in a computational environment with 128GB RAM?
Hello,
Reading the XR-Transformer paper, I would like to know the time complexity of the multi-resolution learning step in terms of the number of text instances (N) and the number of labels (L). Is the multi-resolution learning step what causes the large amount of RAM required to apply XR-Transformer to Amazon-670k?