wangcongcong123 closed 4 years ago
I went to the website (http://manikvarma.org/downloads/XC/XMLRepository.html) to download the rcv1-2 dataset, but I could only find a numeric form of it, i.e. the samples are provided as feature representations rather than raw text. I'm curious how you converted it to the raw tokens used in your repository: data/rcv1_*.json -> "doc_token"
Thanks. The dataset on that page is not the version of RCV1 (Reuters Corpus, Volume 1) that we used. You will need to find a version of the corpus that provides the raw tokens.