ductri / reuters_loader

Load the RCV1-v2 dataset and convert it to CSV files
MIT License

To convert the dataset, run:

python main.py path_to_dir

where path_to_dir is the absolute path to the directory containing the file rcv1.tar.xz. The script writes two CSV files to path_to_dir.
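A minimal sketch for reading one of the generated CSV files with Python's standard csv module. The function name iter_csv_rows is hypothetical, and the UTF-8 encoding is an assumption about how main.py writes its output. Only the text column is known from this README, so check any other column names against the actual header.

```python
import csv
import sys
from pathlib import Path
from typing import Iterator


def iter_csv_rows(csv_path: Path) -> Iterator[dict]:
    """Yield each row of a generated CSV file as a dict keyed by column name.

    Hypothetical helper; UTF-8 output encoding is an assumption.
    """
    # Raise the per-field size limit in case a long article exceeds the csv
    # module's default (131072 characters). sys.maxsize can overflow the C long
    # on some platforms (e.g. Windows), so fall back to a 32-bit maximum there.
    try:
        csv.field_size_limit(sys.maxsize)
    except OverflowError:
        csv.field_size_limit(2**31 - 1)
    # newline="" lets the csv module handle newlines embedded in quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)
```

Rows are streamed one at a time rather than loaded into a list, which keeps memory use flat for a corpus of RCV1's size. Substitute the real file name for whichever of the two outputs you want to read, for example `for row in iter_csv_rows(Path(path_to_dir) / "<file>.csv"): ...`.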

The column named text contains the raw article in XML format, which can be parsed easily with xml.etree.ElementTree.XML(text).
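A short sketch of pulling a few fields out of one value from the text column with xml.etree.ElementTree.XML. The element and attribute names used here (newsitem with an itemid attribute, headline, text/p, metadata/codes/code) are an assumption about the RCV1 markup rather than something this README states, and parse_newsitem is a hypothetical helper; adjust the paths to match your data.

```python
import xml.etree.ElementTree as ET


def parse_newsitem(text: str) -> dict:
    """Extract headline, body and category codes from one <newsitem> string.

    Element names are an assumption about the RCV1 XML layout.
    """
    root = ET.XML(text)
    # findtext returns None if <headline> is missing and "" if it is empty.
    headline = root.findtext("headline") or ""
    # itertext() also collects text from any child elements inside a <p>.
    paragraphs = ["".join(p.itertext()) for p in root.iterfind("text/p")]
    # Group codes by the class attribute of their <codes> parent.
    codes = {}
    for group in root.iterfind("metadata/codes"):
        cls = group.get("class", "")
        codes.setdefault(cls, []).extend(
            c.get("code", "") for c in group.iterfind("code")
        )
    return {
        "itemid": root.get("itemid"),
        "headline": headline,
        "body": "\n".join(paragraphs),
        "codes": codes,
    }


# Made-up sample in the assumed layout, not a real RCV1 article.
sample = (
    '<newsitem itemid="2286">'
    "<headline>Example headline</headline>"
    "<text><p>First paragraph.</p><p>Second paragraph.</p></text>"
    '<metadata><codes class="bip:topics:1.0">'
    '<code code="CCAT"></code><code code="C15"></code>'
    "</codes></metadata>"
    "</newsitem>"
)
item = parse_newsitem(sample)
```

Here `item["codes"]` maps each code class to its list of codes, so topic labels for classification can be read from the `bip:topics:1.0` entry under this assumed layout.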