dthiagarajan opened this issue 1 year ago
Have you tried to construct it from an iterator to reduce RAM usage? Splitting the data into 3GB per batch can be a good starting point.
I have tried this, and this does indeed help, but I'm worried that this will slow down the time to build each tree. Is it expected that the time to build each tree slows down significantly when constructing the QDM/DM from an iterator? I've observed this locally on some much smaller examples, so I'm worried that this will cause the time per tree to balloon for my larger datasets that I need to train on.
It won't slow down the tree building, but the construction of the QDM might take longer since you are loading data batches from external memory (the QDM needs to iterate over the data multiple times to finish construction).
Having said that, I think it's possible to add support for save_binary, given that we don't promise backward compatibility.
Interesting, so you wouldn't expect the iteration time to be slower if I construct a QDM using a DataIter subclass? Would you expect it to be slower if I construct a DMatrix using a DataIter? And how exactly does the construction happen? Are the data batches all iterated over once at the beginning of training? Or does it happen each iteration?
> if I construct a DMatrix using a DataIter
That would be using the external memory version of XGBoost, which would indeed slow down xgboost significantly, especially pre-2.0. For details, please visit the documentation site.
> Are the data batches all iterated over once at the beginning of training?
On the CPU, the data is iterated over four times; on the GPU, twice.
> Or does it happen each iteration?
No, only at the beginning of training for QDM.
Hey, I was wondering if there are any updates on this? Also, I wonder how much memory a QDM can save if it is constructed from a dense numpy float32 matrix. For example, if my data matrix consists of 5 million observations and 1000 features (~20GB in numpy), is there a rule of thumb for estimating the QDM size? Thanks!
Not yet, but it should be closer now that we can export the QDM to a SciPy CSR. We still need the import part.
It depends on the number of bins and the number of features, along with the CPU/GPU difference. For fully dense data with no missing values, 256 bins, and float32 input, the GPU QDM will be about a quarter of the input size in the upcoming release; the current release uses more memory. The CPU case is a bit more complicated, and I don't have a simple description yet; I will look into it.
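Applying that rule of thumb to the 5-million-row example above (a back-of-the-envelope sketch, not an exact formula — it ignores the extra structures discussed below):

```python
rows, features = 5_000_000, 1_000

# Dense float32 input: 4 bytes per entry.
dense_bytes = rows * features * 4

# With 256 bins, each entry fits in a single byte,
# so the quantized data is roughly 1/4 of the float32 input.
qdm_estimate_bytes = rows * features * 1

print(dense_bytes / 2**30)         # ~18.6 GiB dense input
print(qdm_estimate_bytes / 2**30)  # ~4.7 GiB estimated GPU QDM
```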
Thanks a lot for the reply! Really looking forward to the upcoming release that further reduces the GPU QDM size. In my experiment (xgb-2.1.1), the GPU QDM currently used roughly 40-50% of the RAM of my float32 dense numpy matrix.
I wonder why there is a difference between the CPU and GPU QDM sizes? My understanding was that hist transforms a numerical float32 column into integer bin indices in {0, ..., 255}, which is why we expect 1/4 of the RAM usage.
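That 1/4 intuition can be checked with a small numpy sketch (this only mimics quantile binning conceptually; it is not XGBoost's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
col = rng.normal(size=100_000).astype(np.float32)  # 4 bytes per value

# 256 quantile bins need 255 interior cut points.
n_bins = 256
cuts = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])

# Each value is replaced by its bin index, which fits in one byte.
binned = np.digitize(col, cuts).astype(np.uint8)  # indices in 0..255

print(col.nbytes // binned.nbytes)  # 4: one byte per entry vs four
```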
From an abstract view, it's mostly caused by the need to handle sparse data as well. So we have some extra structures there. We can eliminate them in the future, but the overhead is not particularly significant so we haven't prioritized it yet.
Your calculation is correct. GPU prioritizes memory usage over computation performance from time to time: it can compress the data even further if the number of bins is smaller than 256, and it happily drops any extra structures to save memory. For the CPU, the priority is reversed, as it has relatively more memory available but is slower.
Hi XGBoost community,
Are there any plans to add support for saving out a QuantileDMatrix to file, like DMatrix.save_binary? Creating the QuantileDMatrix has been a RAM bottleneck for me, and I'm hoping to potentially decrease that by loading the QDM from file. Thanks in advance!