Closed: rasbt closed this 8 years ago
Is the manual garbage collection necessary? The hashing vectorizer should be really small anyhow, right?
Otherwise LGTM
I used this previously in a different context because I had massive memory issues (I later figured out that I had fit the pipeline via .fit(docs_train, docs_train) instead of .fit(docs_train, y_train)); that was eating up memory like nothing else ...
In any case, I thought we could leave this in there just as a general thing for people who are using Jupyter notebooks with many large objects ... however, we could also remove it.
I find gc'ing the hashing vectorizer weird, since the whole point of it is that it doesn't take up a lot of memory. gc'ing the count vectorizer is fine.
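To make the point concrete, here is a minimal pure-Python sketch (not the scikit-learn implementation; the function names are made up) of why a hashing vectorizer stays small while a count vectorizer grows with the vocabulary: hashing maps tokens straight into a fixed-width vector, whereas counting must first build and keep an explicit token-to-index dictionary.

```python
def hashing_vectorize(tokens, n_features=16):
    """Map tokens into a fixed-size count vector via hashing; no vocab is stored."""
    vec = [0] * n_features
    for tok in tokens:
        vec[hash(tok) % n_features] += 1
    return vec

def count_vectorize(docs):
    """Build an explicit vocabulary first; memory grows with unique tokens."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in doc:
            vec[vocab[tok]] += 1
        vectors.append(vec)
    return vocab, vectors

docs = [["spam", "ham", "spam"], ["eggs", "ham"]]
hv = hashing_vectorize(docs[0])
vocab, cv = count_vectorize(docs)
print(sum(hv))     # 3: all three tokens land somewhere in the fixed-width vector
print(len(vocab))  # 3: the count vectorizer keeps every unique token around
```

The trade-off is that hashing cannot map feature indices back to tokens and can have collisions, which is why the vocabulary-holding count vectorizer is the one worth deleting when memory is tight.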
Yeah, I am going to remove it now. For the hashing vectorizer that's really pointless; I just wanted it for the count vectorizer, but the vocab is only 75,000 entries or so.
I mean, you could keep it for the count vectorizer, saying "let's remove this because the vocab is so large" -- but yeah, it's not really that large ^^
feel free to merge after update
I see now what you mean, I had it in there 2 times. I think I'll just insert one before the Out-of-core learning section
```python
import gc

# Drop the references to the objects we no longer need ...
del count_vec
del h_pipeline

# ... and trigger a collection to reclaim the memory right away.
gc.collect()
```
and explain that we can do this to get rid of objects that we are not going to use anymore, although these are not thaaaaat large in this case (it's more to bring the point across, since people may create many of these when experimenting with hyperparams or so)
fixes #15