Open cheng10 opened 7 years ago
I think that is why. Our machine does not have enough free memory to load two WARC files (almost 2 GB). Python doesn't impose a memory limit beyond what the OS imposes, so I think we need a better machine to solve this.
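One thing that might help before buying a bigger machine is to avoid holding a whole decompressed file in memory at once. The sketch below is only an illustration using the standard library's gzip module; the helper names `iter_gzip_chunks` and `decompressed_size` are made up for this example and are not part of the project, and a real fix would still need a WARC-aware parser on top of this.

```python
import gzip
from typing import Iterator


def iter_gzip_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield decompressed chunks of a .gz file, one chunk at a time.

    Hypothetical helper: only `chunk_size` bytes of decompressed data
    are held in memory per iteration, not the whole archive.
    """
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


def decompressed_size(path: str) -> int:
    """Total decompressed size in bytes, computed without loading the file."""
    return sum(len(chunk) for chunk in iter_gzip_chunks(path))
```

Calling `decompressed_size` on each .warc.gz file would at least show how large the data really is once unpacked, which helps decide whether the machine or the loading code is the limit.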
We can probably talk about this in our presentation, I guess. Have you tried reducing max_features for TfidfVectorizer?
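For reference, capping the vocabulary looks roughly like this. It is a minimal sketch assuming scikit-learn is installed; the three documents and the limit of 5 are placeholder values, not the project's real data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder documents standing in for text extracted from the WARC files.
docs = [
    "web archive crawl of a news site",
    "news site front page and article text",
    "archive of forum posts and article comments",
]

# max_features keeps only the most frequent terms across the corpus,
# which bounds the number of columns in the resulting sparse matrix.
vectorizer = TfidfVectorizer(max_features=5)
matrix = vectorizer.fit_transform(docs)

print(matrix.shape)  # (number of documents, at most max_features)
```

This only limits the size of the TF-IDF matrix. If the memory runs out while the WARC files are being read, a smaller max_features will not help on its own.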
WEB-20161110210430225-00000-3009~umar-VirtualBox~8443.warc.gz
Traceback (most recent call last):
File "./manage.py", line 22, in
It works for the 243M file but not for the 568M file. I should figure out a way to fix it.
243M WEB-20160920180354930-00000-10658~umar-VirtualBox~8443.warc.gz
568M WEB-20161110210430225-00000-3009~umar-VirtualBox~8443.warc.gz