dragnet-org / dragnet

Just the facts -- web page content extraction
MIT License

MemoryError: Unable to allocate array with shape (26577,) and data type <U1847338 #99

Open kurniarahmattt opened 4 years ago

kurniarahmattt commented 4 years ago

    base_extractor = Extractor(
  File "/home/dragnet/dragnet/model_training.py", line 103, in train_model
    train_html, train_labels, train_weights = extractor.get_html_labels_weights(training_data)
  File "/home/dragnet/dragnet/extractor.py", line 124, in get_html_labels_weights
    return np.array(all_html), np.array(all_labels), np.array(all_weights)
MemoryError: Unable to allocate array with shape (26577,) and data type <U1847338

I tried training with more than 20K documents and got a memory error in numpy.array. Is the problem in the feature engineering step before the fitting process?
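
For reference, the size of the failed allocation can be estimated from the error message alone. The sketch below uses only the shape and dtype reported in the traceback; the variable names are illustrative. A NumPy `<U` dtype stores 4 bytes per character, and every element of the array gets the same fixed width.

```python
import numpy as np

# Values copied from the error message: shape (26577,) and dtype <U1847338.
n_items = 26577
dtype = np.dtype("<U1847338")

# Fixed-width unicode: 4 bytes per character, same width for every element.
bytes_per_item = dtype.itemsize
total_bytes = n_items * bytes_per_item

print(f"{bytes_per_item} bytes per element")
print(f"{total_bytes / 1e9:.1f} GB requested in total")
```

Under these numbers the request is roughly 196 GB, so the failure comes from the array construction itself, before any feature engineering or fitting runs.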

kurniarahmattt commented 4 years ago

Maybe the cause is a very large string in all_html, e.g. all_html[0] with more than 1000K characters. Whether the array fits depends on available memory.
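
A minimal sketch of why one long string matters: when `np.array` is given a list of Python strings, it picks a fixed-width unicode dtype sized to the longest element, so every entry pays for that width. The `all_html` list here is a small made-up stand-in, not the real training data. Passing `dtype=object` is one possible way to avoid the padding, but whether the rest of dragnet's training code accepts an object array is an assumption that has not been checked.

```python
import numpy as np

# Hypothetical stand-in for all_html: many short pages plus one very long one.
all_html = ["<p>short</p>"] * 100 + ["x" * 10_000]

# Default conversion: every element is padded to the longest string's width.
fixed = np.array(all_html)

# Object array: stores references to the original Python strings instead.
as_objects = np.array(all_html, dtype=object)

print(fixed.dtype, fixed.nbytes)            # width follows the longest string
print(as_objects.dtype, as_objects.nbytes)  # only pointer-sized slots
```

In this toy case the fixed-width array needs about 4 MB for 101 strings, almost all of it padding, while the object array holds only one reference per string.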