VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

How to fix 'Failing to build model' error? #356

Open saba-zones opened 7 months ago

saba-zones commented 7 months ago

I am using a set of HTML pages saved without any file extension (i.e., as plain text) in the positive and negative directories of my _traindata folder. When I run the buildModel command to train a new model on this dataset, a temporary file "_smileinput.arff" is created, but the command fails to generate the features, the model, and the final pageclassifier.yml file. I am running into the error shown in the screenshot below. Can someone help me debug this?

[Screenshot: 2023-12-25 at 3:52:45 PM]

Also, I am using the Docker image vidanyu/ache:master.

saba-zones commented 7 months ago

To add to this: when I run the same command on the provided _sample_trainingdata in the /config directory, I hit the same issue there as well. However, if I delete some of the sample files and run the command again, it builds the model successfully.

So it works for some samples but not for others, and I cannot identify which difference between the sample files is causing this.
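
One way to narrow down what differs between the files might be to compare a few basic properties of every sample before running buildModel. The sketch below is only a diagnostic aid, not part of ACHE: the helper names (`describe_file`, `scan_training_dir`, `suspicious`) are hypothetical, and the checks it performs (empty files, bytes that are not valid UTF-8, NUL bytes) are assumptions about what could upset feature extraction, not confirmed causes of this error.

```python
from pathlib import Path


def describe_file(path):
    """Collect a few basic properties of one training file."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "name": path.name,
        "size": len(data),
        # True when the file holds nothing but ASCII whitespace (or nothing at all)
        "empty": len(data.strip()) == 0,
        "valid_utf8": valid_utf8,
        "has_nul": b"\x00" in data,
        "looks_like_html": b"<html" in data.lower(),
    }


def scan_training_dir(root):
    """Describe every file under <root>/positive and <root>/negative."""
    rows = []
    for label in ("positive", "negative"):
        folder = Path(root) / label
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                row = describe_file(path)
                row["label"] = label
                rows.append(row)
    return rows


def suspicious(rows):
    """Keep only files that trip one of the (assumed) warning signs."""
    return [r for r in rows if r["empty"] or not r["valid_utf8"] or r["has_nul"]]


if __name__ == "__main__":
    import sys

    # Default folder name taken from the post above; pass another path as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "_traindata"
    for row in suspicious(scan_training_dir(root)):
        print(row)
```

If nothing is flagged, comparing the `size` and `looks_like_html` columns between the files that had to be deleted and the files that were kept may still hint at a pattern.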