mzhilyaev opened 10 years ago
The size of the file is 491505, slightly less than that of textModel.json and uuidMapping.json in edrules.
@oyiptong, should the size of textModel scale with the number of categories? The odp.txt file here seems to have 867 categories used by these top 5000 sites, so would that result in a 100MB textModel?
There will be an increase, but I'm not sure by how much. Let's do a rough estimate. The number of categories will have the most impact on size.
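A rough way to see the scaling: if textModel.json stores one set of token statistics per category, as a naive Bayes model typically does, its size grows roughly linearly with the category count. The sketch below extrapolates from a measured model to a target category count under that linearity assumption. `extrapolate_model_size` is a hypothetical helper, and it ignores fixed overhead and differences in vocabulary size between categories.

```python
def extrapolate_model_size(measured_bytes: int, measured_categories: int,
                           target_categories: int) -> float:
    """Scale a measured model size linearly by category count.

    Assumes (hypothetically) that the model's size is dominated by
    per-category entries, so bytes-per-category stays roughly constant.
    """
    if measured_categories <= 0:
        raise ValueError("measured_categories must be positive")
    per_category = measured_bytes / measured_categories
    return per_category * target_categories
```

For example, with purely hypothetical inputs, a 10,000,000-byte model built from 100 categories extrapolates to 86,700,000 bytes at 867 categories.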
Given that:
867 categories should yield approx:
That makes it ~83MB without whitespace, so 100MB sounds about right.
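The ~83MB figure is for JSON without whitespace; a pretty-printed file will be larger. One way to see how much formatting adds is to serialize the same object both ways with Python's `json` module and compare byte counts. `json_sizes` is a hypothetical helper, and the real textModel.json layout may differ from whatever object you feed it.

```python
import json


def json_sizes(obj) -> tuple[int, int]:
    """Return (pretty_bytes, compact_bytes) for obj serialized as UTF-8 JSON."""
    # indent=2 adds newlines and indentation; item separator becomes ","
    # and key separator ": ".
    pretty = json.dumps(obj, indent=2)
    # separators=(",", ":") drops all optional whitespace.
    compact = json.dumps(obj, separators=(",", ":"))
    return len(pretty.encode("utf-8")), len(compact.encode("utf-8"))
```

Writing the model with `json.dump(model, f, separators=(",", ":"))` produces the compact form directly.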
Other text models (using something other than naive Bayes) will be much smaller.
How big a file does the script generate from the 5000-site odp.txt? Should we shrink that source file if it generates something too big?
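If the generated model turns out too big, one option is to cut the source list down before running the script. A minimal sketch, assuming odp.txt holds one site per line in rank order (not confirmed here); `shrink_source` and the output path are hypothetical. Since size tracks the category count more than the site count, the number of distinct categories left in the trimmed file is the figure to watch.

```python
from itertools import islice


def shrink_source(src_path: str, dst_path: str, max_lines: int) -> int:
    """Copy only the first max_lines lines of src_path to dst_path.

    Assumes one entry per line, ordered by rank (hypothetical format).
    Returns the number of lines actually written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops early if the source has fewer than max_lines lines.
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```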
Also, I'm not sure how we should include "build" dependencies for the Perl pieces:
Can't locate JSON.pm in @INC
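That error means Perl cannot find the JSON module. One option is to have the build check its Perl module dependencies up front and fail with an install hint rather than a runtime error. A minimal sketch that shells out to `perl -M<Module> -e 1`, which exits non-zero when the module cannot be loaded. `perl_module_available` and `check_perl_deps` are hypothetical helpers, and JSON is the only module name taken from the error above.

```python
import shutil
import subprocess


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` succeeds (module is loadable)."""
    perl = shutil.which("perl")
    if perl is None:
        return False  # perl itself is not installed
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_perl_deps(modules):
    """Return the modules from `modules` that perl cannot load."""
    return [m for m in modules if not perl_module_available(m)]


if __name__ == "__main__":
    missing = check_perl_deps(["JSON"])
    if missing:
        print("Missing Perl modules; install with: cpanm " + " ".join(missing))
```

Missing modules can then be installed with `cpanm JSON` (from App::cpanminus) or `cpan JSON`.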