Open skjerns opened 5 years ago
Hi @skjerns! Thanks for the feedback, I'm very glad to hear that you find m2cgen to be useful! Can you please provide a bit more detail, like:
Hey @skjerns . Thanks for the update. Random Forest can indeed be pretty huge sometimes. How many estimators did you end up having? What was the maximum depth of an individual estimator? I'd like to try to reproduce this on my end to better understand what can be improved here.
Take this code, for instance:
# -*- coding: utf-8 -*-
import os
import numpy as np
from sklearn.ensemble import RandomForestClassifier
import joblib
import m2cgen
import sklearn_porter
import subprocess
train_x = np.random.rand(10000, 8)
train_y = np.random.randint(0, 4, 10000)
rfc = RandomForestClassifier(n_estimators=10, max_depth=10)
rfc.fit(train_x, train_y)
joblib.dump(rfc, 'rfc.pkl')
# transfer code
code1 = m2cgen.export_to_c(rfc)
code1 += '\nint main(int argc, const char * argv[]) {return 0;}'
with open('rfc_m2cgen.c', 'w') as f:
    f.write(code1)
porter = sklearn_porter.Porter(rfc, language='c')
code2 = porter.export(embed_data=True)
with open('rfc_porter.c', 'w') as f:
    f.write(code2)
# now compile the two
# assuming you're using windows, else it will be slightly different
subprocess.call('gcc rfc_m2cgen.c -o rfc_m2cgen.exe')
subprocess.call('gcc rfc_porter.c -o rfc_porter.exe')
print('m2cgen: {} kB'.format(os.path.getsize('rfc_m2cgen.exe')//1024))
print('porter: {} kB'.format(os.path.getsize('rfc_porter.exe')//1024))
#m2cgen: 370 kB
#porter: 152 kB
The binary compiled from the m2cgen output is more than twice the size of the sklearn-porter one. This also holds true when compiling for other architectures. Similarly, the RAM footprint is much higher, but I have no easy way of measuring that.
Do you know if there are any optimizations possible to reduce this?
Might this be due to the excessive use of memcpy, which blows up the code size and execution time?
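For reference, the generated file is dominated by nested if blocks whose leaves copy per-class score literals with memcpy. The fragment below is only an illustrative approximation of that pattern (made-up feature indices and probabilities, not the verbatim m2cgen output):

#include <string.h>

/* Illustrative approximation of the m2cgen-style output for a single tree:
   every leaf copies its class-score literals into a local buffer. */
void score_tree0(const double *input, double *var0) {
    if (input[3] <= 0.52) {
        if (input[1] <= 0.18) {
            memcpy(var0, (double[]){0.7, 0.1, 0.1, 0.1}, 4 * sizeof(double));
        } else {
            memcpy(var0, (double[]){0.05, 0.8, 0.1, 0.05}, 4 * sizeof(double));
        }
    } else {
        memcpy(var0, (double[]){0.25, 0.25, 0.25, 0.25}, 4 * sizeof(double));
    }
}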
@skjerns I'd say that the difference in binary size is explained by the fact that m2cgen and sklearn-porter took quite different approaches to code generation.
m2cgen encodes the entire model into the code itself. It doesn't rely on any data structures or language constructs other than the if statement. All model coefficients are encoded in place, where they are needed, as plain literals. This approach has its pros and cons.
By using only the most simplistic language constructs we can add support for new models and languages extremely fast. Once a model's AST is described, all languages automatically get support for this model without any extra effort. Similarly, once support for some language is implemented, that language automatically gains support for all available models. Adding a new model is as easy as converting it into a simplistic AST that represents a sequence of calculations, without having to care about the data aspect.
Of course this benefit comes at a cost: the generated code is not very readable and is usually pretty large. E.g. in places where we could just use a for loop, we expand all iterations instead.
sklearn-porter does quite the opposite. It carefully describes the generation of each model using manually written templates for each supported language. Model coefficients are stored in language-specific collections, and all calculations are implemented manually as well. During the code generation phase it just injects model parameters into those data collections for each language individually. This obviously leads to smaller and much more readable code, since it has basically been written by a human. This approach, however, requires a tremendous effort when it comes to adding new models or languages. The cost of maintaining this functionality is pretty high as well. I believe this is partially the reason why the list of models supported by sklearn-porter is quite limited.
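To make the contrast concrete, here is a hypothetical sketch of a table-driven encoding (not sklearn-porter's actual template): the split data lives in arrays once, and a small generic routine walks them.

/* Hypothetical table-driven decision tree: node data is stored in arrays
   and a single loop descends from the root to a leaf. */
static const int    feature[]     = { 3, 1, -1, -1, -1 };   /* -1 marks a leaf */
static const double threshold[]   = { 0.52, 0.18, 0.0, 0.0, 0.0 };
static const int    left_child[]  = { 1, 2, -1, -1, -1 };
static const int    right_child[] = { 4, 3, -1, -1, -1 };
static const int    leaf_class[]  = { -1, -1, 0, 1, 2 };

int predict_tree(const double *input) {
    int node = 0;
    while (feature[node] != -1) {
        node = input[feature[node]] <= threshold[node]
                   ? left_child[node]
                   : right_child[node];
    }
    return leaf_class[node];
}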
So far I don't have any good ideas on how to reduce the size of the generated code while avoiding the language-specific manual effort and keeping all the benefits I described above. However, I haven't given up yet and am still working on this 😃
@izeigerman thanks for the extensive explanation!
I do see your point about going with a different approach, and I think your approach definitely has advantages. Let me know if you have any insights :)
K will do
Perhaps you could add loops to this, since all three supported languages have loops?
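As a rough sketch of that idea (hypothetical, not something m2cgen emits today): if each tree wrote its per-class scores into an array, the per-tree results could be combined in a loop instead of being unrolled.

#define N_TREES   10
#define N_CLASSES 4

/* Hypothetical aggregation loop: instead of one unrolled addition per tree
   and class, sum the per-tree score buffers with two nested loops. */
void aggregate_scores(const double tree_scores[N_TREES][N_CLASSES], double *output) {
    for (int c = 0; c < N_CLASSES; ++c) {
        output[c] = 0.0;
        for (int t = 0; t < N_TREES; ++t) {
            output[c] += tree_scores[t][c];
        }
        output[c] /= N_TREES;   /* average the votes across the forest */
    }
}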
I'm using m2cgen to convert some classifiers to C. It works great and the results are consistent, thanks for the library! I also tried sklearn_porter. However, m2cgen is the only library that can convert my Python classifiers to C without introducing errors into the classification.
Do you have any idea how the footprint of the C code could be reduced?