BayesWitnesses / m2cgen

Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
MIT License
2.79k stars 241 forks

Support for SAS - happy to open PR with some guidance #517

Closed (lucharo closed 2 years ago)

lucharo commented 2 years ago

Hello, first off, thank you so much for such an amazing package. I have built a thin wrapper around this package that converts the generated VBA code to SAS, supporting XGB, Random Forest, CART, logistic regression, and linear regression models. The whole logic is based on regex rules, so it's not very robust. Unfortunately, all the code I have written is part of an enterprise project, so I cannot share it at the moment.

A much better solution would be to use all the built-in logic that you have developed in m2cgen to turn all the supported models into SAS. I'd be happy to open a PR for a SASInterpreter and a SASCodeGenerator, but I don't know where to start. If you give me some advice or tips, I think I'll be able to make this happen!

StrikerRUS commented 2 years ago

Hey @lucharo ! Thanks for your interest in m2cgen!

I think you can start by looking at the PRs that added support for other programming languages, e.g. Elixir, Rust, Dart.

lucharo commented 2 years ago

Hello @StrikerRUS! I have been working on this for the last week and I am close to submitting a PR :) (I will actually be interacting from a different GH account and submitting the PR from my company's GitHub; I need to wait on some legal stuff.)

The existing code is of such high quality! It's all so well thought out. Looking forward to submitting the PR.

I have one question, though. Most if not all of the already supported languages are open source and hence can be installed on the GitHub Actions workers. Your testing pipeline relies on the generated code being run locally (on the GH Actions workers) in whatever language is being tested, then parsing the result in Python and comparing it to the Python output. How should this work for a language like SAS, which is not open source and is often not installed as a client tool but rather on some remote server?
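The run-parse-compare flow described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical and not taken from the m2cgen test suite, and a child Python process stands in for "generated code in another language".

```python
import math
import subprocess
import sys


def execute_command(args):
    """Run a command locally and return its stdout as text (hypothetical helper)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def parse_scores(output):
    """Parse whitespace-separated numbers printed by the generated program."""
    return [float(token) for token in output.split()]


def outputs_match(expected, actual, rel_tol=1e-9, abs_tol=1e-9):
    """Compare two score vectors element-wise within a tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


if __name__ == "__main__":
    # Assumption: the reference prediction comes from the original Python model.
    expected = [0.25, 0.75]
    # Stand-in for executing generated code with the target language's runtime.
    stdout = execute_command([sys.executable, "-c", "print(0.25, 0.75)"])
    print(outputs_match(expected, parse_scores(stdout)))
```

The key point is that `execute_command` needs a runtime for the target language on the machine running the tests, which is exactly what is hard to provide for SAS.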

One option I've thought of is to extend tests/utils.py with a remote_execute_command that submits jobs over an SSH connection, plus some sort of configuration file for the specifics of the SAS server running the tests. Let me know what you think.
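A minimal sketch of what such a helper might look like, assuming the system `ssh` client is available and key-based authentication is already set up. Only the name remote_execute_command comes from the proposal above; the config format, the other function names, the host, and the remote `sas -sysin ...` invocation are all assumptions made for illustration.

```python
import json
import shlex
import subprocess


def load_server_config(text):
    """Read connection details from a JSON document (hypothetical config format)."""
    config = json.loads(text)
    return {
        "host": config["host"],
        "user": config["user"],
        "port": int(config.get("port", 22)),
    }


def build_ssh_command(config, remote_args):
    """Build the local argv that runs remote_args on the configured server."""
    # ssh passes the command to the remote shell as one string, so quote it here.
    remote_command = shlex.join(remote_args)
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-p", str(config["port"]),
        f'{config["user"]}@{config["host"]}',
        remote_command,
    ]


def remote_execute_command(config, remote_args, timeout=60):
    """Run a command on the remote server over SSH and return its stdout."""
    result = subprocess.run(
        build_ssh_command(config, remote_args),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed host and user; nothing is executed remotely in this demo.
    config = load_server_config('{"host": "sas.example.com", "user": "ci"}')
    # Assumed batch invocation of SAS on the remote machine.
    print(build_ssh_command(config, ["sas", "-sysin", "/tmp/model.sas"]))
```

Keeping the command construction separate from the execution means the former can be unit-tested without a live server, although the end-to-end tests would still depend on one.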

StrikerRUS commented 2 years ago

> I am close to submitting a PR

Sounds good!

> I was wondering how to go about this with a language like SAS which is not open source

Yeah, that's a big question!

> to have a remote_execute_command which can basically submit jobs over an ssh connection

I don't think that's an option for this project. Such a method would bring the maintenance burden of supporting that server, for which we don't have resources.

Maybe there is a basic/community/trial version of SAS we can use? Or we can send a request to SAS for a very limited personal license that we'll use here (at first glance, it seems SAS respects open source in some way). Here is what I've found so far: https://www.listendata.com/2021/01/run-sas-in-python-without-installation.html.

StrikerRUS commented 2 years ago

Added to the feature-request hub: https://github.com/BayesWitnesses/m2cgen/issues/490.