florianschanda / miss_hit

MATLAB Independent, Small & Safe, High Integrity Tools - code formatter and more
GNU General Public License v3.0

support DSLs inside MATLAB #212

Open acristoffers opened 3 years ago

acristoffers commented 3 years ago

What kind of feature is this?

MISS_HIT component affected

Describe the solution you'd like

The CVX project (http://cvxr.com/cvx) has its own "mini-language" inside MATLAB. When the style checker is set to fix the file, it breaks the indentation of CVX blocks. On the CVX example page, the code:

m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
    variable x(n)
    minimize( norm( A * x - b, 2 ) )
    subject to
        C * x == d
        norm( x, Inf ) <= e
cvx_end

becomes

m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
variable x(n)
minimize( norm( A * x - b, 2 ) )
subject to
C * x == d
norm( x, Inf ) <= e
cvx_end

The solution would be to treat cvx_begin, subject to and cvx_end the same way that if, else and end are treated, with one quirk: the lines following subject to are indented one level deeper than subject to itself, whereas else is outdented and its body returns to the same level as before. If that particular indentation of subject to is too complicated to implement, treating it the same as else would already be better than the current behaviour.

florianschanda commented 3 years ago

Oh dear. OK, so I'll be really honest: I do not see how I can reasonably do this.

Mainly because MISS_HIT is actually based on a full MATLAB lexer and parser. This means I do not just have a list of "if", "case", etc. and then indent. There is a full understanding of the syntactic structure of the code. For example, see how an if statement is parsed.

In addition this is turned into an AST, see here again for the if example:

So to support this I would need to fully understand and implement this mini-language.

It is not impossible: it could be done as an extra language addition (e.g. we already have Octave as a language, and Simulink to some extent, so we could add CVX too...)

But the effort this would take would not be reasonable for me.

That said, I will keep this open. Maybe there is a way to do this, even if it's just a hack. I will think about it, because clearly the use-case is there. Perhaps a user-defined list of functions that, when called, produce an extra indent or an extra outdent.
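The "user-defined list of functions" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MISS_HIT code or configuration syntax: the rule table `DSL_RULES`, the action names `open`, `nest` and `close`, and the function `indent_levels` are all hypothetical. It matches on the first word of each line, which is exactly the kind of keyword matching a real parser does not rely on.

```python
import re

# Hypothetical rule table (NOT MISS_HIT configuration syntax):
#   "open"  : following lines go one level deeper; the current level is remembered
#   "nest"  : following lines go one level deeper; there is no dedicated closer
#   "close" : return to the level remembered by the matching "open"
DSL_RULES = {"cvx_begin": "open", "subject": "nest", "cvx_end": "close"}

FIRST_WORD = re.compile(r"[A-Za-z_]\w*")


def indent_levels(lines, rules=DSL_RULES):
    """Return one indentation level per line, relative to the first line."""
    levels = []
    level = 0
    saved = []  # levels remembered by the currently open "open" identifiers
    for line in lines:
        match = FIRST_WORD.match(line.strip())
        action = rules.get(match.group(0)) if match else None
        if action == "close" and saved:
            level = saved.pop()  # also undoes any "nest" since the "open"
        levels.append(level)
        if action == "open":
            saved.append(level)
            level += 1
        elif action == "nest" and saved:  # only meaningful inside a block
            level += 1
    return levels


example = [
    "m = 20; n = 10; p = 4;",
    "cvx_begin",
    "variable x(n)",
    "minimize( norm( A * x - b, 2 ) )",
    "subject to",
    "C * x == d",
    "norm( x, Inf ) <= e",
    "cvx_end",
]
levels = indent_levels(example)
```

For the example from the issue this yields levels 0, 0, 1, 1, 1, 2, 2, 0. A first-word match would misfire on ordinary code that happens to use one of these names as a variable, so a real implementation would presumably hook into the parser instead.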

florianschanda commented 3 years ago

@acristoffers again, I can't promise anything fast, but this problem intrigues me :)

I don't really have time to learn all about CVX, so I will need your help! I will need to see more examples besides that one, especially real-world ones if you have any. If you could send me as much example code as you can, indented the way you'd like, that would be really helpful. Either

From this I can try to reverse engineer some useful patterns and features. I think I will have a dsl { ... } section in the config file, where you can give special treatment to some identifiers.

florianschanda commented 3 years ago

So far I can see these rules:

I note that there is no way to get out of the +1 indent from subject to. Is that really the case in CVX? Or is there something that closes the subject to block?

acristoffers commented 3 years ago

That is really the case. It mimics how you would write the minimization on paper, where people put the s.t. (subject to) below minimize (maximize/arg min/arg max) and the set of constraints below the cost function (the norm in the example). It is like a table, but without borders.
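As an illustration of that on-paper layout, the example from the top of this issue would typically be typeset as follows. This is a LaTeX rendering of the same problem, not something taken from the CVX documentation:

```latex
\[
\begin{array}{ll}
\text{minimize}   & \lVert A x - b \rVert_2 \\
\text{subject to} & C x = d \\
                  & \lVert x \rVert_\infty \le e
\end{array}
\]
```

The constraints line up in the second column under the cost function, which is what the extra indent after subject to imitates.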

Also, there is the maximize special function too.

acristoffers commented 3 years ago

This is a list of all special cvx_* functions:

cvx_begin
cvx_clear
cvx_end
cvx_expert
cvx_pause
cvx_power_warning
cvx_precision
cvx_profile
cvx_quiet
cvx_save_prefs
cvx_solver
cvx_solver_settings
cvx_tic
cvx_toc
cvx_where

and this is a list of keywords inside a cvx_begin/end block:

In
binary
dual
epigraph
expression
expressions
hypograph
integer
maximise
maximize
minimise
minimize
subject
variable
variables

I'm not an expert in CVX either, but I've built an examples folder with indented code. The files are minimal, having only the CVX blocks. I've extracted the snippets from the examples folder, which are all real-world examples. From what I could see, only cvx_begin, cvx_end and subject have indentation implications; all other functions/keywords are used normally. Lines using the keywords I listed above usually have no semicolon at the end, but adding one does no harm.

florianschanda commented 3 years ago

I have merged PR #214, thank you for the examples.

Again, just to set expectations, please do not expect anything soon. The code that deals with indentation is somewhat complex, so adding something like this will be hard, and it may turn out to be impractical after analysis. In addition, since this will be user-configurable and there is no existing scheme in the configuration mechanism that I could reuse, it will take a fair bit of design work to come up with something that can cope with at least the CVX mini-language.

acristoffers commented 3 years ago

I won't expect anything soon, don't worry. When you replied showing you have a full parser/lexer, I realized how hard it will be to implement the change. So I already hacked together a small (and dirty) Python script to fix the indentation after I run mh_style. That way I get everything that miss_hit offers plus the correct indentation, so the problem is solved for me. Anyway, having it built in would be nice, if it is not too hard. Thank you for really considering the issue; miss_hit is a great project.
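The script mentioned above is not shown in this thread. For readers who want a similar workaround, here is a minimal sketch of what such a post-processing step might look like. The function name `fix_cvx_indent` and the `tab` parameter are hypothetical. It assumes the input has already been formatted by the style checker, so every line of a CVX block sits at the same level as its cvx_begin, as in the "becomes" listing at the top of the issue.

```python
def fix_cvx_indent(text, tab=4):
    """Re-add CVX indentation to already formatted MATLAB source.

    Assumes every line inside a cvx_begin/cvx_end block currently sits at the
    same level as cvx_begin, so the extra indent is simply prepended.
    """
    out = []
    depth = 0  # extra levels to add; 0 means we are outside a CVX block
    for line in text.split("\n"):
        words = line.split()
        first = words[0].rstrip(";") if words else ""
        if first == "cvx_end":
            depth = 0  # cvx_end closes both the block and any "subject to"
        # Blank lines are passed through unchanged
        out.append((" " * (tab * depth) + line) if words else line)
        if first == "cvx_begin":
            depth = 1
        elif first == "subject" and depth:
            depth = 2
    return "\n".join(out)


formatted = "\n".join([
    "m = 20; n = 10; p = 4;",
    "cvx_begin",
    "variable x(n)",
    "minimize( norm( A * x - b, 2 ) )",
    "subject to",
    "C * x == d",
    "norm( x, Inf ) <= e",
    "cvx_end",
])
fixed = fix_cvx_indent(formatted)
```

Because the indent is added on top of whatever is already there, the function is not idempotent: it is meant to run once, directly after the formatter, and running it twice would indent the block twice.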