Technologicat / pyan

Static call graph generator. The official Python 3 version. Development repo.
GNU General Public License v2.0
319 stars 56 forks

Pyan-like tools for other languages such as C/C++/JavaScript and so on ... #14

Closed Uiuran closed 4 years ago

Uiuran commented 4 years ago

Just curious whether you have considered this, at least for the purposes of software architecture studies.

regards

Technologicat commented 4 years ago

Essay question, so here goes. Maybe I need a blog?

I've briefly considered writing a call graph analyzer for Racket, which is my second favorite language at the moment. So, let's consider it as an example.

At first glance, this should actually be much simpler than for Python, since Lisp syntax is so regular. But Racket being what it is, some other issues arise. First, we must track bindings at different phases. Secondly, while a fully macro-expanded Racket program consists of only 20 or so different forms, if we want to track usage of macros (not only procedures), we can't expand the code before analysis precisely because, by definition, the expander makes the macros go away.

Maybe this is not a problem, if we are interested only in the operator part of each s-expression, i.e. the thing it "calls", whether a procedure or a macro. But what if we want to see what the final expanded code calls? Final at what level of recursive expansion? The source code of the macro itself is of no help - the analyzer would need to understand what part of code is part of the expansion, and what part belongs to the logic of the macro itself. This is the kind of thing you inherently get when you have the whole language there all the time.
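To make the "operator part of each s-expression" idea concrete, here is a minimal sketch in Python (the language Pyan itself is written in). The names `tokenize`, `read`, and `operator_heads` are hypothetical, and the reader handles only parentheses and bare symbols: no strings, quotes, comments, or reader extensions. It is not how any existing Racket tool works.

```python
def tokenize(src):
    """Split source text into parentheses and symbols (no strings or comments)."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Parse one s-expression from a token list, consuming tokens in place."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return form
    return tok


def operator_heads(form):
    """Yield the symbol in operator position of every nested list form."""
    if isinstance(form, list) and form:
        if isinstance(form[0], str):
            yield form[0]
        for sub in form:
            yield from operator_heads(sub)


src = "(define (f x) (when (> x 0) (g (h x))))"
heads = list(operator_heads(read(tokenize(src))))
print(heads)
```

Note what the result mixes together: `define` and `when` are syntactic forms, `f` is a name being defined rather than called, and only `>`, `g`, and `h` are procedure calls. Telling these apart requires binding information, which is exactly where the difficulties described above begin.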

Then, Racket is a large language. Modules, generators, exceptions, promises, contracts, pattern matching, two different object systems, the works. There are also the official sister languages lazy/racket and typed/racket. The tagline of the whole project? Solve problems. Make languages. So yeah, it's not really a language, it's a language construction kit. (Yes, you might want one.)

Like any serious Lisp (at least CL), Racket supports customization at the reader level. A good example is sweet-exp, which adds a very pythonic indentation-based (but homoiconic and fully backward compatible) alternative syntax to the Scheme family of Lisps, including Racket. For the specs, see SRFI-105 and SRFI-110. Coming from a Python background, I like sweet-exp a lot, but the Racket community seems to like its parentheses.

(Well, there's the DrRacket IDE, and the support for Racket in Emacs (if that's your go-to IDE) seems pretty good too. If you haven't tried structured editing, it's the right (or even only) tool for handling Lisps. It can help also in other languages. Smartparens works also with Python code. That and rainbow-delimiters, and editing deeply nested parenthesized expressions just became a lot easier. With Spacemacs, Emacs is no longer the usability disaster it used to be.)

In the specific case of sweet-exp, there's the unsweeten utility that can render sweet-exp code back into s-expressions, but there is probably no general solution for arbitrary reader extensions (the whole point of a reader extension being able to change the surface syntax of the language). This may be a problem, since the whole selling point of the Lisp family is its customizability. Lisp is a fluid whereas other programming languages are solids.

So, in all, implementing a call graph analyzer for Racket would perhaps be too much work for something that's currently of minor hobby interest to me ([1], [2]).

If you're really curious, and not yet familiar with Racket, there's an intro in my spring 2018 Python for scientific computing course, see slide sets 10 and 11. And reader customization? Pfft. Real languages let you start at hardware design.

So, back to the main question: Pyan, Python, other languages? The above detour is actually semi-relevant for Python, too - in unpythonic, I pretty much went to town with MacroPy3. And Pydialect adds the plumbing to make Python into a language construction kit.

From my experience, at least macros are a useful technology. But I haven't yet considered the implications of syntactic macros for Pyan.
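One observation that can be made without expanding anything: MacroPy-style expression macros are invoked with subscript syntax, so in the unexpanded AST they are not `Call` nodes at all. The sketch below uses a hypothetical macro name `q` and only the standard `ast` module; it says nothing about how Pyan does or should treat macros.

```python
import ast

# Hypothetical source: `q` stands in for some expression macro.
# Nothing is imported or expanded here; we only parse the text.
src = "y = q[f(x) + 1]"
tree = ast.parse(src)

# Names that appear in call position in the unexpanded code.
calls = [n.func.id for n in ast.walk(tree)
         if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]

# Names that appear as the subscripted object, which is where a macro name would sit.
subscripted = [n.value.id for n in ast.walk(tree)
               if isinstance(n, ast.Subscript) and isinstance(n.value, ast.Name)]

print(calls, subscripted)
```

So an analyzer that looks only at `Call` nodes would see `f` but not `q`, and it would see nothing of whatever the expansion of `q[...]` eventually calls.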

While macros are not composable (and hence there is an envelope of reachable extensions before the collection of macros becomes too complex for a human to extend), I think they fill an interesting niche: macros can be used for borrowing features from other programming languages. A simple example is Python's "with" in Clojure. True wizards build things like algebraic data types or polymorphic monads.

Another feature Pyan is missing is Cython support. During the last few years I've mostly coded numerics (such as [1], [2]), where a wide-spectrum approach is important, and currently I have no call graph analyzer for Cython code.

Since Cython doesn't change the semantics of calls... much... a good first approximation would be a decythonize function, reading the Cython module and stripping everything that's not standard Python. We could then feed the result to ast.parse, and just let Pyan do what it already knows how to do. This requires a parser, with a grammar for Cython, which is a superset of Python. Probably that could be built in pyparsing, but whether the Cython analysis feature is worth the unavoidable maintenance effort to keep it in sync with the official grammar, I don't know.
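As a rough illustration of the decythonize idea, here is a deliberately naive, line-based sketch. It is not the grammar-based parser discussed above, and it is an assumption about what such a function might look like, not existing Pyan code. It handles only `cimport` lines, `cdef class`, single-line `cdef`/`cpdef` function headers, and single-line `cdef` variable declarations; `cdef:` blocks, typed arguments in plain `def`, types containing commas, and much else are not covered.

```python
import ast
import re

# Hypothetical helper for a tiny subset of Cython, e.g.
# "cpdef double norm(double x, double y):"
_CFUNC = re.compile(r"^(\s*)(?:cdef|cpdef)\s+(.*?)(\w+)\s*\((.*)\)\s*:\s*$")


def _strip_arg_type(arg):
    """'double x' -> 'x', 'int n=3' -> 'n=3' (naive: breaks on commas inside types)."""
    left, eq, default = arg.partition("=")
    name = left.split()[-1].lstrip("*&")
    return name + eq + default.strip()


def decythonize(src):
    """Rewrite a small subset of Cython into plain Python, line by line."""
    out = []
    for line in src.splitlines():
        indent = line[:len(line) - len(line.lstrip())]
        stripped = line.strip()
        m = _CFUNC.match(line)
        if re.match(r"(from\s+\S+\s+)?cimport\b", stripped):
            out.append(indent + "pass")  # placeholder keeps line numbers stable
        elif stripped.startswith("cdef class "):
            out.append(indent + stripped[len("cdef "):])
        elif m:
            args = [_strip_arg_type(a) for a in m.group(4).split(",") if a.strip()]
            out.append(f"{m.group(1)}def {m.group(3)}({', '.join(args)}):")
        elif stripped.startswith("cdef "):
            left, eq, right = stripped[len("cdef "):].partition("=")
            if eq:  # keep the initializer, since it may contain calls
                out.append(f"{indent}{left.split()[-1]} = {right.strip()}")
            else:   # pure declaration: nothing to analyze
                out.append(indent + "pass")
        else:
            out.append(line)
    return "\n".join(out)


def called_names(py_src):
    """Collect plain-name call targets, standing in for the real analysis."""
    return sorted({n.func.id for n in ast.walk(ast.parse(py_src))
                   if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)})


cython_src = """\
from libc.math cimport sqrt

cpdef double norm(double x, double y):
    cdef double r = sqrt(x * x + y * y)
    return r

def report(x, y):
    print(norm(x, y))
"""
py_src = decythonize(cython_src)
print(py_src)
print(called_names(py_src))
```

Replacing dropped lines with `pass` instead of deleting them keeps line numbers aligned with the original file and avoids leaving an empty block behind. The many cases this regex approach gets wrong are a fair indication of why a real grammar would be needed.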

As for other mainstream languages, such as C, C++, Java, Fortran - I'm not coding in them anymore (at least not by hand), so they're not that interesting to me. C is probably still the right choice in the systems programming niche, and Fortran in low-level high-performance numerics. For general-purpose programming however, there are better technologies.

In my opinion this includes Python, even though it's already ~30 years old, there's the GIL, and while hacking on Pyan or on unpythonic.syntax, nary a day goes by without cursing at the AST representation. (Q. How many pythons does it take to represent constants right? A. Only one, but it must be the latest version.)
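For context on the constants joke: on Python 3.8 and later, literals such as numbers, strings, and `None` all parse to a single `ast.Constant` node type, whereas earlier versions produced separate node types (`Num`, `Str`, `NameConstant`, and so on) that an analyzer had to handle one by one. A small check, assuming Python 3.8 or newer:

```python
import ast

# Parse three different kinds of literal as standalone expressions.
nodes = [ast.parse(src, mode="eval").body for src in ("42", "'hello'", "None")]

kinds = [type(n).__name__ for n in nodes]   # node class names
values = [n.value for n in nodes]           # the Python values they carry

print(kinds, values)
```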

But Python is highly approachable, quick to read and write, quite powerful, and has a huge library ecosystem. And it gets the indentation vs. parentheses issue at least almost right. Maybe for really advanced techniques Racket or Haskell can be better platforms. For anything else, there's Python.

Haskell might be an interesting target, but it's a world of its own, and I'm not that familiar with it.

Julia, a relatively new (~10 years) contender for a high-level language for numerics, is a possible future target, but I'm not making any promises. :)

Uiuran commented 4 years ago

A nice piece of work, your essay. "True wizards build things like algebraic data types https://lexi-lambda.github.io/blog/2015/12/21/adts-in-typed-racket-with-macros/ or polymorphic monads https://github.com/tonyg/racket-monad": agreed, although I have just barely scratched the surface of it.

Solve problems, make languages: I think that's what I had in mind with TF computation graphs, even if it touches on numbers and objectivity, convexity, which is not really the heart of the matter.

Uiuran commented 4 years ago

Sorry, I barely... there are some oddities here, lol ᕦ( ⊡ 益 ⊡ )ᕤ Not really a big problem to solve, though.

Technologicat commented 4 years ago

I suppose the short answer to the original question is "sorry, no plans to support other languages at the moment" :)

In the long term, perhaps Cython, but we'll see.

Uiuran commented 4 years ago

I also wonder whether algebraic data types can be mapped numerically using https://en.wikipedia.org/wiki/P-adic_number#Metric_completions_and_algebraic_closures.

Uiuran commented 4 years ago

I would like to express my goodwill and encourage you to keep up this work. Maybe you already know this site, but good luck (maybe I will gain the skills to contribute to future Pyan-like tools for other languages and libraries):

http://rosettacode.org/wiki/A*_search_algorithm