TranscryptOrg / Transcrypt

Python 3.9 to JavaScript compiler - Lean, fast, open!
https://www.transcrypt.org
Apache License 2.0
2.85k stars 214 forks

Conceptual - Thinking about modules #232

Closed RCL-Carl closed 7 years ago

RCL-Carl commented 7 years ago

Hi,

The following is intended to start a conversation about a thought that I had last night. I'm not necessarily advocating that resources be committed to it but I do think it deserves some discussion.

As I understand it, each module is currently compiled into a file __javascript__/<name>.mod.js containing the JavaScript implementation of that module. If we take the logging module as an example, it looks something like this:

    __nest__ (
        __all__,
        'logging', {
            __all__: {
                __inited__: false,
                __init__: function (__all__) {
                    var time = {};
                    var warnings = {};
                    __nest__ (time, '', __init__ (__world__.time));
                    __nest__ (warnings, '', __init__ (__world__.warnings));

                    //... More Implementation Details going forward...

                    __pragma__ ('<use>' +
                        'time' +
                        'warnings' +
                    '</use>')
                    __pragma__ ('<all>')
                        __all__.BASIC_FORMAT = BASIC_FORMAT;
                        __all__.BufferingFormatter = BufferingFormatter;

                        // ... More defs here ...

                        __all__.warn = warn;
                        __all__.warning = warning;
                    __pragma__ ('</all>')
                }
            }
        }
    );

This module then gets included in the main application file by copying it in, resulting in one large file. This approach has simplicity benefits, but it also has some limitations:

  1. It is very difficult to cache modules with this approach. Every time the page is loaded, the entire file is re-downloaded, even if only the main module received a tiny tweak and the rest has not changed.
  2. We have to select whether to emit ecma5 or ecma6 at compile time - not at runtime, when we actually know what the client is capable of.
  3. This is a custom module format, different from more standard module formats like CommonJS, AMD, etc., which means we have to build custom tools to support it.
  4. All modules are statically compiled, so there is no real opportunity to optimize the scheduling of when a module is loaded.

I'm wondering if a better approach might be to use something like requirejs to implement the module format. There are some really nice features that could be added:

  1. RequireJS can cache modules, so it may be possible to have parts of the core and certain library modules cached on the client side, reducing bandwidth on subsequent loads.
  2. RequireJS modules could be stored on a CDN, which could help with deployment of user projects in the future - for example, we could host the latest core and standard modules on Google's CDN, and new users could simply use those instead of hosting all the files themselves.
  3. RequireJS supports lazy loading, which means the import machinery could load modules on demand instead of compiling them all into one file. Perhaps we would add a dynamic_import or similar to differentiate - I don't know, just thinking out loud. This could be really interesting for single-page applications, where you don't need to load every page in the tree up front but can load parts of it as the user traverses the site map.
  4. This off-loads a lot of code that this project would otherwise have to maintain that is not necessarily in the core mission of the transcrypt project.
  5. We could load other JavaScript modules in the same format using import, just as you would use 3rd-party modules in Python - this could make deploying modules via pip or other package managers a lot easier.
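To make the idea concrete, here is a minimal sketch (invented for illustration, not actual Transcrypt output) of what an AMD-style 'logging' module and a lazy loader could look like. defineAMD and requireAMD are a tiny stand-in for RequireJS's global define/require (renamed to avoid clashing with Node's require), so the sketch is self-contained:

```javascript
// Registry of modules known to the loader; nothing runs until required.
const registry = {};

function defineAMD(name, deps, factory) {
    // Register the module without executing it yet (lazy evaluation).
    registry[name] = { deps: deps, factory: factory, exports: null };
}

function requireAMD(name) {
    const mod = registry[name];
    if (mod.exports === null) {
        // Resolve dependencies first, then run the factory once and cache it.
        mod.exports = mod.factory(...mod.deps.map(requireAMD));
    }
    return mod.exports;
}

// Hypothetical compiled modules: 'logging' depends on 'time', mirroring
// the __nest__ example above. The bodies are stand-ins, not real output.
defineAMD('time', [], function () {
    return { time: function () { return Date.now() / 1000; } };
});

defineAMD('logging', ['time'], function (time) {
    function warning(msg) {
        return 'WARNING:root:' + msg;   // stand-in for the real formatting logic
    }
    return { warning: warning };
});

const logging = requireAMD('logging');
console.log(logging.warning('low disk space'));  // WARNING:root:low disk space
```

Each module would live in its own file on disk or on a CDN, and the loader would only fetch and evaluate the ones an application actually requires.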

I'm not saying that this would necessarily be trivial. I haven't thought through everything yet and there may be some critical details that I'm missing.

Thoughts?

axgkl commented 7 years ago

Hi, a few people have been asking for this already; especially hot module reloading seems to be something JS SPA devs are used to from webpack-based pipelines. So yes, I think many, many people would like to see RequireJS-compatible modules... But indeed, not trivial - there is a lot of namespace indirection involved, as far as I understand how that works...

Curiosity: I personally like a js free server - do you think building requirejs compatible modules for hot module reloading could be done with python only on the server side without huge effort?

RCL-Carl commented 7 years ago

Curiosity: I personally like a js free server - do you think building requirejs compatible modules for hot module reloading could be done with python only on the server side without huge effort?

I'm not sure I see where any server-side code would be needed. The way I see it, the modules would be compiled by Transcrypt into AMD modules and could then be hosted on any CDN. There is no server-side code in this model. The client determines what it needs to load and checks its cache to see whether it already has the latest version; if not, it pulls the appropriate version of the AMD module from the CDN. There may be a security issue here if someone inserts a malicious link to an AMD module you don't own, so I think this would assume HTTPS at a minimum.
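The client-side flow described above could be sketched like this. All names are hypothetical: fakeFetch stands in for a real HTTPS request to a CDN, and moduleCache plays the role of the browser's cache:

```javascript
// Cache keyed by module URL; a real browser would use its HTTP cache instead.
const moduleCache = new Map();

// Pretend CDN: maps a URL to module source text (invented for illustration).
function fakeFetch(url) {
    const cdn = {
        'https://cdn.example.com/logging.mod.js':
            'exports.warning = function (msg) { return "WARNING:" + msg; };',
    };
    return Promise.resolve(cdn[url]);
}

async function loadModule(url) {
    if (moduleCache.has(url)) {
        return moduleCache.get(url);       // cache hit: no network traffic
    }
    const source = await fakeFetch(url);   // cache miss: pull from the CDN
    const exports = {};
    new Function('exports', source)(exports);  // evaluate the module body
    moduleCache.set(url, exports);
    return exports;
}

loadModule('https://cdn.example.com/logging.mod.js')
    .then(logging => console.log(logging.warning('disk full')));
```

With RequireJS the cache-checking and fetching would be handled by the loader and the browser; the sketch only shows where the bandwidth saving on repeat loads comes from.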

axgkl commented 7 years ago

Ah, I see. I thought those (huge) webpack build pipelines were compiling the JS modules into something different - I mean with one namespace per module, so that a module can later be reloaded without breaking dependent code. But you must be right: since require.js works via CDNs as well, it must be possible to do that also/only in the browser. Sorry for my blatant knowledge gap.

=> Absolutely deserves a "want to have" then, imho.

axgkl commented 7 years ago

Wow, I have permission to add a prio. @JdeH : is this ok?

JdeH commented 7 years ago

Wow, a difficult discussion, this early in the morning (for me). But it's a subject that had to come on the table at some time.


@AXGKl

About assigning prio's: since we can only assign one prio to an issue, we can't use them as a voting system.

So currently please propose, rather than assign, prio's. In general I'd like to do the labeling myself, unless something clearly is, e.g., a bug.


On modules as brought up by @RCL-Carl

This touches upon many issues: not generating code in the install dir, how to generate and distribute sourcemaps and the source code they need, how to have multiple Transcrypt snippets on one page, and how to use the compiler from CPython as a module. I've added the subject label SUB_infrastructure for this, and frankly, it's a hornets' nest. These things have to be tackled in combination, which is quite hard.

It may not be apparent at first sight, but it is already possible to dynamically load modules. The turtle graphics showcase on the site does so, simply by eval'ing the compiled result. But this is an evil hack (search for !!! in the source below):

__pragma__ ('alias', 'jq', '$')

# For use by eval'ed turtle applet
import turtle
import random
import math

def clear ():
    editor.setValue ('')
    turtle.reset ()
    run ()

'''
!!!
This effectively is dynamic module loading using AJAX.
Run program that user made on his client.
Source is uploaded to the server, and compiled.
Resulting JS module is downloaded to client and run in the client's context (including turtle, random and math) by eval (result)
'''
def run ():
    def success (result):
        turtle.reset ()
        eval (result) 

    def fail (a, b, c):
        print ('Run error:', a, b, c)

    # N.B. The request has to be explicitly encoded, but the response is already implicitly decoded
    jq.ajax ({
        'url':'http://www.transcrypt.org/compile',
        'type': 'POST',
        'data': JSON.stringify (editor.getValue ()),
        'dataType': 'json',
        'contentType': 'application/json',
        'success': success,
        'fail': fail
    })

def mail ():
    def success (result):
        print (result)

    def fail (a, b, c):
        print ('Run error:', a, b, c)

    jq.ajax ({
        'url':'http://www.transcrypt.org/mail',
        'type': 'POST',
        'data': JSON.stringify ([document.getElementById ('mail_address') .value, editor.getValue ()]),
        'dataType': 'json',
        'contentType': 'application/json',
        'success': success,
        'fail': fail
    })

def selectExample ():
    def success (result):
        editor.setValue (result [0])
        turtle.reset ()     # Using old paths
        window.terminate = True
        eval (result [1])   # Using new paths (so cannot clear old result)

    def fail (a, b, c):
        print ('Select example error:', a, b, c)

    selector = document.getElementById ('select_example')

    jq.ajax ({
        'url':'http://www.transcrypt.org/example',
        'type': 'POST',
        'data': JSON.stringify (selector.options [selector.selectedIndex] .value),
        'dataType': 'json',
        'contentType': 'application/json',
        'success': success,
        'fail': fail
    })

selectExample ()

While there's absolutely no objection to using the above hackish mechanism to split up large programs, I'd still like to postpone official, well-designed, future-proof support for dynamic loading and specialized module formats for a considerable time, with the following piece of history in mind:

When I started my programming career I was using a Pascal compiler called Pascal MT+. It cost $2000+. It was a multipass compiler and linker and could generate standardized debugging info. Whenever I started a compilation, I'd go for a walk in the park for half an hour (no joke), only to come back and find out it had stalled on a missing semicolon in the 3rd line of code.

At that time I got a copy of something called TurboPascal. It cost $50. It compiled in seconds and produced code a fifth the size of MT+'s. It had no linker and no standardized module format with debugging info; it just generated one monolithic executable, which was lightning fast.

The hype that this caused is hardly conceivable. There were no perceived alternatives anymore; everyone just used TurboPascal. Now the world isn't that simple anymore - there are always good alternatives, and Transcrypt is just one of the players. Still, I'd like to focus upon its unique selling point: simplicity and generating compact, fast code.

The story of TurboPascal doesn't end here. Later compiler versions did generate object files per compilation unit, and it was still fast. But first they concentrated upon the essentials. I'd like to do the same with Transcrypt. Having dynamic loading early on adds complexity that I consider a risk to the progress of other essential parts of the project, most notably benefiting as early as possible from the ever-increasing convergence between JS and Python. Above all, the generated code should be as lean and fast as possible.

This is something to guard like a fox terrier... I've seen that recently the minified downloads have grown from 19k to 23k. Anything up to 50k is no problem, but I still want to keep a brake on this as much as is reasonable. There are compilers that generate code with dynamic module loading facilities, but they generate orders of magnitude more code, and due to the unpredictability of that process from a user's point of view, I've seen that actually labeled as a drawback (http://stackoverflow.com/questions/30155551/python-in-browser-how-to-choose-between-brython-pypy-js-skulpt-and-transcrypt#answer-38564424). I do think that dynamic loading has advantages; the caching, for example, is something to be taken seriously. Repeated uploads during development are less of an issue, in my opinion. Sourcemaps, on the other hand, are essential to efficient development.


One design decision taken early in the development of Transcrypt was to go for 'joined minification' rather than 'per-module minification'. The difference is that with joined minification, identifiers that are part of library APIs can also be minified. Closure isn't yet able to fully benefit from this: if I set the minification level to high, it generates semantically incorrect code. But my bet is that one day it will be able to. I don't know for sure, of course. Joined minification is not possible when separately distributing minified target modules.
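A toy before/after illustration of the difference (hand-written for this thread, not actual Closure output):

```javascript
// Original library + app, as a minifier would receive them.
const libOriginal = { formatRecord: (level, msg) => level + ':' + msg };
const appOriginal = () => libOriginal.formatRecord('INFO', 'started');

// Joined minification: the whole program is one unit, so even the API
// identifier can be renamed at both the definition and the call site
// (formatRecord -> a).
const libJoined = { a: (l, m) => l + ':' + m };
const appJoined = () => libJoined.a('INFO', 'started');

// Per-module minification: internals can still shrink, but the exported
// name must survive, because unseen callers in other files may depend on it.
const libPerModule = { formatRecord: (l, m) => l + ':' + m };

console.log(appOriginal() === appJoined());  // true: same behavior, shorter names
```

When modules are distributed separately, only the per-module variant is possible, so every public API name stays at full length in the shipped code.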


Transcrypt currently uses cascaded sourcemaps, enabling a dev to debug minified target code from the Python source. Hot loading, or even separate distribution of target code, will greatly complicate this. The first problem encountered is that sourcemaps need source code. But combining the sourcemaps of separately loaded modules is complicated as well.


Transcrypt modules can already be distributed via PyPI, which is the dominant way to distribute in the Python world (as is done for NumScrypt). Alternatively they can be part of the Transcrypt distro, as is done e.g. for re, time and itertools. In that case they supersede the corresponding CPython libs.


Let me not make this monologue too long. My estimation is that Transcrypt will in the end have separately loadable modules, but I think it is too early to implement them now, since they would complicate Transcrypt and slow down other parts of development. I suggest we postpone this by at least a year. The 'writes in the install directory' problem may have to be solved a bit earlier.

axgkl commented 7 years ago

@JdeH : Regarding prio's: yes, I figured it couldn't be intended that I assign prio's - that would result in a total mess by definition, for the simple reason that prio's change (literally) all the time, even for a single prio owner. I left it set to demonstrate that I could do it, so you'd know it's possible.

For issues clearly touching CI only, I think I can keep the prio's in my head; if not, I'll ask you to create a CI prio namespace.

JdeH commented 7 years ago

For CI, the prio's should be your prio's. So propose a prio for any CI items and I'll almost blindly follow them.

RCL-Carl commented 7 years ago

I think these are all reasonable responses. Again, I wasn't trying to suggest that we attack this immediately - merely musing that there may be a more general way to approach modules that might make them more flexible.

JdeH commented 7 years ago

You were clear about that, and it's good to start and maintain a thinking process about these things. For me it always takes time to integrate it all in my mind; I find it very hard not to overlook things. The way to go with infrastructure, I guess, is to have a clear final goal and, to some detail, a route in mind, while dividing it into small steps that are rewarding in themselves. So indeed, let's take our time on this one.

RCL-Carl commented 7 years ago

This is something to guard like a fox terrier... I've seen that recently the minified downloads have grown from 19k to 23k. Anything up to 50k is no problem, but I still want to keep a brake on this as much as is reasonable.

I agree that this is a major concern. From my research into other similar projects, many of them require very long load times, which in my opinion is a deal breaker. I think there are many solutions to this. One is to carefully select what code is generated and included in the core/libraries, as you have described; in reality, this will always be necessary. But there are other tools that can amplify this diligence.

One design decision taken early in the development of Transcrypt was to go for 'joined minification' rather than 'per-module minification'.

I'm curious: in your experience, how much of a difference does "application-wide" minification make compared to "per-module" minification?

Transcrypt currently uses cascaded sourcemaps, enabling a dev to debug minified target code from the Python source.

I had not considered source maps at all. I'll have to investigate a little more how requirejs handles that.

JdeH commented 7 years ago

@RCL-Carl:

I do not know exactly what difference application-wide minification makes, because currently Google Closure just can't do it properly: whenever I set a higher compression level than the most basic one, it starts to make semantic errors in the minified code. But it's at least a year ago that I tried. Of course it is a matter of the size of the API versus the size of the 'internals' of a module. I expect it's currently only about 20%-30% for a module with ca. 5 kB of source code and no external libraries, but I didn't measure this.

It becomes more interesting if Closure can be configured at a compression level where it leaves out unused code in an application. That should really make a big difference for a bulky library like fabric.js, of which only a small part is used, but also still a considerable difference for __builtins__. My assumption is that at some point someone will indeed make a minifier that understands JS well enough to leave out 'dead code' for a particular app. Two years ago there was some talk that Google Closure could do this, but in my experience it can't yet. If that happens, joined minification could easily shrink a program from 150 kB to 20 kB - e.g. the pong demo, which currently carries the dead weight of the whole fabric.js library.
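As a toy illustration of the kind of dead code removal meant here (a hypothetical sketch, not how Closure actually works - a real minifier would trace reachable code through the whole program rather than take an explicit list of names):

```javascript
// A fabric.js-like library with many exports (invented stand-in).
const fabricLike = {
    Circle: () => 'circle',
    Rect: () => 'rect',
    // ...imagine hundreds more exports here...
    Gradient: () => 'gradient',
};

// Keep only the exports an application actually uses; everything else
// is dead weight that never needs to reach the client.
function treeShake(moduleExports, usedNames) {
    const kept = {};
    for (const name of usedNames) {
        kept[name] = moduleExports[name];
    }
    return kept;
}

// A pong-like demo that only draws rectangles would keep just Rect.
const shipped = treeShake(fabricLike, ['Rect']);
console.log(Object.keys(shipped));  // [ 'Rect' ]
```

Joined minification is what makes this feasible in principle: the minifier can only prove an export unused if it sees the whole application at once.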

Of course fabric isn't needed for this simple demo, and of course you can download tailored versions of fabric and other libs, but dead code removal should be able to do a far better job. So, just as with convergence between JS and Python, this is a long-term direction that doesn't yet pay off.