emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.82k stars 3.31k forks source link

Lazy loading appears not to be working #573

Closed ysangkok closed 10 years ago

ysangkok commented 12 years ago

my prejs.js:

if (!Module) Module = {};
Module["arguments"] = ["/test.c"];
Module["preInit"] = function() {
    FS.createLazyFile('/', "test.c", "test.c", true, false);

test.c (just a cat clone):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc,char* argv[]){
        FILE* fp;
        int i=0;
        char c;
        if(argc==1){
                fprintf(stderr,"Error using cat...");
                exit(2);
        }
        if(!strcmp(argv[1],"-w")){  /* This returns 1 if match is found */

                if(argc!=3){
                        fprintf(stderr,"Invalid no. of arguments");
                        exit(2);
                }
                fp=fopen(argv[2],"w");
                if(fp==NULL)
                {
                        fprintf(stderr,"Error in opening file...");
                        exit(2);
                }
                c=getchar();
                while(c!=EOF)
                {
                        fputc(c,fp);
                        c=getchar();
                }
                fclose(fp);
        }
        else{
        for(--argc;argc>0;argc--){
                 fp=fopen(argv[++i],"r");
                 if(fp==NULL){
                        fprintf(stderr,"File not found...Aborting");
                        exit(2);
                 }
                 while((c=fgetc(fp))!=EOF)
                        fputc(c,stdout);
                 fclose(fp);
        }
        }

}

build command : ~/emscripten/emcc --pre-js prejs.js test.c -o test.html

error encountered: stream.object.contents undefined in library.js: https://github.com/kripken/emscripten/blob/master/src/library.js#L1630

fix proposed: why not make the createLazyFile take a callback? that way, i can handle my own file downloading.

i have made my own patch that makes it work, but it is not using the other code at all, and i thought it was supposed to work, so it is probably a simple fix somewhere and not a reimplementation like i have done.

my patch:

diff --git a/src/library.js b/src/library.js
index 798a6f5..aa89e80 100644
--- a/src/library.js
+++ b/src/library.js
@@ -1628,6 +1628,37 @@ LibraryManager.library = {
         bytesRead++;
       }
       var contents = stream.object.contents;
+      if (typeof contents === "undefined") {
+        var dobj = stream.object.url;
+        if (typeof dobj === "undefined") throw new Error("no content but no url object! programmer error");
+        if (typeof dobj.type === "undefined") throw new Error("need 'type' property on url object!");
+        if (dobj.type === "url") {
+          var request = new XMLHttpRequest();
+          if (typeof dobj.url === "undefined") throw new Error("type 'url' specified but no url present in url obj!");
+          request.open('GET', dobj.url, false);
+          request.send();
+ 
+          if (request.status !== 200)
+            throw new Error("couldn't fetch " + url);
+          contents = request.responseText;
+          var raw_to_uint8array = intArrayFromString;
+          var b64_to_uint8array = function(base64) {
+            return raw_to_uint8array(window.atob(base64));
+          }
+          if (typeof dobj.isNotBase64 === "undefined") throw new Error("need isNotBase64 property on object!");
+          if (!dobj.isNotBase64) {
+            contents=b64_to_uint8array(contents);
+          } else {
+            contents=raw_to_uint8array(contents);
+          }
+        } else if (dobj.type === "function") {
+          if (!dobj.fun) throw new Error("url object property 'fun' undefined!");
+          if (!(dobj.fun instanceof Function)) throw new Error("url object property 'fun' was supposed to be a function!");
+          contents = dobj.fun();
+        } else {
+          throw new Error("unknown remote object type: " + dobj.type);
+        }
+      }
       var size = Math.min(contents.length - offset, nbyte);
       for (var i = 0; i < size; i++) {
         {{{ makeSetValue('buf', 'i', 'contents[offset + i]', 'i8') }}}
kripken commented 12 years ago

The problem is that lazy loading no longer works since browsers disabled sync binary xhrs. So you need to either preload or use emscripten_async_wget. There should have been an error thrown for you when you hit this, but I see there was a bug in reporting the error - I'll fix that.

I like the idea of adding an optional callback to createLazyFile. Pull request would be very welcome.

ysangkok commented 12 years ago

Yes, I know that setting the responseType flag with synchronous XHR's won't work. That's why the prototype patch here uses base-64. Of course it would be more ideal to use base-128 or base-91. But all the base encoding obviously needs server support and it's complex and confusing. That's why I think it should just be a callback for the lazy loading, so we don't have to worry about it. But if you really think there should be the option of passing both, we could use a object with a type parameter like in the prototype above.

library.js suggests that webworkers can still do binary XHR. The specification doesn't say anything about webworkers. Even if binary synchronous support is really there for webworkers, I still think we should only support a callback.

If a major patch is written to make everything support callbacks, could we make it support chunks? It would be much easier to port many games to Emscripten if chunks could be lazily loaded. If all the loading happens in read() like in my prototype, I guess it would be doable, since read() also has the offset. But since this code is probably to be somewhere out of read(), the offset wouldn't be available when i.e. statting a file, and some kind of metadata would be needed to know the total (original) file size seperately or some thing like this: size = (number_of_full_chunks * max_chunk_size) + (size_of_last_chunk_with_size_less-equal-max_chunk_size).

kripken commented 12 years ago

By "just a callback", do you mean just a callback instead of a URL?

We can have the same parameter be either a URL or a callback function. And it would need to be a function in most cases in html of course (but the url could work in the shell etc.).

Can you elaborate more on what you mean by chunks here, and what we would need to do to support that? I'm not familiar with that stuff.

ghost commented 12 years ago

Just a quick thought - is preloading not a common problem? I wonder whether it is sensible for you to try to implement/sort it out in Emscripten - or - rather include a library (preload.js perhaps!?), which already handles all the cruft!?

ysangkok commented 12 years ago

@LCID-Fire : Preloading is already there and working. I do not understand what you mean. This issue is about lazy loading.

@kripken : Yes, I meant a callback instead of a URL.

By chunks, I was imagining HTTP Ranges aka Byte serving. There are so many games that use large packed data files. For example, OpenRedAlert which uses SDL should be easily portable to Emscripten, but it uses data files that are hundreds of megabytes. As it is originally a DOS game, it has rather modest system requirements. Relevant requirement: 2x CD-ROM drive. How fast is 2x? 300 KiB/s! So Red Alert 1 is playable over the internet. This is what I image games doing upon launching: Open archive file. fseek to some kind of TOC. Read TOC, find out where main menu resources are located. fseek to main menu resources (may be located 400 MB into the file!). Read a couple of megabytes... You get the idea.

By hosting the archive files on a remote server that supports byte serving and allows fetching of those files (using CORS), they would be lazily loadable via XHR. This is what I imagine the game doing:

  1. Game action: stat the file to get size. Emscripten action: HEAD the file over HTTP (to get Content-Length). Cache of course
  2. Game action: fseek. Emscripten action: No networking ,only internal pointer setting
  3. Game action: fread. Emscripten action: Check if bytes are already loaded (we could use JavaScript arrays. They are sparse). If not loaded, request range over HTTP. Games probably use buffer sizes that are too small, so I think we should read like 250 KB.
  4. Game action: mmap. Emscripten action: I do not think (late) DOS games use mmap.
kripken commented 12 years ago

Ah, that does sound nice. If it's easy to do with xhrs then should not be hard to add.

ysangkok commented 12 years ago

Yes, it is simply a matter of doing xhr.setRequestHeader("Range", "bytes=0-3200" );

kripken commented 12 years ago

Sounds good. Like I said before, pull request would be very welcome if someone wants to work on that. Let me know if any help is needed.

ysangkok commented 12 years ago

Do you know why it's not working now? It would be easier to add new features if it was working.

kripken commented 12 years ago

Lazy-loading in the browser? It isn't working because as discussed before it can't work, we need binary sync xhrs and browsers disabled those. Lazy loading requires sync, unlike preloading (which finishes before the program starts, so the program's sync file operations will be ok).

kripken commented 12 years ago

(Re-reading this bug, I think I might be misunderstanding something...)

ysangkok commented 12 years ago

Even though we can't have binary sync xhr's, I still think it should be possible to use the synchronous non-binary xhr's. The manual can point out that it doesn't work with binary. Also, Emscripten could help base-128 decoding the content (it seems to be the most efficient encoding), or we could use base-91. There are probably no JavaScript decoders for either of them, but at least there is code for base-91 which could get ported. Encoding the content complicates the chunking, but a small command line tool could be made that creates a TOC which maps byte-offsets to offsets in the base-encoded file.

But what do you think about the object format in the prototype I posted? Would it be okay just to transform the URL parameter to an object with the properties above?

kripken commented 12 years ago

I am a little worried about supporting non-binary sync xhrs. It means the user must do something on the server, and it's not as efficient. If we could detect data is not properly encoded and give an error, that would be better - could we? (I'm not that familiar with how encoding/decoding works and what can be confused as what.)

By prototype do you mean what appears under "my patch"?

ysangkok commented 12 years ago

@kripken: We could use our own HTTP header which should be present if the data is base64. If the header is missing, the error would be given. I think it would be necessary since there exists non-base64'ed data that is also valid base64'ed data, which means we would need to wrap everything in a JSON object or so if we didn't use a header. Since most people would use a server-side script for base64 encoding, that script could also set the relevant header.

Example PHP script:

<?php
header("Content-encoding: emscripten-base64");
header("Content-type: application/octet-stream");
echo base64_encode("\x02\x00\x50");
?>

And yes, I meant what appears under "my patch".

ysangkok commented 12 years ago

I fixed the binary lazy loading in web workers. There is no reason why that shouldn't be supported, since no hacks are necessary.

Tell me if you think the base64 stuff should be implemented, to fix lazy loading outside web workers.

I am currently working on Byte Serving, which I understand you accepted. To begin with, I'll just make it support binary sync XHR's in Web Workers exclusively.

The last issue is the support for callbacks (which I also regard as accepted). I think this shouldn't be hard to implement, so I'm saving this for later.

EDIT: I now finished Byte Serving, that pull request is also including the binary lazy loading fix.

kripken commented 12 years ago

Binary in workers is great, no downsides there, thanks for the pull request. Byte serving and callbacks are good too as discussed above.

I am still somewhat worried about doing something that requires server-side support, like sync xhrs in the main thread. But if we document this properly and show a clear error when the typical mistake is made (someone tries to do a normal sync load in the main thread, without server-side support) then this is ok, pull requests welcome. We should provide some simple way to pre-process files on the server for this, a python script for example would be nice.

Btw sorry for slow response times over the last week, I've been at a conference (gdc online).

kripken commented 10 years ago

Superceded by #2047