emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.62k stars 3.28k forks source link

Files loaded with emscripten_async_wget are sometimes not accessible #3746

Closed lamelizard closed 8 years ago

lamelizard commented 9 years ago

Hello. My program loads a list of files from a server using emscripten_async_wget(), and loads those files with emscripten_async_wget(), too. The difference is that, while all files are accessible during the callback-function, only the filelist remains accessible outside of the callback (in all browsers). I used EM_ASM(console.log(FS.lookupPath("/").node.contents);); to search for clues and there is a difference between Firefox: 2015-09-05 16_10_56-xubuntu - vmware player non-commercial use only and Chromium: 2015-09-05 16_12_39-search Only filelist.txt (and the folders) looks fine to me in Chromium; the underscores are missing and I cannot explain the quotation marks, either. The const char * from the callback is /gamedata/engine/engine_options.ini as it's supposed to be. I tried to open /gamedata/engine/\"engine options.ini\" which doesn’t work, as well.

kripken commented 9 years ago

The quotation marks are because it needs to be read as engine.contents["de.ini"] instead of engine.contents.de.ini, which would not work (it looks like de is an object with an ini property on it). But, then I can't explain why there are no quotation marks around filelist.txt, very strange...

Maybe there is a non-unicode character in some of the filenames?

If that's not it, please make a small standalone testcase that we can debug.

lamelizard commented 9 years ago

I found the "missing" underscores, they just weren't visible in chromium... I did check the filenames, there are only ASCII characters (/_.) so that should be fine. So this means that I cannot access those files that are correct (with the quotation marks), but the one that is incorrect… I’m sorry, but I’m quite new to this. How can I upload the testcase then it's finished?

kripken commented 9 years ago

You can use a gist on github if that is convenient, or a zip on some site like dropbox would be fine as well.

lamelizard commented 9 years ago

Well, trying to create a testcase failed... I used nearly the same code to load the files and it worked, while it doesn't in my program. I made this: https://www.dropbox.com/s/3liiz7cmlvupqhv/test.zip?dl=0, which includes a folder 'testcase' which shows how my program works and my program itself, compiled with the -g4 option. It should work in the root dir of a http-server. I don't know if it is any debugable (it depends on the kind of bug I guess). One thing I found out is that quotation marks seem to be not supposed to show up; In the testcase there are none for both the 'filelist.txt' and the 'x.txt' which was listed in the filelist and both are accessible. This would explain the fact that I can access the filelist but not the other files in my program. But still I have no idea what could be causing this.

lamelizard commented 8 years ago

I was able to narrow down the problem. It’s strange: In the function createNode from MEMFS there is the line parent.contents[name] = node;. the first time the name is “filelist.txt” and it creates filelist.txt. After loading the list it’s called with “x.txt” and it creates “x.txt”. Any ideas what could be causing this?

kripken commented 8 years ago

is there another file called "x.txt" perhaps?

./tests/runner.py browser.test_preload_file should have tests for escaping all kinds of characters. Does it pass for you?

lamelizard commented 8 years ago

No, there aren't two files named the same.

Ah, no it doesn't. But I don't know why (I'm running Xubuntu 32bit, the emsdk version is 1.34.6). This is the result: https://gist.github.com/1e68ba93fe633f44042c.git

kripken commented 8 years ago

That url gives me an error?

lamelizard commented 8 years ago

I guess I made some mistake, here's a new one: https://www.dropbox.com/s/gvgly0bibahyci7/test_result.txt?dl=0

kripken commented 8 years ago

Ok, what happens in the browser? Is some error reported in the page?

lamelizard commented 8 years ago

I updated to 1.34.9, now the test works (probably I didn't allow popups last time or something like that). I updated the https://www.dropbox.com/s/gvgly0bibahyci7/test_result.txt?dl=0. It didn't change anything for the program, though.

kripken commented 8 years ago

Ok, how do i see the problem in the dropbox download you provided?

lamelizard commented 8 years ago

I updated the zip, now there is a breakpoint. All the files and the folder have to be in the main directory of a http server. In chromium open the debugger and use Scope or watch name and FS.root.contents (and content of the subcontent). Open the html and it will stop a few times for folders like tmp. Then there will be filelist.txt (in gamedata/engine) working just fine. After that there will be some more folders which do work and some files that don't, getting quotation marks while the name looks fine to me.

kripken commented 8 years ago

I'll try to get to this later today, but to maximize the effectiveness of my time, can you please write out the complete steps to see the problem, extremely concretely? Assume I'm an idiot when doing so ;) It's stuff like

After that there will be some more folders which do work and some files that don't

that are not immediately obvious that might stall me.

lamelizard commented 8 years ago

Ah, I'm sorry, that was totally my fault, I didn't explain that part well (or at all). I set a debugger break point right before parent.contents[name] = node;. After resuming the execution often enough the filelist will be loaded (on the website itself will be a list of files) those will be loaded in the same order as in the printed list. If the directories of the filepaths not exist in FS.root.contents it will be added. (a watch expression has to be added to view this (in chromium it's on the right side of the icons to control the execution of the javascript (which can be found in the tab Sources (which can be found in more tools -> developer tools)))) Then the directories are there, the current file will be added. This process can be viewed by looking at the name at Scope (there the Watch tap is) to identify if it is a file or a folder being added at this moment (there will be no filenames without a dot) and what's being added and to see if the name is O.K. (no wierd characters or something), and then doing 1 or 2 single steps/lines in the code to trigger parent.contents[name] = node; which will add the node to FS.root.contents (most likely into FS.root.contents.gamedata.contents). If it was a file it should have the marks except for the list in FS.root.contents.gamedata.contents.engine.contents. If there is something else please ask.

lamelizard commented 8 years ago

I'm not quite sure if I found something out or not; This is a screenshot from chromium's watch expressions: 2015-09-28 15_39_02-xubuntu - vmware player non-commercial use only While the filelist can be accessed like you wrote (in this case FS.root.contents.gamedata.contents.engine.contents["filelist.txt"] (the third line from the bottem)), something like the "de.ini" can't, it says it's undefined, while I can access it by clicking on the arrow in the third line, so it can't be undefined, I guess. I'm not sure if Javascript is acting wierd or if I just made some mistake?

lamelizard commented 8 years ago

I was able to find the problem by creating a testcase the other way round; deleting everything not needed. I found out that my program works if I do not create folders. It doesn't work if I do; I'm using something like EM_ASM(FS.mkdir('/gamedata')); to create folders. I guess this is wrong? How is the "official way" to create them? If this isn't the issue I can create easier testcases since I now think to know what's causing this.

lamelizard commented 8 years ago

In the created .js there are lines like FS.mkdir('/tmp'), so I guess this can't be wrong. Still, even opening a file in the / directory doesn't work after creating directories... Furthermore I tried using EM_ASM(console.log(FS.readdir("/"))); to get more information; The result was [".", "..", "tmp", "home", "dev", "proc", "gamedata", "CMakeCache.txt "], so I guess it's just Chromium giving me those quotation marks for all not working nodes, but that switching of lines after the end of CMakeCache.txt happens after I copied that line from the console... so I guess there is something wrong with the names. This doesn't happen in my hex-editor... Firefox is giving me a space (hex: 20) at this place but not allways, here's a picture: 2015-10-01 17_03_22-fge - microsoft visual studio Any ideas? Another question: Does somebody know a project on github or somewhere else with working loading of files from a list? Any example source would be awesome.

kripken commented 8 years ago

Sorry, I've been sick and not much around.

FS.mkdir should definitely work.

Do you have a small testcase showing the issue, now that you've narrowed it down? A single small cpp file with some EM_ASM that shows the issue would be great.

lamelizard commented 8 years ago

Don't worry about it. Well the thing is I was wrong again, I made a testcase that doesn't work even without folders... I made some cases trying to figure something out. It sometimes doesn't work even with nearly the same filelist, only with x.txt\n. The only differnce; One file is 6 byte, one 7. My guess is that these are different std::endl because I made those files on different OSs (Windows and LInux) and that one byte of the 2 byte newline is bekomming part of the file name in my program. This may explain the new line that I get then I copy the EM_ASM(console.log(FS.readdir("/")));-array from chromium and the whitespace from Firefox. I will have to investigate further... Can I assume that Emscripten handles things like std::endl the linux-way?

lamelizard commented 8 years ago

It works! I set the ios_base::binary so that there are only Unix style newlines and everything loads perfectly, the Windows 0x0D before the Unix \n was the problem. I don't know if you see any problem for the filesystem since it's rather a OSs specific issue not related to Emscripten, so you may or may not want to close this. Anyway, I'm sorry for taking your time and thank you very much for your help.

kripken commented 8 years ago

I'd like to fully understand this. How did a "\r" get into a filename? For that matter, how could a "\n" get into a filename? Was it intentional to have such characters, or not?

juj commented 8 years ago

We have a documentation section on the valid character set allowed in filenames: https://github.com/kripken/emscripten/blob/master/site/source/docs/porting/files/packaging_files.rst#valid-character-set . Neither \r and \nare allowed characters. Was those characters introduced by Emscripten code somewhere, or by user code? If it was user code, perhaps we are missing some error checking somewhere that does not correctly enforce and clearly yell if \r or \n characters are attempted.

lamelizard commented 8 years ago

I'm sorry, I wrote it in a confusing way, there is no \n in the filename. Since this is a cross platform project I just used the filelist from windows with \r\n (the filelist has one line per file). My program loads the file as an ifstream and uses getline() to read it into a string which then is used in emscripten_async_wget(). The ifstream seems to act in Emscripten like in Unix, therefore \r is not part of the end of the line but rather part of the line itself and with that part of the call to emscripten_async_wget() which seems to create a node using filename\r. But I always tried to access filename so this didn't work (it probably cannot work anyway since \r isn't allowed (Thank you for that list juj)). (this all is just an assumption but the only explanation I have)

So no it wasn't intentional at all, I didn't even know that it's there and no, the \rs there not from Emscripten but from a bug in my program.

kripken commented 8 years ago

I see, thanks. Looks like there's nothing we can do to make this kind of mistake less likely, except for to document somewhere that we have POSIX-like conventions in line endings etc. But not sure where we'd do that.