alashworth / test-issue-import

0 stars 0 forks source link

stan::lang::compile should not require access to the filesystem (for #include) #152

Open alashworth opened 5 years ago

alashworth commented 5 years ago

Issue by ariddell Sunday Jul 09, 2017 at 17:57 GMT Originally opened as https://github.com/stan-dev/stan/issues/2351


Summary:

stan::lang::compile cannot be used by interfaces which do not have access to the filesystem when the stan program contains #include <filename>.

Description:

There should be a way to use stan::lang::compile with stan code which contains #includes. Naturally all interfaces which do not have a (full) file system available (e.g., httpstan, jupyter notebook) will need this. But having Stan C++ reach into the filesystem breaks the isolation between Stan and the interfaces.

Separating out preprocessing and compilation might do the trick. Most compilers follow this strategy, I assume.

Another option would be to allow the interface to provide (<filename>, <contents>) pairs to stan::lang::compile.

Reproducible Steps:

Call stan::lang::compile without access to the filesystem.

Current Version:

v2.16.0

Edit: add another example of a "filesystem-less" setup.

alashworth commented 5 years ago

Comment by bob-carpenter Sunday Jul 09, 2017 at 21:01 GMT


How about passing the parser a functor that implements

std::string operator()(const std::string& include_path) const;

to map an include file name to its contents?

I need the parser to be able to keep track of line numbering, so it's tricky to separate the preprocessing because we need a record of the includes. I do separate them somewhat inside the parser itself. Then I give warning messages based on what comes after #include in the Stan file.

How do you see the HTTP app taking included files?

alashworth commented 5 years ago

Comment by ariddell Sunday Jul 09, 2017 at 21:45 GMT


I'm happy with the functor. (I hadn't thought about line numbers.)

How do you see the HTTP app taking included files?

I can see the user having to provide the contents of to-be-included files as a filename->contents mapping encoded as JSON.

alashworth commented 5 years ago

Comment by ariddell Tuesday Feb 20, 2018 at 14:04 GMT


I still think this is an excellent idea. Stan C++ shouldn't reach into the filesystem because filesystems vary too much by platform. And there are some settings where filesystems don't really exist or cannot be written to (virtual machines, containers, ...).

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Feb 20, 2018 at 14:09 GMT


I'd be surprised if virtual machines or things like Docker containers can't mimic file systems.

Stan doesn't care where the bits come from, just that it can associate a name to content.

alashworth commented 5 years ago

Comment by ariddell Tuesday Feb 20, 2018 at 14:57 GMT


The real problem is finding out the filename. Filesystem-less settings can mimic filesystems, but they still have to know what to name the filename referenced by an #include statement. So this requires parsing the stan program code to find the #include statement. This is not something which an interface should have to do.

Can we compromise and have stanc return the filenames if they are not found? That way the interface could call stan compile once, find out what the filenames are that are missing, look for them in the include paths, and then call compile again?

alashworth commented 5 years ago

Comment by ariddell Tuesday Feb 20, 2018 at 15:14 GMT


@ahartikainen is asking a good question about whether or not the current approach is cross-platform robust. https://github.com/stan-dev/pystan/pull/438#issuecomment-367002821

alashworth commented 5 years ago

Comment by bob-carpenter Tuesday Feb 20, 2018 at 18:13 GMT


Let me try to be more concrete about what needs to happen.

From the top level, the stanc executable runs the compiler defined in:

This needs to be generalized so that rather than taking a std::vector<string> of include paths, it takes a functor which maps strings to contents (also a string). This new object will require an implementation based on include paths in the file system to replace current functionality.

Then, the function it calls to define the underlying program string, is io::program_reader, which is defined in:

This needs to be modified to accept that callback function and use that rather than using the file system. The bits in there that used the file system go into the implementation of the callback function to implement the current filesystem based approach.