arduino / arduino-cli

Arduino command line tool
https://arduino.github.io/arduino-cli/latest/
GNU General Public License v3.0
4.37k stars 385 forks source link

Allow Libraries to Extend Include Path #501

Open oclyke opened 4 years ago

oclyke commented 4 years ago

It would greatly benefit library developers if libraries could specify additions to the include path used in their compilation. Developers could then create work that is much easier to maintain.

Problem

In my personal use case I would like to write both a portable C driver for a device (that can be used on other platforms and with minimal overhead) and an Arduino wrapper for that interface that makes it simple to use for beginners. For the purpose of illustration it would be best to think of the portable driver as immutable source code obtained from a third-party. For maintainability the third-party code would be included 'symbolically' (e.g. git submodule). A representative library structure might be:

libraries/Servo/library.properties
libraries/Servo/keywords.txt
libraries/Servo/src
libraries/Servo/src/Servo.h # this is the Arduino wrapper interface
libraries/Servo/src/Servo.c
libraries/Servo/src/open-source-servo-driver-lib/include/ossdl.h # this is the portable open-source interface
libraries/Servo/src/open-source-servo-driver-lib/src/ossdl.c

The file ossdl.c will most likely include the open-source portable interface with a line such as: #include "ossdl.h" -- unfortunately this will not work in Arduino 1.8.10

The workaround(s) is(are) not great:

Proposed Solution

I suggest that library.properties gains a new comma-separated field of extensions to the library's include path (relative to Servo/src). If not present (as in the case of existing libraries) only Servo/src will appear in the search path - keeping compatibility with the existing codebase. For the sake of interface protection this is as good as the current technique because users would have to modify library.properties to get access beyond the developer's original intent. Of course if they are modifying the library they don't need this feature to accomplish what they seek.

To fix the example situation from above the developer would add the following entry to his library.properties file:

include=open-source-servo-driver-lib/include

Of course it would also be valid to use

include=lib1/include,lib2/include,lib2/src

since commas are not (typically) used in filepaths. If needed another format (or use of escape characters) could be employed.

Other Possible Solutions (thoughts?)

Related Issues / Feature Requests

arduino/arduino-builder/issues/15 - A long-debated feature in which it is requested to put the Sketch folder on the include path for libraries. This is not the same request because

per1234 commented 4 years ago

Please explain why you consider the existing solution of using relative paths to be "not great".

oclyke commented 4 years ago

Relative paths themselves are not the problem (surely they are used in many projects out there) but rather the fact that only the src/ directory can be used as the reference for those relative paths.

The goal is to be able to include other open-source projects in Arduino libraries without modification - thereby greatly lightening the burden of maintenance.

Lets imagine that there are two great projects that should be included in an Arduino library. Their structure is not flat and header files may be located in one or more sub-levels. Each project might have a structure similar to this:

- projN
    - subA
        - headA.h
    - subB
        - headB.h
    - projN.h       (#include "headA.h" and #include "headB.h")
    - projN.cpp

AFAIK there is no way to use this project without modifying it - even if you placed the contents of projN under the library's src/ directory the Arduino build system could not find headA.h or headB.h. The source code of the (third-party) project would need to be modified to include headers via relative paths.

Maybe it is best practice for a project like projN to use relative paths internally so that only one include path is needed in the build system but unfortunately we can't enforce that on third-party projects. So the alternative is to maintain a modified fork/copy or to allow additional include paths to be added to the build system.

Is there another way that I am overlooking?

facchinm commented 4 years ago

Hi @oclyke , I indeed like the idea of allowing a library to specify additional include paths; it should also be totally safe (in fact, we are only recursively compiling inside the src folder, not outside). My proposal:

I'll try to prototype something and let you know :wink:

oclyke commented 4 years ago

Wow fantastic! Can't wait to test drive. I've never worked on the IDE source code but let me know if there's anything I can do to help.

facchinm commented 4 years ago

@oclyke sorry for the delay. I'm attaching the builds of arduino-builder for the most common platforms to https://github.com/arduino/arduino-cli/pull/502

matthijskooijman commented 4 years ago

the additional include paths shouldn't be used during library discovery to avoid polluting the search results but only later in the process.

What does this mean exactly? The environment for include detection and compilation should be as identical as possible, otherwise you might end up with different results and missing includes during include detection?

Do you maybe mean that these additional include paths should only be used when compiling source files from the library itself? If so, the include path should be added only for this library's source files, when compiling and when doing include detection I think. However, I'm not so sure this would really work: If the sketch includes a library header file that wants to include a header file from the additional include path, this would break. Because of this, in practice you pretty much always need a single, global, list of include paths used in all compilations.

The file ossdl.c will most likely include the open-source portable interface with a line such as: #include "ossdl.h" -- unfortunately this will not work in Arduino 1.8.10

@oclyke Are you sure this will not work in Arduino? When you include with "" rather than <>, the compiler should resolve the include relative to the current file, so you should be able to include from the current directory.

This does not solve the more complex problem when you have two immutable sublibraries of which one needs to include the other, so the additional include paths can still be useful.

On a related note: Maybe it would be interesting to also have an option to not put the src directory in the include path at all. This would allow libraries with a separate include directory for their public interface, and still allow "private" headers in their src/ directory that can be included with relative includes (with "").

facchinm commented 4 years ago

With avoid polluting the search space I was thinking about this usecase (let's call it "Arduino style explicit library request" )

#include "foo.h"
#include "bar.h"
void setup() {}

Then, foo.h is discovered in Foo library with the following structure

.
|-- keywords.txt
|-- library.properties
`-- src
    |-- foo.cpp
    |-- foo.h
    `-- libbar
        |-- include
        |   |-- another_include.h
        |   `-- bar.h
        `-- src
            `-- bar.c

Foo library adds libbar/include to additional_include_paths and this stops the library discovery there. Since the user wanted to add bar.h from Bar library (in sketchbook) the compilation is now broken.

Now, there are a couple of real life things to keep in mind:

Hope it's clearer now, but I'd love this thing to be ready soon and having a larger audience can help in finding the ideal solution.

matthijskooijman commented 4 years ago

@facchinm I'm a bit confused with your example. The sketch includes "bar.h", the quotes indicate that this file exists in the sketch? If so, there is no conflict there, as the quotes make sure that the current directory is searched first, so the file from the sketch is found. If they wanted to include bar.h from the Bar library, it would have been better to write #include <bar.h> instead. In any case, if the sketch contains no bar.h, then either one has the same result.

Furthermore, you seem to suggest a conflict between bar.h in the Bar library and the include in libbar, but in your example you only have a libbar.h, so again, no conflict there?

Assuming that you intended no bar.h in the sketch, and intended libbar.h to be named bar.h, then indeed, there is a conflict between libbar/include/bar.h and Bar/bar.h, but I don't think there is a way to solve this.

Your suggestion seems to be to only put libbar/include on the include path when compiling foo.cpp and libbar/src/bar.cpp? If so, this would break if the sketch includes foo.h, and that does #include <bar.h> (intending to include libbar/include/bar.h). This could of course be solved by doing a relative include (#include "libbar/include/bar.h") instead (which is a good idea in any case, since it disambiguates), but does not solve everything. Consider the following library:

.
|-- keywords.txt
|-- library.properties
`-- src
    |-- foo.cpp
    |-- foo.h
    `-- libbar
        |-- include
        |   `-- bar.h
        `-- src
            `-- bar.c
    `-- libbaz
        |-- include
        |   `-- baz.h
        `-- src
            `-- baz.c

Now, suppose that libbar depends on libbaz and libbar/include/bar.h contains #include <baz.h>. If the sketch is compiled without additional library paths, and it includes foo.h which includes libbar/include/bar.h which tries to include <baz.h>, which is not in the include path and fails. Since libbar/include/bar.h is immutable, we cannot fix this using a relative include.

I guess this means the perfect solution does not exist (the problem, really, is that "includes" in C/C++ inhabit a global namespace and can cause conflicts). The traditional way to solve this is to do have a single (or a few) include dir instead of one per library and doing things like #include <libbar/bar.h> where the libbar part provides some namespacing, but that seems infeasible with the way Arduino works right now (in fact, it is actively prevented, since library autodetection does not work on files in subdirectories, so you cannot even emulate this behaviour with a subdirectory inside src currently).

Given there will be some limitations either way, I think I would prefer to keep the include namespace simple and flat, and let people deal with duplicate header files (which then need to do already, if multiple libraries provide a bar.h file, they might run into trouble), rather than letting them deal with inconsistent results for inclusion ("it works when I include file from my library .c file but not from the .h file, huh?") and not supporting multiple interdependent sublibraries.

facchinm commented 4 years ago

Sorry for the libbar.h <-> bar.h confusion, I amended the example.

The main topic was "Arduino style explicit library request" ; we never forced the user to use absolute includes (<>) and most examples use relative includes for libraries.

Said that, at compile time the additional include paths must be added (and my PR does exactly that). At library discover time, your example indeed breaks the pattern; anyway, I'm quite sure that the person who wrote the library knows what he's doing, so when the lib discovery engine finds a "missing" header inside a folder specified by additional_include_paths it could simply bail out (reporting the base library as successfully found).

oclyke commented 4 years ago

@matthijskooijman

On the main topic (where to include additional_indlude_paths in the build)

There's a lot to keep up with between all these posts but I'd like to submit my two cents.

"avoid polluting the search space" What does this mean exactly? The environment for include detection and compilation should be as identical as possible, otherwise you might end up with different results and missing includes during include detection?

Originally I assumed that this was because the library for foo.h was already 'discovered' (all the library.properties info had been located) the builder would not want to re-locate the same library for all its additional includes (because what's the point of that?)

You two have brought up consideration of what should happen when a "Library" (meaning one installed via Library manager) and some user files in the Sketch provide conflicting files. I think you two have covered that pretty well and I agree with the statement:

I would prefer to keep the include namespace simple and flat, and let people deal with duplicate header files

Here are a few little snippets I want to echo / emphasize:

Because of this, in practice you pretty much always need a single, global, list of include paths used in all compilations.

Yes. Let's not add confusion / errors by changing which paths we provide in different contexts.

if all libraries were written with relative includes, this would work out of the box.

Maybe I am missing something here? What if there is some immutable code that is written with relative includes that you then include in a subdirectory of your library? (You are prevented from using a flat file structure because you want to use a git submodule, in this example) Maybe I just don't know how proper relative includes are written?

When you include with "" rather than <>, the compiler should resolve the include relative to the current file, so you should be able to include from the current directory.

What, though, is 'the current directory' in this case? From my understanding the compiler is operating out of just one directory (which is fairly well hidden - maybe in the installation or the temporary build directory?) and so requires instructions on where to find header files.

I'd love this thing to be ready soon and having a larger audience can help in finding the ideal solution.

Me too! Wondering what exactly you mean though. Does this mean add as a feature to an upcoming release of Arduino and allow users/developers to try it out at large scale? Or does it mean releasing it in a beta build and / or discussing on the developer's email list?

About public vs. private APIs

Yes - to allow #include "foo.h" to operate properly in the main sketch means that #include "bar.h" will work and provide access to the Bar public API. If the Foo author wants to restrict this then an acceptable way, in my opinion, is to never reveal the existence of "bar.h" in any examples. (I.e. their examples should be constrained to the API that they want to allow) If an intrepid user digs around then they may find plenty of ways of breaking the API, including "bar.h" being just one.

matthijskooijman commented 4 years ago

When you include with "" rather than <>, the compiler should resolve the include relative to the current file, so you should be able to include from the current directory.

What, though, is 'the current directory' in this case? From my understanding the compiler is operating out of just one directory (which is fairly well hidden - maybe in the installation or the temporary build directory?) and so requires instructions on where to find header files.

Includes are done relative to the current file, that is the file that contains the include directive. The "current directory" (in the operating-system sense) is not relevant here, only the directory that contains the file that contains the include. So it does not matter where a file is included from, relative includes are always evaluated in the same way.

What can be a bit confusing is that an include with "" will fall also look in the global include path if no relative file was found, which is why a lot of people use "" when they meant <>.

Originally I assumed that this was because the library for foo.h was already 'discovered' (all the library.properties info had been located) the builder would not want to re-locate the same library for all its additional includes (because what's the point of that?)

I'm not entirely sure what you're saying, but I can imagine that we would want to treat include detection specially to e.g. prevent #include <bar.h> from adding the Foo library to the build (since it does provide bar.h, but only as an internal API). However, for that, it is sufficient to simply not consider the additional include paths when looking for a missing include (in this case, since the Bar library would provide bar.h at the top level, that one would be chosen. Note that it is still important to add the additional include paths to the compiler commandline as soon as you decided to add Foo to the build, since otherwise a next run of the preprocessor might now locate foo.h, but miss bar.h included from foo.h, which is intended to find libbar/include/bar.h, but instead will end up pulling in the Bar library.

One caveat: If includes in the additional include dirs are not considered for fulfilling missing includes (only added to the path once a library is selected), then that also prevents separating the header files from the src files with a "do not put src/ on the include path" option. OTOH, behaviour could maybe change based on such an option, of course, so maybe not something to influence this decision.

matthijskooijman commented 4 years ago

W00ps, missed @facchinm's last post when I wrote my previous reply.

Said that, at compile time the additional include paths must be added (and my PR does exactly that). At library discover time, your example indeed breaks the pattern; anyway, I'm quite sure that the person who wrote the library knows what he's doing, so when the lib discovery engine finds a "missing" header inside a folder specified by additional_include_paths it could simply bail out (reporting the base library as successfully found).

I'm not sure what you're proposing here exactly, but as suggested in the next-to-last paragraph of my last post, it is important to add the additional paths to the preprocessor commandline once a library is added to the build, otherwise includes of "private" header files on the additional paths might result in missing includes. I think this also the situation you talk about, but I'm not sure how you're proposing to support this (bailing out and halting further include detection makes no sense, since that could mask later missing includes).

as-iotex commented 3 years ago

Is this expected to be merged soon? We are developing an Arduino library which uses 3rd party modules The fact that we cannot specify additional include paths is forcing us to modify all the include statements in the 3rd party files This would be really useful

vChavezB commented 2 years ago

I am also interested to know if this feature is still on the roadmap of the Arduino project? Most building systems/IDE's provide an option to include additional paths to compile source files. Internally the build system then parses the options and tells the compiler of the paths (e.g. -I for GCC).

The only workaround I found is adding a new entry to the boards.txt of the specific SDK I am going to work with (e.g. Due,ESP32,etc). This is really hacky and does not expand/is not flexible.

The disadvantage of the current situation is that each time I have includes in another folder that is not src I have to use relative paths. And as the previous comment, for 3rd party modules that you want to port to Arduino, you have to modify the library directly and use relative paths...

vChavezB commented 2 years ago

Well, it seems that this issue will not be solved in the short term. I have developed a script that easily replaces include directives to relative ones. This should reduce the time it takes to port a 3rd party library to the Arduino ecosystem.

Links

Based on this script I easily ported the lwIP TCP/IP stack into Arduino.