github / markup

Determines which markup library to use to render a content file (e.g. README) on GitHub
MIT License

Asciidoctor: support include directives for other asciidoc files #1095

Open miltador opened 7 years ago

miltador commented 7 years ago

A lot of time has passed since the latest comments from the devs in #172 and #335. After all these years things have probably changed in your infrastructure, so why not bring this issue up again with some more input and news?

For example, AFAIK GitHub started to use containers which could help to isolate things from unintentional access and improve security.

Also take a look at asciidoctor/asciidoctor#1088. The author of Asciidoctor is ready for a conversation about this, and a way to implement it with a custom include preprocessor has even been proposed. If there are still concerns about resolving the issue, please provide constructive feedback so both sides can collaborate.

chevdor commented 7 years ago

I totally 👍 this request. Includes are an amazing feature of AsciiDoc over Markdown and would help a lot in providing users with good, up-to-date documentation.

I would add image to the same bucket. I think it is not supported today, probably for the same reason includes are not.

mojavelinux commented 7 years ago

I would add image to the same bucket. I think it is not supported today, probably for the same reason includes are not.

The image macros are supported on GitHub. You can see them in action here: https://github.com/asciidoctor/atom-asciidoc-preview

mojavelinux commented 7 years ago

If GitHub is committed to enabling includes for AsciiDoc files, I'm willing to do whatever needs to be done in Asciidoctor to make it happen. I've already added an extension point so the include directive can be handled by a custom function. That allows the GitHub code to resolve the file from the git repository instead of from the file system.

ericis commented 6 years ago

Fully support this. Seems like a basic capability and we're struggling without it 👎 http://asciidoctor.org/docs/user-manual/#include-directive

nailgilaziev commented 6 years ago

It would be great to use the full power of AsciiDoc from the GitHub frontend. For example, I use restdocs & Spring to generate snippets for the documentation of my API. Most of the documentation is handwritten and only the generated snippets are included. I want to store this documentation on GitHub and edit it collaboratively with others via GitHub's tools (pull requests).

kavaliro commented 6 years ago

This issue was an unpleasant surprise. I agree with @ericis; this is fundamental functionality which is expected by Asciidoc users.

Since @mojavelinux has enabled custom handling, this seems simple:

  1. check for infinite recursion
  2. only allow relative includes. (i.e., only include files within the repository.)

This seems fairly solvable. Maybe I'm not seeing the full extent of it?
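
The two checks listed above can be sketched roughly as follows. This is only an illustration in Python, not code from GitHub or Asciidoctor: `resolve_include` and `check_cycle` are hypothetical helper names, and the sketch assumes POSIX-style paths relative to the repository root.

```python
import posixpath
from urllib.parse import urlparse


def resolve_include(current_file: str, target: str) -> str:
    """Return the repository-relative path of an include target.

    Hypothetical helper: raises ValueError for anything that is not a
    relative path staying inside the repository.
    """
    # Reject URI targets (http://, file://, ...) outright.
    if urlparse(target).scheme:
        raise ValueError(f"URI includes are not allowed: {target}")
    # Reject absolute paths; only relative includes are accepted.
    if target.startswith("/"):
        raise ValueError(f"absolute includes are not allowed: {target}")
    # Resolve relative to the directory of the including file.
    base = posixpath.dirname(current_file)
    resolved = posixpath.normpath(posixpath.join(base, target))
    # After normalization, a path that escapes the root starts with "..".
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"include escapes the repository: {target}")
    return resolved


def check_cycle(stack: list, resolved: str) -> None:
    """Raise ValueError if `resolved` is already being included (a cycle)."""
    if resolved in stack:
        raise ValueError(f"recursive include: {' -> '.join(stack + [resolved])}")
```

For example, `resolve_include("docs/guide.adoc", "../README.adoc")` yields `README.adoc`, while `../../etc/passwd` or an `https://` target is rejected.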

denuno commented 6 years ago

I think it's been a solvable problem for a while now, technically-- at this point it's just someone flipping a couple switches by the looks of it.

This is hands down one of the best features of AsciiDoc and it's a crying shame that there is so little support for it in git-related tooling (GH, Gollum, GitBookIO, etc.).

FWIW, it's not like images are that much safer than other types of content, but for years now I could generate images of my source code and include it that way (ridiculous, as no copy/paste), or generate the content itself and include it that way (which defeats the purpose of nice rendering via GH out of the box).

Someday, it would be nice to move into the brave new future and get some include::sourceFile[tags=tagName] type stuff going... I know not many people like writing docs but this stuff is gold for folks who-- well, may not love writing them, but at least who want to write them as smartly as possible-- and right now those folks have to do the work twice to get it to look nice, basically, and while it can be automated... it shouldn't need to be, in this day and age! :smiley:
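
For readers unfamiliar with tagged includes: the source file marks a region with comment lines containing `tag::name[]` and `end::name[]`, and the include directive pulls in only that region. A simplified Python imitation of that selection, assuming a single non-nested tag (`select_tagged_lines` is a hypothetical name, and this is not Asciidoctor's implementation):

```python
import re

# Matches marker lines such as "# tag::connect[]" or "// end::connect[]".
TAG_LINE = re.compile(r"\b(tag|end)::(\S+?)\[\]")


def select_tagged_lines(source: str, tag: str) -> list:
    """Return the lines of `source` between tag::<tag>[] and end::<tag>[]."""
    selected = []
    inside = False
    for line in source.splitlines():
        match = TAG_LINE.search(line)
        if match:
            kind, name = match.groups()
            if name == tag:
                inside = (kind == "tag")
            continue  # marker lines themselves are never emitted
        if inside:
            selected.append(line)
    return selected


source = """import db
# tag::connect[]
conn = db.connect("localhost")
# end::connect[]
conn.close()
"""
print(select_tagged_lines(source, "connect"))
```

Only the `conn = db.connect(...)` line is selected, so the documentation can quote tested code without depending on line numbers.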

obilodeau commented 6 years ago

So it's pretty clear the reason it hasn't been implemented yet is security. I believe it is also a reason why GitHub's staff doesn't really engage with the community in order to solve that problem because it would require them to expose their threat model and pieces of their architecture which could help attackers.

FWIW, it's not like images are that much safer than other types of content, but for years now I could generate images of my source code and include it that way (ridiculous, as no copy/paste), or generate the content itself and include it that way (which defeats the purpose of nice rendering via GH out of the box).

You are missing the broader point. Include processing is more complex, not because the destination is dangerous, but because of the way it interacts. Just as an example: an include can include includes, which could in turn include the first top-level include. This would DoS a naive parser. It is analogous to the Billion Laughs attack, and it could be done in so many ways that it's hard to prevent them all with parser restrictions. Another, different, yet equally hard to fix, example: AsciiDoc includes can be URIs, and URI includes could be used to perform SSRF attacks in order to explore GitHub's infrastructure in ways that shouldn't be allowed.

So I think the reason why GitHub is not doing it is because <1% of their users care and there are numerous ways to abuse the feature so doing it safely is hard. Implementing a hard feature for <1% is not worth the resources (or the risk).

Don't shoot the messenger here: I want AsciiDoc includes to work on GitHub! It's just that I can see why it's a tricky one to implement.

denuno commented 6 years ago

Nothing wrong with sharing how you think-- no shots here, but I disagree that it's security holding this up per se. :smiley:

Mainly because in the comments above, both injection attacks, and recursion have solutions (and have been solved elsewhere in github infrastructure already)... it's weird that infinite recursion is even on the list, IMHO-- same with the "naive" parser idea... anyone can write a bad parser, we can't protect against that (and I'd be scared if we could!).

Same goes for the security concerns you bring up. They've been brought up, and even addressed, already. If not in this thread/ticket, then in multiple other ones, which leads me to my last refutation, that being the idea this problem affects a negligible number of people.

It seems to be affecting many people (including authors of popular tools who are representing thousands of people), as you can tell by the various tickets around this issue. The same-ish issue (for MD) has the highest number of comments at the moment, and it's just a bump of several other tickets (all of which seem to have "solved" the problems raised but appear to be in a holding pattern). Plus figure a good 85% of people probably don't file a ticket when they hit a problem...

I'll close by saying that security through obscurity isn't really much protection. There's a reason crypto libs have to be published to be valid, and that's because obscurity is "soft" protection at best, and a huge security concern at worst. It's cool GH is sharing the markup plugin-- tho note numbers 2-5 from the docs ("internal code") seem to be a source of bottlenecks.

Pretty sure The Future will be mostly folks who put source code out so it can be verified/checked (been saying this for years, but at least it doesn't sound crazy anymore... how much does OSS power these days?), and if you don't think someone can just call up their friend who works at Large Company X to get a peek at sources... ¯\_(ツ)_/¯ anyhow, at this point it's not a tech barrier as far as I can see, and it's fine if something else is holding this up, but there's been little to no feedback from project maintainers, and it's depressing to see so much input and so many orphaned/ghosted PRs going back years and whatnot (in general around this issue). But not that much. :smiley: Mainly I'm just tossing in another +1, in a now age-old tradition. :stuck_out_tongue_winking_eye:

mojavelinux commented 6 years ago

@obilodeau Thank you for sharing your insight on this issue. You provided a lot of context for what the security concern actually entails.

I want to emphasize two things. First, Asciidoctor has from the very beginning offered up the include processor as an extension point (for GitHub, I might add). That means that GitHub could write code that takes over processing of the include, so there's 0 risk of insecure code that they themselves did not introduce. That doesn't make it easy, to be sure, but that should mitigate the third-party risk argument.

Second, I'm willing to make any reasonable change in Asciidoctor that would further accommodate this feature. So far, I have heard absolutely nothing from GitHub on this issue, which doesn't bode well if we expect to make any progress. My offer stands to work with GitHub if they need me to get this feature implemented. (code, documentation, whatever)

Olivier wrote: An include can include includes which could include the first top-level include.

I don't expect GitHub to use the built-in include processor, but there is protection against this scenario. An include can only go to a fixed number of levels before it's terminated. Most users would be happy with 3-5 levels, if even that.

Olivier wrote: AsciiDoc includes can be URIs, URI includes could be used to perform SSRF attacks in order to explore GitHub's infrastructure in means that shouldn't be allowed.

I would not expect these to work on GitHub. They don't even work in Asciidoctor without a command-line / API flag. I think includes that work within the set of repository files would already be a huge boon for us AsciiDoc writers.

Olivier wrote: Implementing a hard feature for <1% is not worth the resources (or the risk).

If they went with that argument, then I'd say they don't know the AsciiDoc community very well. AsciiDoc users are bonkers for includes, rightfully so. As Denny points out:

It seems to be affecting many people (including authors of popular tools who are representing thousands of people), as you can tell by the various tickets around this issue.

If you want my prediction on this issue, it will be resolved as soon as GitLab implements it. That's not trolling, that's just sound competition in the marketplace. After all, GitLab just implemented stem support (i.e, math expressions) in AsciiDoc files.

mojavelinux commented 6 years ago

Btw, here's a link to the corresponding issue in the GitLab issue tracker.

https://gitlab.com/gitlab-org/gitlab-ce/issues/18045

Interesting to note that @jirutka has already submitted a full patch for a custom include processor that works with a repository manager like GitHub and GitLab. So the code is there. We just need the will.

jyutzler commented 6 years ago

I am fine with a very restrictive policy here, even restricting includes to the same repository. My issue is that I have documents that I want modularized but I also want to be able to present them on the web as complete documents and right now I can't do that. I have to kick off a separate process to generate the file then post that. Whether I do this manually or through some sort of agent is irrelevant, it is a nuisance and completely unnecessary.

kavaliro commented 6 years ago

3-5 levels would be perfect imho. Five gives a few levels for document structure and a couple of levels for templates. Three would suffice for most use cases, but five would handle almost all of them.

While I can handle all of the includes before pushing the content to github, effectively that means I'm no longer able to use github as source control (for the same reason our .gitignore files exclude /bin.)

One use case I have uses include to add an svg, and the text fields of the svg are then changed via substitutions. Which really isn't that complex--an svg is just xml after all--but it's not something you can do with the standard image handler.

Count me among those who are bonkers for includes. Includes really are a must have feature for asciidoc.

denuno commented 6 years ago

Heck, I'd settle for even one level-- and a same-repo policy is what I'd expect off the top of my head so that'd be dandy too (though it'd be swell if they follow the standard GH route of an inter-github policy for linking to stuff in other GH user/org repos).

While I can handle all of the includes before pushing the content to github, effectively that means I'm no longer able to use github as source control (for the same reason our .gitignore files exclude /bin.)

:point_up: this!

One of the main reasons I like GH is that slickly rendered README. It is a powerful feature. Without includes it's hobbled in exactly the way you wouldn't want though, as outlined above, making it a non-option for folks who have organized docs in their sources ("Don't look at those, look at the generated ones! (but yes, edit those)" yuck).

GH has had includes of various types (at least header/footer) for years now in the wiki section, so it's a problem that was solved for GH in some general form ages ago...

kavaliro commented 6 years ago

I just want to mention that two workarounds are possible for this, for those needing a solution.

  1. There's a jekyll asciidoc template floating around that is set up to use Travis-CI to do the compilation, committing the results to the gh-pages branch of a repo.

  2. If you need to retain tighter control and compile locally, you can use git hooks to kick off scripts to do that, commit to the gh-pages branch, and push it. (I haven't tested that, but I don't see any reason it wouldn't work at first glance.)

mojavelinux commented 6 years ago

Unfortunately, publishing to GitHub Pages, or Netlify, or various other static web hosts, is a different solution entirely. If you use your own static site generator, you open up a whole world of possibilities. But the README and other files in a repository on GitHub are still crippled.

A better workaround is to have a CI job that monitors for changes to AsciiDoc files in the repository and expands the include directives, obviously leaving behind a hint so that the region can later be updated. It's not ideal, but it at least lets you keep your documents DRY.
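
A rough sketch of what such an expansion step could look like, in Python. Everything here is an assumption for illustration: the `include-begin`/`include-end` marker comments are an invented convention (AsciiDoc treats `//` lines as comments), `expand_includes` is a hypothetical name, the repository is modeled as an in-memory dict, and only one level of includes without attributes is handled.

```python
import re

INCLUDE = re.compile(r"^include::(\S+?)\[\]$")
BEGIN = re.compile(r"^// include-begin::(\S+?)\[\]$")
END = re.compile(r"^// include-end::(\S+?)\[\]$")


def expand_includes(text: str, files: dict) -> str:
    """Inline each include, wrapped in marker comments so it can be refreshed.

    `files` maps repository paths to file contents; a missing target
    raises KeyError.
    """
    out = []
    skipping = None  # target whose stale expansion is being dropped
    for line in text.splitlines():
        if skipping is not None:
            end = END.match(line)
            if end and end.group(1) == skipping:
                skipping = None
            continue
        match = INCLUDE.match(line) or BEGIN.match(line)
        if match:
            target = match.group(1)
            out.append(f"// include-begin::{target}[]")
            out.extend(files[target].splitlines())
            out.append(f"// include-end::{target}[]")
            if line.startswith("// include-begin::"):
                skipping = target  # drop the old copy up to its end marker
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Running the function again on its own output replaces the region between the markers with the current file content, so the committed README can be kept in sync by a CI job while the included file stays the single source.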

denuno commented 6 years ago

Yeah, the closest thing to a work-around would be some type of include that works now, like images, but that's a downer for various reasons.

I can whip up a PR for this project if there's even a whiff of interest from someone in powah, but this is a case where I won't be able to use my own fork, so I'm less than motivated to write it unless there's at least a chance of it getting in... there's been nothing on most these issues, and I'm, let us just say, not optimistic, about support@github.com being able to help.

Maybe @MikeMcQuaid can point us in the right direction?

jyutzler commented 6 years ago

I fully understand that publishing to a static web host is a different matter and I will be sure to take a look at jekyll-asciidoc to see if it is a potential partial solution. It doesn't change the value of rendering AsciiDoc directly through GitHub (as is done with Markdown, GeoJSON, etc.).

MikeMcQuaid commented 6 years ago

I can whip up a PR for this project if there's even a whiff of interest from someone in powah, but this is a case where I won't be able to use my own fork, so I'm less than motivated to write it unless there's at least a chance of it getting in... there's been nothing on most these issues, and I'm, let us just say, not optimistic, about support@github.com being able to help.

Maybe @MikeMcQuaid can point us in the right direction?

@denuno This isn't something I have or do work on at GitHub. Please email support@github.com rather than @mentioning me unless it relates to my open source work. Thanks!

denuno commented 6 years ago

Apologies, I grabbed the first person who looked to have some type of power to do something for the project, maybe I should have @-mentioned a more prolific committer-- or it seems like maybe I have no idea how contribution works for github projects on github? :thinking:

It would seem odd, if it weren't par for the course (I'm the same I reckon), that github projects don't use github, and instead have some out-of-band deal going on... seriously though? Hit up a generic 'support' addy for feedback on issues? :roll_eyes:

I sent an email. I shouldn't have bitched before trying it, but regardless-- that there is a disconnect which is worse than missing include functionality! Consider this comment a PR to address it by connecting projects to people (or at least positions beyond 'support'). :stuck_out_tongue:

denuno commented 6 years ago

I'd be remiss not to do a PR now. :smiley:

Is anyone else already on it? Is there anything really to do, for that matter? This stuff just calls the external tools anyhow, not sure why any special things needed to be added in the first place?

I thought there was more to it, because of all the talk about security concerns being the main reason includes have not been implemented, but if there are security concerns with this set up, github is exposing that the infrastructure is way more vulnerable than it should be...

Like, does this really get run on some type of master server that has write access to all github repositories or some such? Could I craft an Evil Commit that would Do Something to other repositories? Is servable content + traffic not watched to verify content isn't misbehaving?

There must be a ton of stuff checking for more issues than I can think of off the top of my head... what is the real reason this hasn't been implemented in 5+ years?

If it really is a security concern, please have support@github.com contact me privately, and I'll point out the attack vectors that are exposed by stating such (which are pretty obvious, and have nothing to do with includes, so I'm hoping that idea was just someone's attempt to keep things simple -- too simple, sure -- but a worthy thing to strive for... or maybe it was some misdirection... I could get behind that too :dark_sunglasses:).

:laughing: I meant to just say I'd put my money where my mouth is, versus piling on some more, but I started really wondering mid-type.

denuno commented 6 years ago

Just for some closure on this:

I dropped support@github.com a line, and someone got back to me pretty quick!

Includes are on "The List" (for next year), and there's nothing anyone outside GitHub itself can do or contribute to speed that up. It's a coderpower issue. To be on The List even a year out is like "yay!". :stuck_out_tongue_closed_eyes:

For anyone looking for authoritative feedback:

:fire: Contact support@github.com to get information about issues in the GitHub issue system.

We can then person-power the [shadow] issue systems on github by relaying responses, as I'm doing now.

Ha! Just had the awesome idea to write a GitHub support GitHub issues integration. Once you automate a shadow system, it's technically some type of mirror, right? :thinking:

lukehefson commented 5 years ago

Hey folks 👋. I'm a PM at GitHub working with some of our Render folk and I've found this thread via https://twitter.com/matthewmccull/status/1083619220858986497

In truth, we've got some infrastructure work to pay down first before we can take a serious look at this again, but I want to let you know that it does matter to us. We're hoping to make inroads soon into that infra work – and then we can start investigating how best to address this for y'all!

cc/ @mojavelinux @jexp FYI @matthewmccullough @clarkbw @skalnik

jexp commented 5 years ago

That would be amazing, thanks so much @lukehefson

Btw. I love your "small-ux-wins" (papercuts) project, that has already improved my GH experience a lot.

mojavelinux commented 5 years ago

As the lead of the Asciidoctor project, I can report that I've been asked about this more than any other feature. It would be a game changer. Even if it's not something that will happen right away, I cannot overstate how much this communication means to us at least. Thank you!

cyotee commented 5 years ago

Thank you for putting it on the docket.

t1 commented 5 years ago

The GitLab issue was implemented for release 12.0 ;-)

And for what it's worth: limiting the nesting depth to, say, 5 may not be good enough, security-wise. If I wanted to DoS-attack it, I'd write a 10 MB file full of includes of itself... that's roughly 600 includes. With 5 levels, that would be 600^5 ≈ 77 trillion includes (if my math is correct :-)). An alternative would be to keep a list of files already included.

mojavelinux commented 5 years ago

The Asciidoctor processor already accounts for this scenario. The depth is not really a depth, but rather a stack size. In your case, it would attempt to include the file 5 times, then stop. The processor does happen to track which files have been included, so the stack is accessible if you need it. But since there are so many clever ways to increase the depth, we decided instead to use the stack size as the measurement to protect against this cleverness.
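
To make the stack-size idea concrete, here is a minimal Python sketch for the simplest case, a file that includes itself once. It is not Asciidoctor's code: `expand`, the in-memory `files` mapping, and the skip comment are all hypothetical, and the default limit of 5 is just the number used in this thread.

```python
import re

INCLUDE = re.compile(r"^include::(\S+?)\[\]$")


def expand(path, files, max_depth=5, stack=()):
    """Expand include lines in files[path], following at most max_depth
    nested includes. `stack` holds the chain of targets currently open."""
    lines = []
    for line in files[path].splitlines():
        match = INCLUDE.match(line)
        if match is None:
            lines.append(line)
            continue
        target = match.group(1)
        if len(stack) >= max_depth:
            # The include stack is full: leave a comment instead of recursing.
            lines.append(f"// include skipped, stack limit {max_depth} reached: {target}")
        else:
            lines.extend(expand(target, files, max_depth, stack + (target,)))
    return lines


# A file that includes itself once.
files = {"loop.adoc": "before\ninclude::loop.adoc[]\nafter\n"}
result = expand("loop.adoc", files)
```

In this case the chain is followed five times and then cut off, leaving a single skip comment. Because `stack` records which targets are currently open, a stricter policy (refusing any target already on the stack) would be a one-line addition to this sketch.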

protolambda commented 5 years ago

@lukehefson This feature would be amazing to have. Now that it has been implemented by GitLab, do you think GitHub can implement it? Is the original time estimate still accurate?

Referencing code in documentation, without copying, is extremely powerful to keep examples up to date and well-tested. Even a single depth include would be a great thing to have for exactly this.

This is especially meaningful for writing specifications for large projects, where embedding well-tested python code can immensely improve the understanding for readers who would otherwise have to click through a ton of links to read relevant reference code.

lukehefson commented 5 years ago

Now that it has been implemented by GitLab, do you think GitHub can implement it? Is the original time estimate still accurate?

Hey @protolambda! We're not able to make a time estimate at the moment for the same reasons I stated in https://github.com/github/markup/issues/1095#issuecomment-453515913. Although I appreciate your bump – this is still definitely something on our minds that we'd like to work towards!

protolambda commented 5 years ago

@lukehefson Kind reminder that this feature is still very desired. And if there is anything to be done that is blocking you, please share, I'm happy to help :)

brunchboy commented 5 years ago

In the meantime @protolambda I have discovered that I can generate much better Asciidoctor-based documentation sites with Antora and they can even be automatically built and hosted by Netlify for free, so I am no longer waiting for GitHub to offer this feature natively. As an example, for one of my GitHub hosted projects: https://blt-guide.deepsymmetry.org/beat-link-trigger/readme

protolambda commented 4 years ago

@lukehefson Your monthly reminder for this feature here, linking code in a stable readable way as with ascii-doc would be amazing. Any news on planning for this feature? Or alternatives to look into?

lukehefson commented 4 years ago

No timeline yet @protolambda – but feel free to continue to bump us – it helps to keep it top of mind!

maxandersen commented 4 years ago

just another +1 as I really could use that I can refer to example files with code rather than having to duplicate the content in the readme.adoc.

brunchboy commented 4 years ago

I used to be impatient for this, but now I am glad that I gave up waiting: I get much better documentation by using antora to organize my content, and build and serve it for free using netlify. Examples include Beat Link Trigger, Afterglow, and Open Beat Control. Once set up, the guides build as soon as I commit to GitHub, just as if I were hosting them on GitHub, but I have much more control over the look and feel and content, and the full power of Antora, which was well worth learning. Antora is the site generator created by the same people who are doing the current work on evolving Asciidoctor and standardizing Asciidoc.

maxandersen commented 4 years ago

@brunchboy antora is great, but none of that helps on the readme.adoc file view in the repo on github.

brunchboy commented 4 years ago

That's what I'm saying. There's no need for people to try to read asciidoc documentation served by GitHub. It's never going to come close to what you can achieve with Antora and netlify with just a one time configuration. And if your situation is complicated enough that you want include files, it's probably time to invest in learning a site generator like this. I resisted for a while and in retrospect that was silly.

McPringle commented 4 years ago

@brunchboy: Please don't be so arrogant as to speak for all GitHub users. You probably don't know the requirements of all other GitHub users to generally question the usefulness of AsciiDoctor and its include directive.

I'm glad that you found a solution to your needs. Antora certainly has its raison d'être, but it's out of the question for my use case. I reserve the right to judge that for myself, since I know my own needs and those of my projects very well. I also use static website generators, for more than a quarter-century. Believe me, not every problem can be solved with that.

When you hold a hammer in your hand, every problem often looks like a nail. Until you put the hammer aside and take a closer look.

Merry Christmas and a Happy New Year

brunchboy commented 4 years ago

I'm not going to respond in the same tone because I haven't been trying to speak for any other users, nor other things I was accused of. For example, how could I be questioning the usefulness of Asciidoctor when that is what antora is built on? I am merely reporting that for a long time I was frustrated that this issue hadn't been resolved (because I thought the only way people could read good Asciidoc-formatted manuals for my GitHub hosted projects was if GitHub would fix issues with their own Asciidoc rendering) and I am now much happier because I have found another way to achieve my goals. I still edit Asciidoc formatted files that are contained in my project, and a single push to GitHub results in a beautiful documentation site for me, including things like a version picker based on branches and tags, and I have more control over the stylesheet and surrounding content.

I am not trying to say this is the only answer for everyone, nor demean anyone's approaches to documentation. However, if this approach might work for others, I wanted to share it with them. The antora community is a subset of the asciidoctor community; both are amazing, friendly, helpful, and welcoming. I regret the time I wasted thinking antora would be too complicated or overkill or too much work for my use case. Others will have different experiences. And that is fine.

brunchboy commented 4 years ago

Also note that the same people who have been working on antora have been contributing patches to Asciidoctor to help GitHub close this issue when they can get around to it, so hopefully someday soon people will have both options, which will be better for everyone. IIRC, GitLab has already been able to enable Asciidoctor includes because of these changes, but for the full details I'd have to scroll back pretty far in the Asciidoctor Gitter chat.

protolambda commented 4 years ago

Honestly, although Asciidoctor is awesome, this is also about just having any kind of well-integrated code snippets in any kind of GitHub markup, based on references instead of ever changing line numbers. Github issues have something close, where linked snippets do render, but not suited for regular documents: it points to a git reference and line numbers. And is more about referencing old code for bug reports etc. than embedding something to document a real-world, tested, code snippet.

Something more inline (no big repeated snippet header), flexible (asciidoctor code reference style without line numbers works well), and integrated (no static site gen or CI tasks) would be an instant core feature attracting users to GitHub. It is just mind boggling how the site for sharing code so far has not prioritized sharing (not copying) code in markup.

Anyway, even some minimal code embedding feature that does the above well would be amazing. My GitHub wish for 2020. And even better if Asciidoctor includes fully work of course, but taking smaller steps to get there may help.

rmurillo21 commented 4 years ago

@brunchboy Can you provide a bit of detail on how you integrate the antora/netlify tool chain to get ascii docs built and deployed?

brunchboy commented 4 years ago

@rmurillo21 I would be happy to explain that, but as I don't want to derail this issue, the best place to discuss it would be on the Antora users Gitter chat, where more experienced users than I can give alternate options as well. That is at https://gitter.im/antora/users

In a nutshell, using the free Netlify offering I create a site for each project I want to host an Antora site for, then in the Build settings I point Netlify at my GitHub repo, and enter a Build command like:

npm i @antora/cli @antora/site-generator-default && $(npm bin)/antora --fetch doc/netlify.yml

That pulls down the Antora components, and points them at the playbook in my repository which tells Antora what and how to build. I set a Publish directory of doc/build/site in the Netlify build settings.

I think that was all that was needed to build a site hosted at a Netlify URL, which gets automatically rebuilt whenever I push to the configured branches at GitHub. But since I have my own domain names, I also configured those to point at the Netlify builds, and Netlify uses Let's Encrypt to obtain and manage SSL certificates for me so I can serve my pages over HTTPS. That is set up in the Domain Management section of the Netlify site, and their documentation is helpful. For more details, please ask on Gitter! 😄

ArneLimburg commented 4 years ago

@lukehefson Any update on this?

lukehefson commented 4 years ago

lukehefson Any update on this?

Hey @ArneLimburg! Still no updates I am afraid! I no longer work with the team(s) that might work on something like this and as far as I know it's not an immediate roadmap item.

zteater commented 4 years ago

Well, this is still open 2.5 years later. For anyone using Maven, I wrote a plugin to read the include:: syntax on a template file and generate a consolidated file. Not a fully baked asciidoctor tool, but enough to generate my readme.adoc file.

https://github.com/whelk-io/asciidoc-template-maven-plugin

chevdor commented 4 years ago

I gave up on this one and switched to Gitlab where the support for asciidoc is great and realized that Gitlab greatly improved overall. I am not looking back...

psuzzi commented 4 years ago

+1 please, fix this issue. Without support for asciidoc-include it is very hard to write proper asciidoc documentation for a project.

splix commented 4 years ago

Having support for includes is a critical feature of Asciidoc. The include directive keeps documentation and code consistent, and saves time because you don't need to copy-paste a thousand times.

As GitHub is supposed to be a platform for software developers, it would be right to have the full support of documentation format designed for software development.