cabo / kramdown-rfc

An XML2RFC (RFC799x) backend for Thomas Leitner's kramdown markdown parser
MIT License
195 stars 83 forks source link

Persistent connections for reference fetching #101

Open martinthomson opened 3 years ago

martinthomson commented 3 years ago

I just had a build fail due to a DNS error after successfully fetching resources from the very same server.

/github/home/.cache/xml2rfc/reference.RFC.7838.xml: fetching from https://www.rfc-editor.org/refs/bibxml/reference.RFC.7838.xml
/github/home/.cache/xml2rfc/reference.RFC.6265.xml: fetching from https://www.rfc-editor.org/refs/bibxml/reference.RFC.6265.xml
*** Failed to open TCP connection to www.rfc-editor.org:443 (getaddrinfo: Try again) while fetching https://www.rfc-editor.org/refs/bibxml/reference.RFC.6265.xml

This would not happen if the HTTP client were able to reuse connections. I realize that this is work, but it would make the tool considerably faster.

cabo commented 3 years ago

There are two somewhat conflicting potential directions of improvement:

Parallel connections would help with slow services such as the DOI service. Persistent connections more with the faster services such as rfc-editor.

This is all complicated by the desire to support older Ruby versions, as well as issues many Ruby installations have with their ca_certs.

So, yes, this is a bit of work (requiring some exploration in the cross product of servers, systems, and versions). Probably won't complete before IETF110 deadline...

cabo commented 3 years ago

I hacked something together as aa0ef89, with only rough testing so far. Anecdotically reduces fetch time in one of my drafts from 6.5 to 4 s, so it's not a breakthrough but worth a little experimentation. Since I don't want to break older platforms, some explicit opt-in actions are required to use this, see copy of commit comments below.

1.3.32: Experimental persistent connections

to use this:

KRAMDOWN_PERSISTENT= kdrfc -3chi foo.md

martinthomson commented 3 years ago

Thanks Carsten. I'm giving this a go right now. The DNS errors on rfc-editor.org are at the point of more or less forcing my hand.

martinthomson commented 3 years ago

I can confirm that this works. And it seems to work very well; it looks like it more than halved the build time in CI for a pair of drafts I'm working on. You can probably close this; unless you wanted to use this to track parallel fetching.

martinthomson commented 3 years ago

Did you want to keep this open to track parallel downloads? I'm happy enough with the performance bump from persistent connections (and make -j$(nproc) when I'm in a hurry, of course).

cabo commented 3 years ago

I think the remaining step is to make this the default.

cabo commented 3 years ago

A little bit of sleuthing around in different platform versions needed.

MikeBishop commented 3 years ago

I'm getting the following error, which I'm guessing in related:

Can't set up persistent HTTP -- cannot load such file -- net/http/persistent

Gem installed this version, but didn't appear to pull in any new dependencies; is there one it should have declared for this?

cabo commented 3 years ago

I don't know what "this version" is above. The message is a warning, kramdown-rfc will continue, just slower. To even get this message you must have set KRAMDOWN_PERSISTENT in your environment (without that, kramdown-rfc always uses the slower method). You may want to try gem install net-http-persistent to get rid of the warning message and use the faster method.

MikeBishop commented 3 years ago

Ah -- I see it in the commit notes. Perhaps that should be a declared dependency if it's now the default.

cabo commented 3 years ago

It is not the default -- you have to set the environment variable. It cannot be the default as long as we still support the Ruby 2.3 version installed on macOS High Sierra. (Maybe we should make the cut with Monterey... let's see.)

MikeBishop commented 3 years ago

Ah -- the variable is set by https://github.com/martinthomson/i-d-template. @martinthomson, that seems sub-optimal if it requires a dependency that isn't installed by default.

cabo commented 3 years ago

At least, I have to fix the warning message.

martinthomson commented 3 years ago

I considered making the setting of the variable conditional, but then the problem wouldn't be discoverable. And the performance gain is significant. A different warning message would be good. Something like "Not using persistent HTTP: gem install net-http-persistent to silence this message" would be nice.