cabo / kramdown-rfc

An XML2RFC (RFC799x) backend for Thomas Leitner's kramdown markdown parser
MIT License

v3 default attributes #162

Open mnot opened 2 years ago

mnot commented 2 years ago

The RPC seems to be adding the following attributes:

Could these be added in v3 output (once again, to reduce diffs)?

mnot commented 2 years ago

Ideally after the anchor in each
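
mnot's request is to have these default-valued attributes emitted directly in the v3 output, placed right after the anchor. The list of attributes is not reproduced above. The sketch below assumes the two that cabo later mentions for <section> (numbered="true" and toc="default"). It is a hypothetical post-processing step built on Python's standard ElementTree, not something kramdown-rfc or xml2rfc provides, and add_section_defaults is a made-up name.

```python
import xml.etree.ElementTree as ET

# Assumption: only these two defaults (from cabo's remark about <section>);
# the RPC's full list of added attributes is not reproduced in the thread.
SECTION_DEFAULTS = {"numbered": "true", "toc": "default"}


def add_section_defaults(root: ET.Element) -> None:
    """Hypothetical helper: add missing defaults to each <section>, right after anchor."""
    for section in root.iter("section"):
        missing = {k: v for k, v in SECTION_DEFAULTS.items() if k not in section.attrib}
        if not missing:
            continue
        reordered = {}
        for name, value in section.attrib.items():
            reordered[name] = value
            if name == "anchor":
                reordered.update(missing)  # defaults go directly after anchor
        for name, value in missing.items():
            reordered.setdefault(name, value)  # no anchor: append at the end
        # ElementTree keeps attribute insertion order when serializing (Python 3.8+).
        section.attrib.clear()
        section.attrib.update(reordered)


root = ET.fromstring('<rfc><section anchor="intro" toc="include"><name>Intro</name></section></rfc>')
add_section_defaults(root)
print(ET.tostring(root, encoding="unicode"))
# <rfc><section anchor="intro" numbered="true" toc="include"><name>Intro</name></section></rfc>
```

Explicit values such as toc="include" are left alone. Only attributes that are absent get the default.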

cabo commented 2 years ago

I'm not sure I understand the process here. kramdown-rfc produces a v2/v3 mongrel RFCXML. Either the author or the RFC editor runs this through xml2rfc --v2v3 to obtain the processable ("unprepped") RFCXML for editing. The bug that creates these default-valued attributes should be present on both sides, so I'm not sure kramdown-rfc needs to emulate it. Do you run your diff before --v2v3?

mnot commented 2 years ago

Is it necessary to run --v2v3 even when kramdown-rfc2629 is run with --v3?

cabo commented 2 years ago

With --v3 (which is now default), kramdown-rfc generates mongrel XML that is converted to proper v3 by xml2rfc. Martin Thomson's pipeline does that for you, so you may not have noticed.

mnot commented 2 years ago

Ah, I didn't see that -- yes, it is.

Nevertheless, I'm not seeing these attributes in the XML produced by kramdown-rfc2629 --v3 | xml2rfc --v2v3, whereas they are in the RPC's copy.

I've updated to latest xml2rfc, i-d-template, and kramdown-rfc2629.

cabo commented 2 years ago

Hmm, I see them. We must be doing something different. Do you have a sample I could try?

mnot commented 2 years ago

In httpwg/http-extensions, run make draft-ietf-httpbis-bcp56bis.xml and compare the result to https://www.rfc-editor.org/authors/rfc9209.xml

mnot commented 2 years ago

(happy to supply a file for the former via e-mail if you don't want to go to the trouble)

cabo commented 2 years ago

No problem. Indeed, the redundant attributes are gone. Need to examine more closely.

cabo commented 2 years ago

Seems to have happened between xml2rfc 3.11.1 and xml2rfc 3.12.1. The commit messages aren't very useful, looking at the code now.

mnot commented 2 years ago

Hmm. I tried downgrading to 3.9.1 -- the same version used by the RPC for my draft -- and it still happened.

cabo commented 2 years ago

"it happened"?

cabo commented 2 years ago

The fix might actually be in lxml or some such.

mnot commented 2 years ago

The XML produced by kramdown + v2v3 didn't have those default attributes.

cabo commented 2 years ago

The v2v3 output sure did for 3.11.1; I can't easily make a controlled experiment here.

mnot commented 2 years ago

3.11.1 does not produce those attributes for me, when called from @martinthomson's template.

cabo commented 2 years ago

The work happens in:

        text = lxml.etree.tostring(self.root.getroottree(), 
                                   encoding='unicode',
                                   doctype=doctype_string,
                                   pretty_print=True)

So this might really be an lxml fix.

mnot commented 2 years ago

I know very little about the xml2rfc codebase, but surely it can control what attributes appear on an element?

cabo commented 2 years ago

This is really weird. I have instances up to 3.11.1 that have both numbered=true and toc=default. I also have 3.11.1 instances that have only toc=default. I cannot find instances beyond 3.11.1 that have redundant attributes on <section>.

I'd guess this will have solved itself as soon as the RFC editor updates their tools.

mnot commented 2 years ago

OK. I might write a little XSLT for diff purposes in the meantime... thanks.
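
mnot plans to use XSLT for this. As a swapped-in alternative, here is a minimal Python sketch of the same diff-only normalization: parse each file, drop <section> attributes that merely restate a default, and compare the results. SECTION_DEFAULTS and strip_section_defaults are hypothetical names. The two defaults are assumed from the numbered=true / toc=default observation above, and the sketch is not the gist mnot links later.

```python
import xml.etree.ElementTree as ET

# Assumption: the defaults to normalize away; extend if other elements need it.
SECTION_DEFAULTS = {"numbered": "true", "toc": "default"}


def strip_section_defaults(xml_text: str) -> str:
    """Hypothetical helper: remove <section> attributes that just restate a default.

    ElementTree drops the DOCTYPE and comments, so the result is meant only
    for diffing two documents normalized the same way, not for publication.
    """
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        for name, default in SECTION_DEFAULTS.items():
            if section.get(name) == default:
                del section.attrib[name]
    return ET.tostring(root, encoding="unicode")


rpc = '<rfc><section anchor="a" numbered="true" toc="default"><name>A</name></section></rfc>'
ours = '<rfc><section anchor="a"><name>A</name></section></rfc>'
print(strip_section_defaults(rpc) == strip_section_defaults(ours))  # True
```

Both sides go through the same normalization, so the spurious attributes disappear from the diff. Non-default values such as numbered="false" survive.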

cabo commented 2 years ago

I think you could also ask the RFC-editor to delete the spurious redundant attributes.

mnot commented 2 years ago

I did; they declined.

mnot commented 2 years ago

For reference: https://gist.github.com/mnot/a2c4e370ed90cc75eb2cf5ad2a19f8cb

cabo commented 2 years ago

> I did; they declined.

Whoa. Is this documented somewhere (outside an AUTH48 exchange)? Did they give a reason?

mnot commented 2 years ago

Don't think so. To be fair, I asked but said not to do it if it's a lot of trouble. I could be more insistent.

cabo commented 2 years ago

Maybe they are doing this for some internal tool.

The output below was a bit shocking to me: 3.5.0 dates from 2020-11-18, and 3.1.1 from 2020-09-14. But maybe these were all converted by the submitters, with whatever version they had at hand.

-*- mode: grep; default-directory: "~/std/rfc/authors/" -*-
Grep started at Sun Feb 27 01:28:40

grep --color=auto -nH --null -e "xml2rfc v2v3 conversion 3" rfc9???.xml
rfc9003.xml:9:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9004.xml:12:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9006.xml:8:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9007.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9008.xml:5:<!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9012.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9015.xml:12:  <!-- xml2rfc v2v3 conversion 3.1.1 -->
rfc9016.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9017.xml:9:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9020.xml:5:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9022.xml:8:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9027.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9029.xml:6:  <!-- xml2rfc v2v3 conversion 3.6.0 -->
rfc9037.xml:10:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9039.xml:6:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9046.xml:6:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9048.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9050.xml:5:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9051.xml:6:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9057.xml:5:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9059.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9061.xml:5:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9065.xml:6:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9067.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9069.xml:12:  <!-- xml2rfc v2v3 conversion 3.3.0 -->
rfc9072.xml:9:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9075.xml:10:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9078.xml:5:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9080.xml:21:  <!-- xml2rfc v2v3 conversion 3.2.1 -->
rfc9081.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9090.xml:8:<!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9093.xml:6:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9094.xml:8:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9095.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9097.xml:10:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9098.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9099.xml:6:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9104.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9106.xml:7:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9107.xml:6:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9109.xml:7:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9113.xml:15:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9114.xml:15:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9115.xml:4:  <!-- xml2rfc v2v3 conversion 3.4.0 -->
rfc9118.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9119.xml:13:   <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9120.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9124.xml:16:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9132.xml:6:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9135.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9137.xml:12:<!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9138.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9139.xml:11:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9141.xml:12:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9146.xml:14:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9147.xml:16:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9150.xml:12:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9152.xml:7:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9155.xml:14:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9156.xml:11:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9159.xml:11:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9160.xml:12:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9162.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9171.xml:15:  <!-- xml2rfc v2v3 conversion 3.7.0 -->
rfc9173.xml:12:  <!-- xml2rfc v2v3 conversion 3.9.0 -->
rfc9177.xml:12:  <!-- xml2rfc v2v3 conversion 3.8.0 -->
rfc9178.xml:13:  <!-- xml2rfc v2v3 conversion 3.1.1 -->
rfc9181.xml:12:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9182.xml:12:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9187.xml:10:  <!-- xml2rfc v2v3 conversion 3.12.0 -->
rfc9189.xml:14:  <!-- xml2rfc v2v3 conversion 3.11.1 -->
rfc9192.xml:12:  <!-- xml2rfc v2v3 conversion 3.12.0 -->
rfc9193.xml:14:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9199.xml:14:  <!-- xml2rfc v2v3 conversion 3.12.0 -->
rfc9204.xml:14:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9205.xml:16:  <!-- xml2rfc v2v3 conversion 3.9.1 -->
rfc9208.xml:12:  <!-- xml2rfc v2v3 conversion 3.11.1 -->
rfc9209.xml:15:  <!-- xml2rfc v2v3 conversion 3.10.0 -->

Grep finished with 78 matches found at Sun Feb 27 01:28:41
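
The listing above can be summarized per xml2rfc version. Below is a minimal sketch that assumes the grep output is available as text. tally_versions and version_key are hypothetical helpers, and the sample repeats only a few of the lines above.

```python
import re
from collections import Counter

# Matches the marker comment left by xml2rfc --v2v3, e.g.
# "<!-- xml2rfc v2v3 conversion 3.5.0 -->". Requiring "-->" skips the grep
# command line itself, which also contains "xml2rfc v2v3 conversion 3".
MARKER = re.compile(r"xml2rfc v2v3 conversion (\d+(?:\.\d+)*)\s*-->")


def tally_versions(grep_output: str) -> Counter:
    """Count grep hits per xml2rfc v2v3 conversion version."""
    counts = Counter()
    for line in grep_output.splitlines():
        match = MARKER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def version_key(version: str) -> tuple:
    """Sort key that compares versions numerically, component by component."""
    return tuple(int(part) for part in version.split("."))


sample = """\
grep --color=auto -nH --null -e "xml2rfc v2v3 conversion 3" rfc9???.xml
rfc9003.xml:9:  <!-- xml2rfc v2v3 conversion 3.5.0 -->
rfc9015.xml:12:  <!-- xml2rfc v2v3 conversion 3.1.1 -->
rfc9141.xml:12:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
rfc9209.xml:15:  <!-- xml2rfc v2v3 conversion 3.10.0 -->
"""
counts = tally_versions(sample)
for version in sorted(counts, key=version_key):
    print(version, counts[version])
```

The numeric sort key matters here. Plain string sorting would put 3.10.0 before 3.9.1.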

mnot commented 2 years ago

See also https://github.com/ietf-tools/xml2rfc/issues/632

cabo commented 2 years ago

See also https://notes.ietf.org/tools-team-20220308#XML2RFC---Kesara
I don't expect quick progress: there are some 135 tickets, and three are scheduled now. So let's see what we can do on the authoring side.

kesara commented 2 years ago

@mnot You should be able to emulate the previous behaviour, and so reduce diffs, by using the kramdown-rfc2629 -2 option. This adds a SYSTEM "rfc2629.dtd" doctype reference, and xml2rfc adds those default values when that doctype reference is present.

Note that this will not work with the kramdown-rfc command because of #164.
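
The mechanism kesara describes is ordinary DTD attribute defaulting. When a parser reads the DTD, it fills in every attribute declared with a default value, even if the document never wrote that attribute. The sketch below illustrates only that general XML behaviour and is not xml2rfc's code, which uses lxml and the real rfc2629.dtd. It declares a hypothetical two-attribute excerpt in an internal DTD subset and parses it with Python's stdlib expat. Expat's specified_attributes switch separates attributes written in the document from those supplied by the DTD.

```python
from xml.parsers import expat

# Hypothetical excerpt in the style of rfc2629.dtd, declared in an internal
# subset so the stdlib parser can apply it without fetching an external file.
DOC = """<?xml version="1.0"?>
<!DOCTYPE rfc [
  <!ATTLIST section
    numbered (true|false) "true"
    toc (include|exclude|default) "default">
]>
<rfc><section anchor="intro"/></rfc>
"""


def section_attributes(xml_text: str, specified_only: bool) -> list:
    """Return the attributes expat reports for each <section> element."""
    found = []
    parser = expat.ParserCreate()
    # True: report only attributes written in the document.
    # False (expat's default): also report values filled in from the DTD.
    parser.specified_attributes = specified_only

    def on_start(name, attrs):
        if name == "section":
            found.append(dict(attrs))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found


print(section_attributes(DOC, specified_only=False))
print(section_attributes(DOC, specified_only=True))
```

With specified_only=False the parser reports numbered and toc as well as anchor. With True it reports only anchor, which is roughly what happens when no DTD is consulted at all.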

cabo commented 2 years ago

> @mnot You should be able to emulate previous behaviour to reduce diffs by using kramdown-rfc2629 -2 option.

Thanks for finding the reason...

Well, the workaround of going to -2 also turns off v3 processing. But a simple sed script could add back the SYSTEM "rfc2629.dtd" identifier before the output is fed into --v2v3.
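
The sed idea amounts to a single text substitution on the kramdown-rfc output before it reaches xml2rfc --v2v3. Here is the same idea as a Python sketch. It assumes the v3 output begins with an rfc DOCTYPE that has no SYSTEM or PUBLIC identifier, possibly followed by an internal subset. add_rfc2629_system_id is a hypothetical name. Whether the result fully reproduces the RPC's attributes depends on the xml2rfc version in use.

```python
import re

# Assumption: the v3 output has "<!DOCTYPE rfc" with no SYSTEM/PUBLIC
# identifier yet (optionally followed by an internal subset "[ ... ]").
DOCTYPE_WITHOUT_ID = re.compile(r"<!DOCTYPE\s+rfc\b(?!\s+(?:SYSTEM|PUBLIC)\b)")


def add_rfc2629_system_id(xml_text: str) -> str:
    """Insert SYSTEM "rfc2629.dtd" into the rfc DOCTYPE if no external ID is present."""
    return DOCTYPE_WITHOUT_ID.sub('<!DOCTYPE rfc SYSTEM "rfc2629.dtd"', xml_text, count=1)


print(add_rfc2629_system_id("<!DOCTYPE rfc>"))  # <!DOCTYPE rfc SYSTEM "rfc2629.dtd">
```

The negative lookahead makes the substitution idempotent, so running it twice, or on output that already came from -2, leaves the DOCTYPE unchanged.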