Knagis / CommonMark.NET

Implementation of CommonMark specification in C# for converting Markdown documents to HTML. Optimized for maximum performance and portability.
BSD 3-Clause "New" or "Revised" License
1k stars 146 forks source link

Terminology changes #70

Open dmitry-shechtman opened 8 years ago

dmitry-shechtman commented 8 years ago

So the Professor is back on CommonMark, which is great news.

Somewhat less great is his renaming of core terms in both the spec and the C implementation (I understand those modifications are by popular demand -- I guess that's the price one has to pay when seeking community approval for their specification).

The following have changed so far:

The following are expected to change:

Should CommonMark.NET reflect those changes? I don't mind renaming everything, but wouldn't changing e.g. HeaderLevel to HeadingLevel break compatibility?

P.S. I noticed cmark has a separate struct for header/ing data. Maybe this is an opportunity to do the same here (while retaining an obsolete HeaderLevel property for the time being).

Knagis commented 8 years ago

I think that a second property that uses the same backing field plus obsoleting the previous property.

Heading sounds fine but the thematic break... I agree with your statement there :)

As for the struct for header - on cmark it contains heading level and heading source type (atx or setext) which is currently in the BlockTag enumeration for us. I don't know if that is worth it (a struct should not impact GC - at least I think so) - in that it seemingly does not add any value.

dmitry-shechtman commented 8 years ago

So BlockTag will have something like

[Obsolete]
AtxHeader,

AtxHeading = AtxHeader,

etc. Not very pretty IMHO, but will do.

Re: struct I agree that it adds no value, but that would be consistent with ListData and FencedCodeData (the extensions branch has quite a few of those, including DocumentData, where I moved ReferenceMap). This could actually reduce the memory footprint if stuff like Heading.Level is declared as byte.

dmitry-shechtman commented 8 years ago

I couldn't resist jumping on the opportunity with two renaming suggestions of my own:

  1. Weak emphasis
  2. Unordered list

I updated the top post with these and turned it into a checklist.

Knagis commented 8 years ago

In case of the heading the struct would only have a single field - which is why it does not have its own struct yet.

As for the weak emphasis and unordered list - both sounds fine with me but lets hear what people on the forum says.

dmitry-shechtman commented 8 years ago

I guess the struct is a matter of personal taste.

I renamed both headers and rulers in the extensions branch (leaving HeaderLevel untouched for now). I figured changes like these should be made there rather than in master.

BTW having read through the topic, thematic break started to make more sense to me. I don't care that much for the specific nomenclature, but it is reasonable to call it something other than horizontal rule. As for an emphasis alternative, get ready for emphatic stress as a viable option :scream:

dmitry-shechtman commented 8 years ago

Unlike with weak emphasis, which might be changed as proposed, changed to something unthinkable, or remain unchanged, I feel that unordered lists is the only correct term for what the spec calls bullet lists today.

With that in mind, I went ahead and did a backwards-compatible rename. The state of lists in the extensions branch is now as follows:

  1. ListType and ListData.ListType are obsolete (ListType.Bullet stands unchanged).
  2. BlockTag.List is obsoleted by BlockTag.UnorderedList and BlockTag.OrderedList.
  3. UnorderedListData and Block.UnorderedListData are added.
  4. ListData.BulletChar is obsoleted by UnorderedListData.BulletCharacter.
  5. OrderedListData and Block.OrderedListData are added.
  6. ListData.Start is obsoleted by OrderedListData.Start.
  7. ListData.Delimiter and ListDelimiter are obsoleted by OrderedListData.DelimiterCharacter.
  8. UnorderedListData and OrderedListData define a few additional properties in support of FancyLists.
  9. A LegacyLists extension provides a full backwards compatibility mode, including Unicode bullets (which are soon to become a FancyLists feature).

Update: API Changes contains an updated version.

dmitry-shechtman commented 8 years ago

I started documenting the API changes here.

As you may see, I ultimately moved HeaderLevel into Heading.Level. I also moved DelimiterCharacter into Emphasis.DelimiterCharacter (but didn't document it since that's not in 0.10.0).

I believe that similarly turning Linkable into a struct and removing the TargetUrl and LiteralContent backing fields wouldn't impact performance.

Knagis commented 8 years ago

Have you thought about any mechanisms how extensions like FancyLists would be supported from external assemblies that cannot modify Block and Inline classes?

dmitry-shechtman commented 8 years ago

Extensively :smile:

Lists are actually extensibility taken to the extreme. Adding a new list style is as easy as:

    public class FullWidthLowerAlphaLists : CommonMarkExtension
    {
        protected override IEnumerable<IBlockDelimiterHandler> InitializeBlockDelimiterHandlers(CommonMarkSettings settings)
        {
            var parameters = new OrderedListItemParameters('a', 'z', listStyle: "fullwidth-lower-alpha");
            var delimiter = new ListItemDelimiterParameters('.');
            yield return new AlphaListItemHandler(settings, parameters, delimiter);
        }
    }

Numerous extensibility points are provided (a couple more added just today), so developing a custom extension should be a no-brainer.

There is still the problem of autodiscovery, i.e. how to make external extensions available in the command line. I'm considering MEF, but adding an external dependency might be an issue (especially with .NET < 4.0). It looks like that warrants separate packages (e.g. CommonMark.NET.Extensibility and CommonMark.NET.Console).

P.S. MEF2 targets Profile 259.

dmitry-shechtman commented 8 years ago

CommonMark 0.23 has been released.

May I suggest that the same changes be applied before 0.11 (if you're planning on releasing one), i.e.

  1. HeaderLevel obsoleted by HeadingData struct with Level property
  2. DelimiterCharacter replaced by EmphasisData struct with DelimiterCharacter property
  3. Linkable???
Knagis commented 8 years ago

Yes, I don't see a reason for not doing this. If you merge the changes needed for 0.11 in master I could push the release to nuget.

dmitry-shechtman commented 8 years ago

I'm currently working on disallowing space between link text and link label.

Test Name:  Example541
Test Outcome:   Failed
Result Message: Assert.AreEqual failed. Expected:<<p><img src="/url" alt="foo" title="title" />
[]</p>>. Actual:<<p><img src="/url" alt="foo" title="title[]" />
[]</p>>.
Result StandardOutput:  
Example 541
Section: 541

CommonMark:

    ![foo]␣
    []

    [foo]:␣/url␣"title"

Expected:

    <p><img␣src="/url"␣alt="foo"␣title="title"␣/>
    []</p>

Actual:

    <p><img␣src="/url"␣alt="foo"␣title="title[]"␣/>
    []</p>

Any ideas?

Knagis commented 8 years ago

https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/InlineMethods.cs#L935

I suspect that it will be enough to remove the check for space and newline.

dmitry-shechtman commented 8 years ago

Maybe my branch diverged too much from knagis/master, but that doesn't seem to solve the issue with this specific example.

Knagis commented 8 years ago

I will take a look at the new tests in a few hours.

dmitry-shechtman commented 8 years ago

No need, solved in https://github.com/AMDL/CommonMark.NET/commit/da3c7bbdef9b58b863bbeb24799af928fd073ec5.

dmitry-shechtman commented 8 years ago

71 ready for review.