SO-Close-Vote-Reviewers / UserScripts

Various user scripts that add features to the review queue or to the chat room

Detect bare links and substitute some of them based on common patterns #134

Open GaurangTandon opened 6 years ago

GaurangTandon commented 6 years ago

Bare links are links that haven't been formatted with Markdown. For example, https://developer.mozilla.org/en-US/docs/Web/API/Element is a bare link; [Element - MDN](https://developer.mozilla.org/en-US/docs/Web/API/Element) is not.

I think this userscript can (should?) attempt to replace common bare links with their markdowned equivalents. For example:

and so on for several other encyclopedia-type references. These are the most common, but I think we can expand the list pretty quickly.

I noticed that you already have a function (App.pipeMods.inlineImages) that attempts to markdown-ify bare image links. So I think this should be within the scope of Magic Editor too.

Thoughts? I'd be happy to do a PR as necessary.

makyen commented 6 years ago

Are you proposing only doing this for the relatively few sites for which we can parse the format of the URL (which may change in the future), or are you proposing fetching the page and parsing the content (e.g. to get the window title)?

Note that parsing can be complex, and it's not clear how well we will be able to do so for a large percentage of sites, but this could be useful.

I'm not sure where we want to draw the line as to what Magic Editor will do. Currently, it attempts to automate things that editors routinely do as common edits. I'm not sure where this falls in that spectrum. I added the inlining of images because that's something that most editors do all the time (i.e. already a very common edit).

Just interested: Are these types of links ones which you see mostly on SO, or on other sites?

If you do do this, it would be beneficial to have each parse driven by an Object describing it, similar to how edit rules are defined. Off the top of my head, you'd need properties like domainMatch (a RegExp matched against the domain to indicate the rule applies) and parseFunction (a function that takes the URL and returns a title for the link).
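A minimal sketch of what that rule-object shape could look like. The property names (domainMatch, parseFunction) come from the comment above; the MDN rule itself and the titleForUrl helper are hypothetical illustrations, not code from this project:

```javascript
// Each rule: a domain matcher plus a function that derives a title from the URL.
const linkRules = [
  {
    // Applied only when the URL's hostname matches this RegExp (assumed pattern).
    domainMatch: /(^|\.)developer\.mozilla\.org$/,
    // Takes the full URL and returns a display title for the link.
    parseFunction: function (url) {
      const slug = url.split("/").pop();
      return decodeURIComponent(slug) + " - MDN";
    }
  }
];

// Hypothetical driver: find the first rule whose domainMatch fits, or null.
function titleForUrl(url) {
  const hostname = new URL(url).hostname;
  const rule = linkRules.find((r) => r.domainMatch.test(hostname));
  return rule ? rule.parseFunction(url) : null;
}
```

Adding support for a new site would then be a matter of appending one object to linkRules, without touching the driver.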

GaurangTandon commented 6 years ago

> Are you proposing only doing this for the relatively few sites for which we can parse the format of the URL (which may change in the future), or are you proposing fetching the page and parsing the content (e.g. to get the window title)?

I was proposing hardcoding this replacement for a few of the most common sites. For example, we could hardcode rules for Wikipedia, MDN, w3schools, etc. (and ChemSpider/PubChem/etc. for Chemistry.SE; users of each SE site could expand the list themselves), like so:

```javascript
text.replace(/regex/, "$1 article on Wikipedia")
// regex would detect a bare Wikipedia link,
// with one capturing group set on the article name
```
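To make the idea above concrete, here is one possible hardcoded rule for Wikipedia. The regex and the bracketed-link output format are assumptions for illustration; the actual pattern and replacement text would be decided in the PR:

```javascript
// Hypothetical example: wrap bare English-Wikipedia links as [Title](url).
function replaceBareWikipediaLinks(text) {
  return text.replace(
    // Capture the article slug after /wiki/; stop at whitespace, ")" or "]".
    /https?:\/\/en\.wikipedia\.org\/wiki\/([^\s)\]]+)/g,
    (url, slug) => {
      // Turn "Sulfuric_acid" into "Sulfuric acid" for the link text.
      const title = decodeURIComponent(slug).replace(/_/g, " ");
      return "[" + title + "](" + url + ")";
    }
  );
}
```

Each supported site would get its own pattern along these lines.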

> Are these types of links ones which you see mostly on SO, or on other sites?

I am mostly active on Chem.SE, so I speak from that perspective; such posts are not too hard to find there. I find myself routinely converting such links to their markdowned alternatives, and there is little variety in the links people use. So hardcoding them would not only be easier than fetching each page and parsing its title, it would also cover the majority of the links people actually post.

> Off the top of my head, you'd need properties like domainMatch (a RegExp to match the domain to indicate the rule is to be applied), and parseFunction, which is a function that takes the URL and returns a title for the link.

Yep, I think that's doable.

Thoughts?

makyen commented 6 years ago

Overall, it looks like this could be useful. Feel free to submit a PR.

To make my life easier in resolving any merge conflicts, I'd appreciate it if you would start from the "ME-Mak-next-version" branch (i.e. make another branch off of it) and submit your PR into the ME-Mak-next-version branch (i.e. with that branch as the target of the PR). Thanks.

Thanks for the interest in the project and doing the work.

GaurangTandon commented 6 years ago

Alright. As a first step, I have made a JS repl demonstrating exactly the core functionality I was talking about. Please have a look at the testSuite, around line 150, to see the input URLs with their corresponding expected output article names. You can also use console.log(getArticleName(YOUR_URL_HERE)); to test custom Wikipedia/MDN URLs.

I was hoping you could first have a look at it and point out any good coding practices I have missed (I did follow the suggestions from your previous response).

The only remaining work is to a) detect bare links in a given Markdown text, and b) replace each such bare_link with [name](bare_link).
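Steps (a) and (b) could be sketched roughly as follows. The getArticleName stub below stands in for the repl's real implementation, and the lookbehind-based detection of "not already markdowned" is an assumption about how bare links would be distinguished:

```javascript
// Stub standing in for the repl's getArticleName (assumed behavior).
function getArticleName(url) {
  return decodeURIComponent(url.split("/").pop()).replace(/_/g, " ");
}

// a) find URLs not already inside [name](url); b) wrap them.
function markdownifyBareLinks(text) {
  return text.replace(
    // Negative lookbehind for "](" skips URLs already in markdown links.
    /(?<!\]\()https?:\/\/\S+/g,
    (url) => "[" + getArticleName(url) + "](" + url + ")"
  );
}
```

A real implementation would also need to skip URLs inside code blocks, but this shows the basic shape of the remaining work.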