WardCunningham / Smallest-Federated-Wiki

This wiki innovates in three ways: 1. federated sharing, 2. drag refactoring, and 3. data visualization.
http://wardcunningham.github.com/
GNU General Public License v2.0

Wiki Links with friendly labels #232

Open harlantwood opened 12 years ago

harlantwood commented 12 years ago

We currently support a subset of MediaWiki-style links. I am implementing support for MediaWiki links with a "friendly label", e.g.:

[[My Page|Click Here]]

would become:

<a class="internal" href="/my-page.html" data-page-name="my-page">Click Here</a>

I am working on this in the ruby server, and want to get a sense of whether there is support for this direction in general, i.e. in the client and the node server.
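
A minimal sketch of what such an expansion might look like in Ruby (the regex, method name, and slug derivation here are illustrative assumptions, not the actual server code):

# Minimal sketch, assuming a simple regex-based expansion; the slug derivation
# and method name are illustrative, not the project's actual implementation.
def expand_wiki_links(text)
  text.gsub(/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/) do
    page  = Regexp.last_match(1).strip
    label = Regexp.last_match(2) ? Regexp.last_match(2).strip : page
    slug  = page.downcase.gsub(/[^a-z0-9]+/, '-')
    %(<a class="internal" href="/#{slug}.html" data-page-name="#{slug}">#{label}</a>)
  end
end

expand_wiki_links "[[My Page|Click Here]]"
# => <a class="internal" href="/my-page.html" data-page-name="my-page">Click Here</a>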

harlantwood commented 12 years ago

Done on the ruby server side: 141aa1d

WardCunningham commented 12 years ago

I'm not a fan of calling a wiki page anything but its proper name.

harlantwood commented 12 years ago

Interesting. I'll tell you my use case -- maybe there is a solution that would suit both our desires.

As I demoed on the call last week, I am developing an external app which scrapes an arbitrary (CC-licensed) website and converts it to an SFW instance. I am currently working on converting HTML links to wiki links. Consider two links to the same page:

Please check out our <a href="API">API</a>

See also our lovely <a href="API">Geek Stuff</a>

The solution I had in mind is to convert these HTML links to wiki links thus:

[[API]]

[[API|Geek Stuff]]
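
A rough sketch of that conversion step, assuming Nokogiri for parsing (the method name is made up, and a real converter would map the href to a proper page name rather than comparing strings directly):

# Hedged sketch: turn anchors into wiki links, using the friendly-label form
# only when the link text differs from the target page name.
require 'nokogiri'

def anchors_to_wiki_links(html)
  doc = Nokogiri::HTML.fragment(html)
  doc.css('a[href]').each do |a|
    page = a['href']
    text = a.text.strip
    a.replace(text == page ? "[[#{page}]]" : "[[#{page}|#{text}]]")
  end
  doc.to_s
end

anchors_to_wiki_links %(Please check out our <a href="API">API</a>)
# => Please check out our [[API]]
anchors_to_wiki_links %(See also our lovely <a href="API">Geek Stuff</a>)
# => See also our lovely [[API|Geek Stuff]]
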
WardCunningham commented 12 years ago

You're going to find a lot of things in html that won't have convenient equivalents in our standard paragraphs. I wonder if some thinking might get you into algorithms that actually improve on the pages you scrape. For example, you might discover the general shape of a complex site, find the leveraged paragraphs, and then use these to make what would surely become a familiar visualization that gets readers to the good parts quickly.

Google does this when they offer a few quick links into a website. I wonder how they do it? I know you can offer hints but they do ok without them.

Alternative link text has been discussed in Issue #140 wherein I cite my short essay on the subject.

harlantwood commented 12 years ago

Thanks for referring me to the previous discussion. Next time I'll search the history before starting a duplicate issue ; )

To get right down to the meat of it:

You're going to find a lot of things in html that won't have convenient equivalents in our standard paragraphs.

True! What I'm hoping to achieve is a good representation of the content -- not necessarily the parts of the HTML that are about presentation. I am currently reducing HTML to literally plain text -- I do want to restore links, and inline images, and probably a few other things.

I want to keep links with the same "text" and the same "target" as the original content, and I can't see how to do this without a feature like the one described above. Well, there is one option: I could just have HTML links like:

Check out <a href="/view/about-us">our team</a>

instead of:

[[about us|our team]]

They wouldn't be wikilinks, so they wouldn't open up a new panel to the right; they would replace the whole page with the new page, no matter how many panels were open. But on the up side, the links would work with all SFW servers, not just my fork ; )

It's possible that SFW is not the right target technology for what I'm trying to do. But the drag and drop remixing is just so compelling... Imagine the SFW remixing capabilities, combined with a huge body of CC licensed content, in a topic area of genuine interest to you.

Obviously you have a very strong position on this one. I will keep experimenting and communicating, and trust that we can find solutions that meet all our objectives.

hallahan commented 12 years ago

This does seem potentially useful, but you will notice that this feature is rarely used on Wikipedia. I have played with your converter, and I do think you should keep working on it. It will be incredibly useful for getting content off of all those nasty sites with ads everywhere and just a nugget of content.

I would like to suggest two things:

  1. HTML will never ever go away, so don't fight it. It is bloated, but so is the English language. If you don't feel like storing link data in HTML makes sense, have you considered JSON?
  2. If it can be done client-side, it should be done client-side. Sure, if your converter is doing the work, Ruby makes sense, but it does not make sense if the user is making a decision about the link.
WardCunningham commented 12 years ago

Let's think for a bit about what will happen once Harlan is done. What happens then? I'm thinking that he might be manufacturing the raw material for some amazing mashups that we can't even imagine yet. What are the best decisions we can make today to support those people working in that future?

(I was thinking of the html tag solution but, yes, that defeats the side-scrolling which does let you look at a lot of content without getting too confused.)

WardCunningham commented 12 years ago

Hypothesis: People today use links in a careless way and cover up their bad decisions with alternate link text.

If this is so, is there something that Harlan can do today to free people in the future from these bad decisions? Remember, his converter program can look at multiple html pages and infer higher purpose. (Admittedly that inferring will have to be of a mechanical nature that can be programmed today and run en masse.)

Also, Harlan can emit multiple paragraphs when he finds one that is hard to convert. One paragraph would be meant to be reused. The second paragraph would be just for navigation and would look fit for that purpose only.

harlantwood commented 12 years ago

Hm, I'm a bit suspicious of trying to improve other people's content in an automated way. Authors who have carefully crafted link text that differs from the linked-to page name will surely disagree that any changes to the link text are indeed improvements. My thinking has been to assume that the original content is sound, and try to reproduce it as faithfully as possible, while converting it to an easily remixable (and ultimately fork/diff/merge-able) format like SFW.

harlantwood commented 12 years ago

you will notice that this feature is rarely used in Wikipedia.

Interesting point. I am starting to come around to the idea that using the page name as the link text is a "best practice", although I still don't want to force content into that form, out of respect for the original authors.

I have played with your converter, and I do think you should keep working on it.

Great! Always a pleasure to have someone use your software ; )

Last night I was playing with generating links on the server side to replicate the client-side links -- in order to get the side scrolling working. I made some progress (see lines 77-80 of the forker), but still no side scrolling.

If it can be done client-side, it should be done client-side.

I think that's the key. One possible strategy -- on the server side, when scraping:

<a href="/content/recipes.html">Chapter 24, Recipes</a>

convert the tag to:

<a slug="content-recipes">Chapter 24, Recipes</a>    # 'content-recipes' is the path converted to a slug

or

<a slug="recipes">Chapter 24, Recipes</a>        # 'recipes' is the page title, converted to a slug -- this is harder but better

then on the client side, when we see <a> tags with a slug but no href, we convert them to internal links in the usual pattern:

<a class="internal" href="/recipes.html" data-page-name="recipes" title="origin">Chapter 24, Recipes</a>
harlantwood commented 12 years ago

The more I think about the server/client solution above the less I like it. It feels ugly to pass around munged broken <a> tags.

Attempt # 2 -- on the server side, when scraping:

<a href="/content/recipes.html">Chapter 24, Recipes</a>

simply add a class to the tag, to mark it for later client-side action:

<a href="/content/recipes.html" class="fedwiki-internal">Chapter 24, Recipes</a>

then on the client side, convert to internal links in the usual pattern:

<a class="internal" href="/recipes.html" data-page-name="recipes" title="origin">Chapter 24, Recipes</a>
WardCunningham commented 12 years ago

It's interesting that the current client is so easily tricked into doing your bidding by authoring "knowing" html. This is probably more a flaw than a feature.

I agree that the client alone should manage most of the look-and-feel responsibilities, with the server side left to storing and delivering useful information, plainly encoded.

My resistance to enhancing the expressive ability of markup comes from a desire for a radically simplified model of sharing based on the smallest number of concepts, hopefully fresh new concepts at that.

harlantwood commented 12 years ago

When I've been crawling Wikipedia pages, I end up with lots of "internal wiki links" that don't work -- because I have not crawled all the pages they reference. So I am now thinking I'll keep the original link intact, and also pass a hint to the client about the probable slug name, in case such an internal page exists.

So, attempt # 3 -- on the server side, when scraping:

<a href="http://my-cookbook.com/content/recipes.html">Chapter 24, Recipes</a>

simply add a hint to the tag, to mark it for possible client-side rewriting:

<a href="http://my-cookbook.com/content/recipes.html" fedwiki-slug-hint="recipes">Chapter 24, Recipes</a>

then on the client side, if we know of such a page, convert to internal links in the usual pattern:

<a class="internal" data-page-name="recipes" title="origin">Chapter 24, Recipes</a>

and otherwise leave the link unchanged, so it still points to the original resource.
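
A sketch of the server half of this attempt (Nokogiri and the slug derivation from the last path segment are assumptions); the client half would look up fedwiki-slug-hint and rewrite the tag only when a local page with that slug actually exists:

# Illustrative sketch of attempt 3: keep the original href intact and add a
# slug hint for the client, which falls back to the href if no local page matches.
require 'nokogiri'

def hint_slug(href)
  File.basename(href, '.html').downcase.gsub(/[^a-z0-9]+/, '-')
end

def add_slug_hints(html)
  doc = Nokogiri::HTML.fragment(html)
  doc.css('a[href]').each do |a|
    a['fedwiki-slug-hint'] = hint_slug(a['href'])
  end
  doc.to_s
end

add_slug_hints %(<a href="http://my-cookbook.com/content/recipes.html">Chapter 24, Recipes</a>)
# => <a href="http://my-cookbook.com/content/recipes.html" fedwiki-slug-hint="recipes">Chapter 24, Recipes</a>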

WardCunningham commented 12 years ago

Interesting approach. The client could go for the federated wiki page and failing that go to the source.

Perhaps your server could convert the page on demand or schedule conversions based on links that it has already served.

harlantwood commented 12 years ago

Nice, I like the idea of offline crawling of the pages we link to but have not yet crawled.