[For Discussion] Refactoring pages

JoeBergin commented 12 years ago

In writing an essay in SFW just now, I found myself wanting some things. I recognize that these may be hard or even undesirable given federation, but nevertheless:

(a) Ability to change the title of a page easily. Authors mis-spell things and clean-up is hard if the error isn't caught immediately. This could cause problems with federation, I think.

(b) Ability to merge pages into one. Yes, it is possible with possibly a lot of dragging. If it were possible to select a set of paragraphs for dragging, rather than just one, it would probably be enough -- maybe even better.

(c) Ability to easily split pages into two. Same issue. Currently requires a lot of dragging. But the select-multiple-paragraphs eases this also.

(d) Ability to easily split a paragraph into two adjacent paragraphs. Merging adjacent paragraphs is easy enough. But now, to split you need to drag the new paragraph into place as well as adding the text.

WardCunningham commented 12 years ago

On Apr 4, 2012, at 3:38 PM, Joseph Bergin wrote:

In writing an essay in SFW just now, I found myself wanting some things. I recognize that these may be hard or even undesirable given federation, but nevertheless:

(a) Ability to change the title of a page easily. Authors mis-spell things and clean-up is hard if the error isn't caught immediately. This could cause problems with federation, I think.

doable but would have to leave forwarding "redirects" behind.

(b) Ability to merge pages into one. Yes, it is possible with possibly a lot of dragging. If it were possible to select a set of paragraphs for dragging, rather than just one, it would probably be enough -- maybe even better.

possible but has usability considerations. do you have a suggested interface?

(c) Ability to easily split pages into two. Same issue. Currently requires a lot of dragging. But the select-multiple-paragraphs eases this also.

same as (b)

(d) Ability to easily split a paragraph into two adjacent paragraphs. Merging adjacent paragraphs is easy enough. But now, to split you need to drag the new paragraph into place as well as adding the text.

thought has been that typing a blank line (\n\n) will split a paragraph. blank line at end of paragraph just starts new paragraph. might be obnoxious without undo.

JoeBergin commented 12 years ago

For (a) Ability to change the title of a page: Suppose it were only possible to do this prior to the first fork. Linda Rising once told me about the importance of naming her patterns correctly. Once the names were "adopted" by the community, she lost the ability to come up with better names. But before adoption... Her solution is to keep alias names for as long as possible while developing the patterns - through the shepherding process at least. I don't foresee such a solution here, though. But treating forking as formal "adoption" might work.

For the other three "wants": If it were easy to split a paragraph, the others would be much easier, since you can use ordinary text selection to grab a bunch of paragraphs from one page and enter them into a single paragraph on another.

To do a split, suppose that the text factory noticed a specific non-standard markup, say <split/>, and replaced that with two paragraphs generated from the current one at the point of the "tag". Yes, I realize I may be naive here, but, hey - take a risk. At least this would be more "intention revealing" than double newlines.
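
A minimal Ruby sketch of the split idea, assuming a paragraph is stored as a hash with `type`, `id` and `text` keys and that ids are 16 hex characters. The helper name `split_paragraph` is hypothetical; passing `"\n\n"` as the marker gives the blank-line variant instead of the `<split/>` tag.

```ruby
require 'securerandom'

# Hypothetical helper: split one paragraph item into several at each marker.
# The item layout (type/id/text) is an assumption about the page JSON.
def split_paragraph(item, marker = '<split/>')
  parts = item['text'].split(marker).map(&:strip).reject(&:empty?)
  return [item] if parts.size < 2

  parts.each_with_index.map do |text, i|
    # The first piece keeps the original id; later pieces get fresh ones.
    id = i.zero? ? item['id'] : SecureRandom.hex(8)
    item.merge('id' => id, 'text' => text)
  end
end

item = { 'type' => 'paragraph', 'id' => 'a1b2c3d4e5f60708',
         'text' => 'First thought.<split/>Second thought.' }
pieces = split_paragraph(item)
```

Keeping the original id on the first piece is only one possible choice; the thread does not settle who should own the id after a split.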

interstar commented 12 years ago

Didn't the rise of the web teach us to be more relaxed about broken links? (Compared to, say, the hypertext researchers of the 80s?) Yes, writers shouldn't break links. And redirection (manual or automatic) is nice-to-have. But the underlying technology shouldn't stop people doing what they want just because of this worry.

I'd bet 99% of the "I want to change the name of this page" scenarios are because someone discovers a spelling error or feels that a title is too clunky within the first few hours of creating the page, well before anyone else is likely to have linked to or forked it.

JoeBergin commented 12 years ago

Yes, that was exactly my main use-case.

JoeBergin commented 12 years ago

But after a "few hours" or a couple of days, links within the site may have proliferated. So changing a page name may need to look around a bit and update "stuff".

interstar commented 12 years ago

Rather than splitting paragraphs, what would be really useful to me would be a "clone paragraph" function which copied everything in a paragraph but gave it a new id. That would also help with splitting because you could then edit down the two copies to be the split parts of the original.
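
A rough Ruby sketch of such a "clone paragraph" function, assuming story items are hashes with an `id` key and that a random 16-character hex string is an acceptable new id. The helper names `clone_item` and `clone_in_story` are made up for illustration.

```ruby
require 'securerandom'

# Hypothetical helper: deep-copy a story item and give the copy a fresh id.
def clone_item(item)
  copy = Marshal.load(Marshal.dump(item))
  copy['id'] = SecureRandom.hex(8)
  copy
end

# Return a new story with the clone placed directly after the original.
def clone_in_story(story, id)
  index = story.index { |item| item['id'] == id }
  return story if index.nil?

  story.dup.insert(index + 1, clone_item(story[index]))
end

story = [
  { 'type' => 'paragraph', 'id' => '0000000000000001', 'text' => 'Buy milk' },
  { 'type' => 'paragraph', 'id' => '0000000000000002', 'text' => 'Feed cat' }
]
cloned = clone_in_story(story, '0000000000000001')
```

The two copies can then be edited down independently, which is how cloning doubles as a split.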

One of the things I'm thinking of using SFW for is for checklists where you might want to start by cloning a master copy of the list into a specific page, and then update the list there to reflect the status of which items have been done.

JoeBergin commented 12 years ago

Yes, I think that would work as well for me as my earlier suggestion. A bit more general, and the same amount of work (more or less) for my use-case.

JoeBergin commented 12 years ago

I just built (and am testing) a "page renaming" ruby script. It only works on the local server (not over the web). You give it the title of an existing page and a new name. It does this: If the page to be changed is a fork, it does nothing. Otherwise, it changes the title of that page (and its filename) and changes all references in local pages to the original to reference the new page. However, it does this only in the story, not the journal. It writes changed pages to a new directory, preserving the originals.

Does this sound like reasonable behavior, or stupid? Does it sound like desirable behavior? I'll make it available for test. Send me an email.

How do I tell (can I tell) if a page "has been" forked, as opposed to "is a" fork? Sorry, but there is lots I don't yet understand and don't want to act like the proverbial bull in the china shop. I can eventually post these scripts on github, but not before test and consideration.
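
The renaming behavior described above can be sketched roughly as follows. This is not the actual script: it works on an in-memory `{ slug => page }` hash rather than files, the slug rule in `to_slug` is an assumed baseline, and `fork?` guesses that a forked page carries a journal entry of type `fork`. All three are assumptions made for illustration.

```ruby
# Assumed slug rule: whitespace to hyphens, drop other punctuation, lowercase.
def to_slug(title)
  title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
end

# Assumption: a page that is a fork has a 'fork' action in its journal.
def fork?(page)
  (page['journal'] || []).any? { |action| action['type'] == 'fork' }
end

# pages: { slug => page hash }. Returns a new hash and leaves the input
# untouched, mirroring the script's habit of preserving the originals.
def rename_page(pages, old_title, new_title)
  old_slug = to_slug(old_title)
  page = pages[old_slug]
  return pages if page.nil? || fork?(page)

  result = {}
  pages.each do |slug, original|
    copy = Marshal.load(Marshal.dump(original))
    # Only the story is rewritten; the journal stays as recorded history.
    (copy['story'] || []).each do |item|
      next unless item['text']
      item['text'] = item['text'].gsub("[[#{old_title}]]") { "[[#{new_title}]]" }
    end
    if slug == old_slug
      copy['title'] = new_title
      result[to_slug(new_title)] = copy
    else
      result[slug] = copy
    end
  end
  result
end
```

The sketch only rewrites links spelled exactly like the old title; links that differ in capitalization but share the slug would need a slug-based comparison.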

JoeBergin commented 12 years ago

All my scripts have been uploaded to https://github.com/JoeBergin/Batch-SFW-Scripts. Use with caution.

JoeBergin commented 12 years ago

Thank you, Ward, for getting me going on this and providing the basic script framework I based this on.

interstar commented 12 years ago

Cool @JoeBergin I'm watching your repo. My SFW related scripts are here : https://github.com/interstar/ThoughtStorms

JoeBergin commented 12 years ago

Ward suggested writing a batch upload script using the current API. I have documents on my home system (not a visible server) that I'd like to publish elsewhere (joe.fed.wiki.org, say) without access to anything but a "claimed" SFW site. I don't really know where to begin - or even where to find API docs.

The use-case would be (a) convert some dusty docs to json, wiki format, in private and refine it there. I assume multiple, inter-linked, pages. (b) Push it to a federated site. (c) Let the world enjoy, troll, whatever.

interstar commented 12 years ago

@JoeBergin I have something similar. A private wiki-like thing I wrote a few years ago. One of my near-term plans is to export pages from that to an SFW running as a local server, and then sync whatever can be public up to my public SFW.

I think this is yet another use-case for a general purpose git-like "pull" from one SFW instance to another.

One step towards that might be a json-merge script that does something like the following :

1) You type

sfwmerge source destination

2) If a paragraph id is in source but not in destination it gets copied to destination and slotted in as near to the right place as the script can find.

3) If a paragraph id is in destination but not in source, it may (subject to a command-line flag) be removed

4) If the same paragraph id is in both we raise a "conflict" flag and do a unix-style diff merge of the text of the two paragraphs into one. (I'd suggest having the two as different paras, but then you have to decide who gets the id)

Does this sound reasonable, everyone? @WardCunningham ?
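
The four steps above can be sketched in Ruby on two story arrays. Everything here is an assumption for illustration: the function name `merge_stories`, the `prune:` keyword standing in for the command-line flag, and the placement rule (a new item goes after the nearest preceding source item that already exists in the result). Conflicts are only flagged and the destination text is kept; no diff merge is attempted.

```ruby
# Hypothetical sketch of the proposed sfwmerge rules, working on story arrays.
# Returns [merged_story, conflict_ids].
def merge_stories(source, destination, prune: false)
  source_ids = source.map { |item| item['id'] }
  merged = destination.map(&:dup)
  # Step 3: optionally drop items that exist only in the destination.
  merged.select! { |item| source_ids.include?(item['id']) } if prune
  conflicts = []

  source.each_with_index do |item, i|
    existing = merged.find { |m| m['id'] == item['id'] }
    if existing
      # Step 4: same id on both sides; flag it when the text differs.
      conflicts << item['id'] if existing['text'] != item['text']
      next
    end
    # Step 2: slot the new item after the nearest preceding source item
    # that is already present in the merged story, else at the top.
    earlier_ids = source[0...i].reverse.map { |s| s['id'] }
    anchor_id = earlier_ids.find { |id| merged.any? { |m| m['id'] == id } }
    position = anchor_id ? merged.index { |m| m['id'] == anchor_id } + 1 : 0
    merged.insert(position, item.dup)
  end
  [merged, conflicts]
end
```

A real tool would also have to decide what to write into the journal for each copied or removed item, which this sketch leaves out.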

WardCunningham commented 12 years ago

I see that the server-side handling of the fork action depends on the forked page being on the public Internet.

Fork could get the page from the client instead. This might be helpful publishing pages from Local Storage too.

From behind a firewall one could still simulate the request stream of a user writing the site from scratch. Yuck.

JoeBergin commented 12 years ago

Speculating here, since I don't feel comfortable with the code, but would it be hard to build a JSON factory that lets you drop a json file, or otherwise correctly formatted json, into a box, so that the story of the dropped stuff is incorporated into the current page? I assume it would go in at the point where the factory was open when the drop was made.

WardCunningham commented 12 years ago

We should be able to pass around files that contain a number of pages. Let us assume that the file format is itself a JSON with { slug : page } as its basic schema. I see two interesting questions:

How are they written?
How are they loaded?

Batches of pages could be written by bulk converters. That is Joe's use case. They could also be written from a whole site that is to be moved, or some portion of Recent Changes, or some of the Local Edits when someone is ready to share them. Bulk converters can write wherever they want. Writing from a site or especially Local Edits is tricky. Perhaps composing a Data-URL with the desired content and letting the browser pop up a Save As dialog is possible.

Loading from a drop on a factory sounds expeditious but not exactly consistent with the pluri-potent paragraph notion of factory. Joe is suggesting a smaller drop above, but not just a single paragraph or he could just paste it. This also raises the question: where does the factory appear? If it's on the Welcome Visitors page you could easily overwrite that page with new content. What does this even mean? Merge?
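
A small Ruby sketch of the `{ slug : page }` file idea, covering both questions (writing and loading). The helper names `build_bundle` and `load_bundle` and the slug rule in `to_slug` are assumptions. To sidestep the overwrite worry, the loader in this sketch never replaces an existing page; it reports colliding slugs instead.

```ruby
require 'json'

# Assumed slug rule: whitespace to hyphens, drop other punctuation, lowercase.
def to_slug(title)
  title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
end

# Writing: build a { slug => page } bundle from a list of page hashes.
def build_bundle(pages)
  pages.each_with_object({}) { |page, bundle| bundle[to_slug(page['title'])] = page }
end

# Loading: add bundled pages to an existing set without overwriting anything.
# Returns [merged_pages, colliding_slugs].
def load_bundle(json_text, existing)
  incoming = JSON.parse(json_text)
  collisions = incoming.keys & existing.keys
  merged = existing.merge(incoming.reject { |slug, _page| collisions.include?(slug) })
  [merged, collisions]
end

bundle = build_bundle([
  { 'title' => 'Welcome Visitors', 'story' => [] },
  { 'title' => 'Dusty Docs', 'story' => [] }
])
file_text = JSON.pretty_generate(bundle)
```

What to do with a collision (skip, replace, or merge paragraph by paragraph) is exactly the open question raised above, so the sketch leaves that decision to the caller.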

JoeBergin commented 12 years ago

I'm still thinking about refactoring. I am thinking merging, not replacing. Once a factory (currently) is opened (with +), it can be moved to any break point before use. Then, to copy part of one page, I can grab the json of the existing one (source), edit it and then drop the result onto the json factory (in dest). Yes, the factory's result could be several paragraphs. This could count as one edit or several in the journal. The old page (source) isn't changed.

WardCunningham commented 12 years ago

Thanks for clarifying. I knew I wasn't exactly echoing what Joe was suggesting.

What I find attractive about the possibility is that it could be a good approach to ad-hoc manipulation of datasets: grab some json, manipulate it in your favorite tool, drop it back into wiki.

Of course this breaks the revision history because the manipulation is outside the system. Such breaks are inevitable. JSON encourages such manipulations. I find myself in data/pages with my text editor plenty often. This just gives wiki a chance to participate and record that a break has happened.

JoeBergin commented 12 years ago

I'm about to upload an offline page splitter to https://github.com/JoeBergin/Batch-SFW-Scripts. Use with caution. It preserves journals and the original split file.

interstar commented 12 years ago

I just did a very quick and dirty repurposing of tlrobinson's "jsondiff".

Try it at http://project.thoughtstorms.info ... in the form put in two domains for old and new ... eg "http://fed.wiki.org" and "http://thoughtstorms.info" ... then a slug like "smallest-federated-wiki" into the page input.

Not sure how useful this is in its current state ... but we could potentially adapt this into a useful tool.

GerryG commented 12 years ago

I was looking around for diff tools, and noticed this: http://richardbondi.net/blog/javascript-diff-combines-scripts-to-fill-the-gap/ which takes two diff libraries, one for line diff and another for word-diff so they work together.

I also was looking into what Wagn does for diff, and it actually has custom ruby code for html based diff: https://github.com/GerryG/wagn/blob/master/lib/diff.rb

interstar commented 12 years ago

Yep @GerryG I used that snowtide diff (referred to in your first link) to make an in-browser diff tool for a job I was in a few years ago where we didn't have source-control!!!! It was invaluable. You can't believe how much time that library saved me and how grateful I am to snowtide.

For our purposes I'm sure we're going to need a custom diff that not only works with json but does something sensible from our perspective. (Eg. we're a lot more interested in diffs in the story than the journal so perhaps the tool should focus on that.)

GerryG commented 12 years ago

On the topic of (a), I really think it needs some deeper thought to get this right. You really do want to be able to change names, but that is clearly a trickier issue in a Federated Wiki. Wagn has a powerful rename that fixes all the references, and Wagn has a lot of them, but that isn't going to work in the Federated situation because you can't possibly fix all of the external references. You might not be able to enumerate them, so you certainly won't be able to update all of them.

I'm starting to think through the idea that a title is content just like any other content. This kind of thinking means the chunks of content, whether paragraphs or titles, headings and whatnot, need to have some sort of independent identity, something like purple numbers. Then you could create a purple number for when the original title is created. External references could be by purple number, so changing the name of a page would not require updating any references.

I think we maybe want a bit of formalism around "binding content" where you either create a new name (and its purple number) and bind content to it (a list of paragraphs, which could be formally the list of purple numbers it contains), or you rebind an existing name (purple number) to an updated list of paragraphs.

Now we can begin to consider change histories a little differently. Mapping onto the git terminology, any changes to bindings, are like "tree" changes, and changes to the paragraphs and names are changes to blob objects bound to the trees.

WardCunningham commented 12 years ago

The slug gives us some freedom to change names. Any name that produces the same slug is equivalent. You could edit the capitalization and some punctuation in a title without changing anything else.

You could also replace the original page with a forwarding hint: "This page was renamed to [[Foo Bar]] on April 13th, 2012"
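
A forwarding hint of this kind could be generated along these lines. This is a sketch only: the helper name `forwarding_page`, the empty journal, and the random 16-character hex id are assumptions.

```ruby
require 'date'
require 'securerandom'

# Hypothetical helper: build a stub page that points readers at the new title.
def forwarding_page(old_title, new_title, on: Date.today)
  text = "This page was renamed to [[#{new_title}]] on #{on.strftime('%B %-d, %Y')}."
  {
    'title' => old_title,
    'story' => [{ 'type' => 'paragraph', 'id' => SecureRandom.hex(8), 'text' => text }],
    'journal' => []
  }
end

stub = forwarding_page('Fo Bar', 'Foo Bar', on: Date.new(2012, 4, 13))
```

Because the stub keeps the old title, it still resolves to the old slug, so existing links land on the hint rather than on a missing page.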

Wikipedia's article namespace is something like 40% redirects. They handle these automatically when visited. There are lots of knotty little problems here arising from inconsistencies and loops.

We can live without automatic redirects and all the nasty issues by keeping the reader in the loop. This is not the same disservice to the user since the referencing page is still onscreen.

GerryG commented 12 years ago

Does 'slug' actually mean something? If not, maybe you should just call it a key?

Wagn does exactly the same thing with cardnames, we map all names to a key, so each key represents an equivalence class.

I'd like to change the to_slug code so that it is more in line with Wagn keys. There are a couple of things that are different. We fold all non-key characters into the space equivalent and strip and compress them. You just strip the non-key characters. I think it is better to leave a space. We also do underscore processing, the reverse of camelcase so that MyName maps to my_name, so My_Name, my_name, MyName are all the same, while you would have MyName the same as myname. I have a pretty strong preference for the former not just because it is the way Wagn does it.

Wagn also singularizes, but that is much more of a judgement call. I thought we could generalize that idea to the idea of "without inflections" which could have a multi-lingual equivalent processing. This is all complex to get right, so probably not wanted for SFW.
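
To make the difference concrete, here is a hedged Ruby sketch of a key function in the spirit described above. It is not Wagn's actual code: the name `to_key` and the exact regular expressions are assumptions, and singularization is left out entirely.

```ruby
# Hypothetical key function: reverse camelcase, fold every run of non-key
# characters into a single space equivalent, then trim the ends.
def to_key(name)
  key = name.gsub(/([a-z0-9])([A-Z])/, '\1_\2') # MyName becomes My_Name
  key = key.downcase.gsub(/[^a-z0-9]+/, ' ')    # underscores and punctuation fold to one space
  key.strip.gsub(' ', '_')                      # underscore is the single space equivalent
end

same = %w[MyName My_Name my_name].map { |name| to_key(name) }.uniq
```

With this rule `MyName`, `My_Name` and `my_name` share one key, while `myname` stays distinct, which is the opposite of what simply stripping the non-key characters gives.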

GerryG commented 12 years ago

But fundamentally the "giving a purple number to a name" is an important concept. I think it is worth exploring how adopting such an idea would help or hurt SFW. I see a lot of good things coming from the idea of tracking changes from paragraph identifiers that are not permanently tied to the name and content. Calling them purple number is optional, but I like the idea of using ideas that are well explored and extending them as needed rather than inventing something new.

In terms of the redirect issue, loops and other nastiness: I would not put alias links in the content, I would make them bindings and require that they map to content, not make them 'content that is a reference'. Wagn already has a Pointer card type for references, but I would never try to use that for namespace things like aliases and equivalences. We are already thinking about that for multi-lingual support and generalizing the 'inflections' concept. Base characters with accent marks can be handled much like case folding, but irregular plurals and other morphological variations may require "extra index entries to the same name".

What I'm really talking about here is "hard links" vs. "soft links". I think we want hard links.

WardCunningham commented 12 years ago

I don't know the etymology of slug. I learned it on the job.

I would like to modify the algorithm so there is never a hyphen at the beginning or end of a slug and never two in a row.
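
In Ruby the change might look like the sketch below. The baseline rule in `to_slug` (whitespace becomes a hyphen, other punctuation is dropped, then lowercase) is an assumption about the current algorithm, and `to_tidy_slug` is a hypothetical name for the proposed variant.

```ruby
# Assumed baseline rule: whitespace becomes a hyphen, other punctuation is dropped.
def to_slug(title)
  title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
end

# Proposed tightening: collapse runs of hyphens and trim them from both ends.
def to_tidy_slug(title)
  to_slug(title).gsub(/-{2,}/, '-').gsub(/\A-+|-+\z/, '')
end

before = to_slug(' Welcome  Visitors! ')      # "-welcome--visitors-"
after  = to_tidy_slug(' Welcome  Visitors! ') # "welcome-visitors"
```

Collapsing doubled hyphens would also erase any convention that gives `--` a meaning of its own, so that trade-off would need to be settled first.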

In my wildest dreams I'd like to be able to change Welcome Visitors to Bienvenue aux Visiteurs and have the slug stay welcome-visitors.

WardCunningham commented 12 years ago

Is there an easy way to strip accents from latin characters? Maybe with a regex or something equally common and well supported?

GerryG commented 12 years ago

That is worth looking around for. I'm not aware of anything, but I will see if I can find something.

Don't you worry about collisions given you will have so many identical slugs? I guess I'm thinking that an id that is completely independent of the name has some advantages, but it would be similarly duplicated in a move to a new instance, so maybe it isn't really a different problem.

GerryG commented 12 years ago

I have more work to do before I can suggest moving to the code I split out of Wagn, but it does already have a lot of properties that you want. Having only a single internal "space equivalent" is there.

You also want to think about a couple of other classes of character. We have two characters that are in cardnames and become part of the key, '*' and '+'. The first we use in the initial position to indicate a "system card" and keep them separate from the content cards in the namespace. '+' is our "name segment" character. Then there is another handful of characters that are just banned in names so they can be used in URL syntax outside of names. Github is crashing my browser tabs on my repo right now, so I can't look this up in the code. There are like four of them, I think '~', '.', '/' and one more maybe. You probably want to ban these as well for similar reasons.

JoeBergin commented 12 years ago

The renamPage batch/offline script will now replace the original page with a forwarding hint.

interstar commented 12 years ago

Hmmm .. in my personal notebook, SdiDesk, I had hierarchies of "subpages" and sub-sub-pages etc. separated by / .

When I converted this to SFW I turned / into a double hyphen (--) to distinguish it from an ordinary hyphen. As far as I can see we don't have any other non-alphabetic separator except hyphen, so having a way to distinguish different uses of it, i.e. a word separator vs a "larger aggregation" separator, is useful.

What's the problem with doubled hyphens? And if we get rid of them, can we have something else to distinguish "pseudo-spaces" from "bundles of pages"?

interstar commented 12 years ago

@GerryG I think an id that is related to, and logically derivable from a name is an essential part of what makes wiki so good.

It means that page-names can be "guessable". When I'm authoring a link, I guess what's the most obvious name for an idea or thing and most of the time, I'm right. Even better, sometimes someone links to a non-existing page, and later on, someone else via a different route, fills that page in. "Magically" the link now works. If you replace page-names with arbitrary identifiers, that never happens.

GerryG commented 12 years ago

On the -- convention: I would recommend at least two characters that are not word characters but are carried in the key (slug). One would be a name separator (/ or -- for you, + for Wagn). I think it is best to ban '/' so it can be used at the next higher semantic level above.

And I totally agree about guessability. I think, though, when you are folding the different forms of a word into one key, that almost necessarily means a one to many relationship of keys to objects (pages, cards, sub-pages, whatever the objects being indexed are in the model). Maybe that's disambiguation pages, but more likely it is contextual.

In Wagn, take for example the key 'tag', you may want that to be a cardtype where the name is "Tag", and you may also have cards, "+*tags". Tag and foo+tags both reference the same card in the Wagn world (foo+tags has three cards, foo, tags and foo+tags are each a card), but the context is different and it should be able to hold onto a different inflection or capitalization, etc. in each context. It isn't a use case for disambiguation.

GerryG commented 12 years ago

I've done a little exploring on the issue of diacriticals and such. This link was useful: http://ahinea.com/en/tech/accented-translate.html

I gather it is straightforward to "decompose" such characters into the base character and the accent mark as a second "combining character". Then you would just filter the marks out of the key. You would want to delete it, and not make it into a whitespace equivalent as Wagn does with most special characters. I think SFW just deletes them already.
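
In current Ruby (2.2 and later) the decompose-then-filter approach can be sketched with `String#unicode_normalize`. The helper name `strip_accents` is made up for illustration, and the sample string uses `\u` escapes only so the snippet does not depend on the source file's encoding.

```ruby
# Decompose to NFD, then delete the combining marks (Unicode category Mn).
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# "Bienvenue a la fete", written here with accents on the "a" and the first "e".
plain = strip_accents("Bienvenue \u00E0 la f\u00EAte")
```

Letters that have no decomposition, such as the Scandinavian o with stroke, pass through unchanged, so a slug rule would still need to decide what to do with them.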

GerryG commented 12 years ago

For ruby: http://www.jroller.com/obie/tags/unicode

    require 'iconv'  # stdlib through Ruby 1.9; a separate gem since 2.0

    class String
      def to_ascii_iconv
        converter = Iconv.new('ASCII//IGNORE//TRANSLIT', 'UTF-8')
        converter.iconv(self).unpack('U*').select { |cp| cp < 127 }.pack('U*')
      end
    end

WardCunningham commented 12 years ago

I suggest the slug formation conversation move to issue #156. Thanks.

GerryG commented 12 years ago

Good idea. Looking at the comments here, the part that actually belongs here relates to sub-page structure, and how sub-page elements will be identified and tracked (roughly equivalent to having purple numbers). I think that is key to being able to track where things move as opposed to how things change, something like the blob/tree change objects in git.

harlantwood commented 12 years ago

It's a little too half-baked for me to want to add it to the List-of-Batch-Import-Examples, but I will mention here that I wrote a ruby script to convert markdown files on my local machine to HTML, and upload them to SFW instance(s).

The interesting feature apropos this discussion is uploading to SFW by HTTP PUT'ing the JSON to a create action.
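
For readers who want a starting point, the request might be assembled roughly as below. This is not the script mentioned above. The `/page/<slug>/action` path, the form field named `action`, the shape of the create action, and the helper name `build_create_request` are all assumptions; check them against the server you are targeting. The sketch builds the request without sending it.

```ruby
require 'json'
require 'net/http'
require 'securerandom'
require 'uri'

# Build (but do not send) a PUT that asks a site to create a page.
# Path, form field name and action layout are assumptions, not documented API.
def build_create_request(site, slug, title, paragraphs)
  story = paragraphs.map do |text|
    { 'type' => 'paragraph', 'id' => SecureRandom.hex(8), 'text' => text }
  end
  action = { 'type' => 'create', 'id' => SecureRandom.hex(8),
             'item' => { 'title' => title, 'story' => story } }
  uri = URI.join(site, "/page/#{slug}/action")
  request = Net::HTTP::Put.new(uri.request_uri)
  request.set_form_data('action' => JSON.generate(action))
  [uri, request]
end

uri, request = build_create_request('http://localhost:1111', 'software-zero',
                                    'Software Zero', ['First paragraph.'])
# To actually send it:
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

Separating construction from sending keeps the sketch testable offline and makes it easy to inspect the payload before touching a live site.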

Here is a sample page generated by the script: http://enlightenedstructure.harlan.fed.wiki.org/view/software-zero

This was a spike. I have shifted my efforts away from processing local markdown -- my intention is to instead start with arbitrary HTML (which may be generated from my markdown or any other source), and strip it down to plain text, or super basic HTML, for insertion into SFW.

I mention this script mostly for the "sample code" of creating a page via HTTP. In case you want to actually use or extend it, be aware that there are many places the script falls short -- those I can see are enumerated in the README:

Will only upload pages once. If a page by that name already exists, the script will warn you of a conflict, but will not overwrite the existing page.
No index page is currently generated. The pages will exist, but only to those that know their paths.
Does not handle images
Does not attempt any conversion of links to wikilinks

WardCunningham / Smallest-Federated-Wiki

[For Discussion] Refactoring pages #176