davidgiven / wordgrinder

A word processor which gets the hell out of your way and lets you get some work done.
http://cowlark.com/wordgrinder

Split Document Sets #51

Open samwiddowson opened 6 years ago

samwiddowson commented 6 years ago

I love this project. Having set up a Unix CLI environment on an old laptop, this is exactly what I needed to turn it into a distraction-free "typewriter".

I normally do my creative writing in Markdown or plain text, across different devices synchronised via a cloud service, but not every device I write on will run WordGrinder, and I think the existing v3 document-set files will be cumbersome to edit elsewhere.

With this in mind, I've forked the project and started implementing a new file-saving format, in which the .wg file becomes more of a session-management file, referencing external files by path, each with an explicitly declared file format. It uses the import/export functions to write the separate document files, and adds an importer and exporter for the existing WordGrinder document format.
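
For illustration, a session file under this scheme might look something like this (the keys and layout here are just a sketch, not the fork's actual format):

```lua
-- Hypothetical session .wg: a manifest pointing at external files,
-- rather than a container holding the documents themselves.
return {
    documents = {
        { name = "Chapter 1", path = "chapter-01.md", format = "markdown" },
        { name = "Chapter 2", path = "chapter-02.md", format = "markdown" },
        { name = "Notes",     path = "notes.txt",     format = "text"     },
    },
    current = 1,  -- which document was open when the session was saved
}
```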

I also intend to implement some limited Markdown import functions, at least to mirror the current export function (ref. currently-open issue #34 ).

This obviously requires some sizable changes, and I'm happy to make them for my own benefit, but is there likely to be any interest in pulling them into the main project when they're done?

daneelcayce commented 6 years ago

@samwiddowson I actually came here to request a similar feature set, so this is something I would definitely be interested in -- it sounds like you and I have similar writing processes/needs.

I'll be keeping an eye on your fork; let me know if you need somebody to help with it!

samwiddowson commented 6 years ago

Quick progress update: the bulk of the coding is done now, though it's still buggy in places and needs a few more changes.

On a local build, I've reworked the UI so that it talks about Documents and Sessions rather than Document Sets. It partially works, but a few things need rewriting.

I'm still hammering away at it, and there are a few things, like Markdown import, that I still want to implement.

davidgiven commented 6 years ago

Yikes, sorry, missed this.

In general I'm not keen on having document sets spread across multiple files. It complicates the conceptual model --- having just one thing which you can save, and copy, is so much easier than trying to marshal a variable number of linked files. Maintaining sensible behaviour if some of those files are missing is very hard. Remember the bad old days of Windows OLE and linked documents?

Also, WordGrinder very deliberately separates saving (in the native format) and exporting (in alien formats), for much the same reason that GIMP does: alien formats can't store all the information in a WordGrinder document, which means it's possible for the user to make some edits, think they've been saved, and then discover they haven't. It gets particularly gnarly when combined with automatic import from an alien format which doesn't support features which WordGrinder does --- you end up with two lossy conversions interacting with each other, with scope for fairly spectacular data loss.

I understand the use case, but I think it'd be better met via an auto-export feature: being able to nominate that certain documents in the document set get exported to alien formats every time the document set is saved. (Should be easily implementable via the addon architecture, too. There isn't an event which fires when a document is saved but one can be added in two lines of code.) That way the source of truth remains the WordGrinder file, which can live in your VCS, and the exported documents are derived from it. But I think that importing should always be a manual process.
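
A minimal sketch of what such an addon might look like, assuming a hypothetical Event.DocumentSetSaved event and exporter name (neither is current WordGrinder API):

```lua
-- Sketch only: Event.DocumentSetSaved and ExportMarkdownFile are assumed
-- names; the "two lines" in the core would be the FireEvent call added
-- at the end of the save routine.
AddEventListener(Event.DocumentSetSaved, function()
    -- Documents nominated for auto-export, kept in the document set the
    -- way addons conventionally store their settings.
    local nominated = DocumentSet.addons.autoexport or {}
    for _, name in ipairs(nominated) do
        local document = DocumentSet:findDocument(name)
        if document then
            ExportMarkdownFile(document, name .. ".md")
        end
    end
end)
```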

(Said manual process, by the way, can always be scripted. You can run arbitrary Lua scripts in a running WordGrinder instance by doing wordgrinder --lua <filename>. That's how the unit tests work.)

samwiddowson commented 6 years ago

Agreed, this change is a huge paradigm shift. Completely understand the reasoning behind sticking with the current model.

Not to worry; this has been a good project to learn Lua with. I'll clean up the bugs on my fork, but keep my master branch free of these changes.

I'll probably keep my modified version for personal use, but I agree with the limitations you mention, so I won't recommend it for general release.

There are a few other small tweaks I've been thinking about, so I might send some minor, sensible pull requests your way.

davidgiven commented 6 years ago

Cheers. I'm still interested in what you're doing, mind (particularly the Markdown importer...).

Incidentally, it occurs to me that it would be trivial to add support for loadable extensions. A lot of new functionality is being farmed out to the addon system anyway, and loading those either through the commandline (-x addon.lua) or via the global config file would be really easy. That way you wouldn't need to rebuild WordGrinder.
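
The loading half really would be tiny; illustratively (neither the -x flag nor the config hook exists yet):

```lua
-- Illustrative loader for the proposed "-x addon.lua" mechanism.
-- extension_files would be collected from -x flags and the global config.
local extension_files = { "autoexport.lua" }
for _, filename in ipairs(extension_files) do
    dofile(filename)  -- each extension registers its own hooks as it loads
end
```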

The downside is that such an extension wouldn't be able to change the core. But the core probably wants to be extended with more hooks and events anyway...

authorfunction commented 6 years ago

@samwiddowson I've been using your branch for (academic) writing today and just wanted to drop a comment and say I love it! Wordgrinder is amazing in itself, but the addition of sessions and Markdown saving fits perfectly with my writing process, so this is basically everything I've ever wanted for terminal-based word processing (well, besides footnote support...). Keep up the good work, and I promise to report any bugs that I encounter.

samwiddowson commented 6 years ago

Glad you like it! I've made quite a few improvements and fixes in the last couple of days, including proper autosave support and the scrapbook, which now gets saved in the session file instead of externally.

Will push them to my session-management branch tonight.

samwiddowson commented 6 years ago

Right. The only thing now missing from my personal wishlist is a bit of imported-file validation, for HTML (just checking that a tag was found) and for the native WordGrinder format.
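
For the HTML case, that check can be little more than a Lua pattern match; a sketch, not the fork's actual code:

```lua
-- Minimal sanity check for imported HTML: did we see at least one tag?
local function looks_like_html(content)
    return content:match("<%s*%a[^>]*>") ~= nil
end

assert(looks_like_html("<html><body>hello</body></html>"))
assert(not looks_like_html("just plain prose"))
```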

After that, these are the changes I think I'll submit for consideration in the main WordGrinder trunk:

1. Markdown import
2. The word-count change (counting only non-empty words)

That's quite a few changes, so I won't present them all at once or in a hurry; one at a time, with no pressure for a quick response. :-)

Also, I have an old low-spec netbook that I want to benchmark the word-count change and Markdown import on before I submit pull requests for those.

davidgiven commented 4 years ago

Belatedly: I'm actually making a bunch of changes for the upcoming 0.8 release. This includes fixes for (4) and (5).

(2) is tricky because the word count actually gets calculated every time the document changes --- because each paragraph knows how many words it has, this is just a matter of summing the count for each paragraph, so it's O(1000) or so. Looking for non-empty words would require looking at each word, and so O(100000) or so. It's not sufficient to do the word count at load time because then it won't update as the document is changed. It may be possible to cache the effective word count per paragraph, but... that's getting a bit complicated. I agree that this would be nice to have.
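
To make the complexity argument concrete, here is a sketch (the data layout is invented for illustration; WordGrinder's real paragraph objects differ):

```lua
-- O(#paragraphs): each paragraph caches its own total, so the document
-- word count is just a sum over ~1000 paragraphs.
local function total_words(paragraphs)
    local n = 0
    for _, p in ipairs(paragraphs) do
        n = n + p.wordcount
    end
    return n
end

-- O(#words): counting only non-empty words naively means touching
-- every one of ~100000 words on each recount.
local function total_nonempty_words(paragraphs)
    local n = 0
    for _, p in ipairs(paragraphs) do
        for _, word in ipairs(p.words) do
            if #word > 0 then n = n + 1 end
        end
    end
    return n
end

local doc = {
    { wordcount = 2, words = { "two", "words" } },
    { wordcount = 1, words = { "" } },  -- an "empty" word still counts above
}
print(total_words(doc))           --> 3
print(total_nonempty_words(doc))  --> 2
```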

(1) would also be really nice to have, but Markdown is horrible to import, as there are a million slightly different and ambiguous specifications. There are pure Lua Markdown parsers, but they're all ad hoc. I'd really like a CommonMark parser, but there aren't any pure Lua ones, and I'd rather not import a C Markdown parser. But WordGrinder really should have this.
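
For a feel of why the ad-hoc parsers are tempting, the easy cases are just Lua patterns; a sketch, nowhere near CommonMark-complete:

```lua
-- Naive line classifier: covers headings, bullets and plain paragraphs,
-- and ignores everything that makes real Markdown ambiguous (nesting,
-- lazy continuations, inline HTML, emphasis edge cases...).
local function classify(line)
    local hashes, text = line:match("^(#+)%s+(.*)$")
    if hashes then return "H" .. #hashes, text end
    local item = line:match("^[-*+]%s+(.*)$")
    if item then return "LB", item end
    if line:match("^%s*$") then return "BLANK" end
    return "P", line
end

print(classify("## A heading"))   --> H2    A heading
print(classify("- a list item"))  --> LB    a list item
```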

davidgiven commented 4 years ago

Markdown import is done, in #137.

samwiddowson commented 4 years ago

It's been a very long time since I looked at the code, but if I recall correctly, it wasn't necessary to recalculate the word count every time a new word was added. I think I just incremented the total word count if the current word length changed from zero.

I was running it on an ancient netbook at the time and had no performance issues in 75-80k-word documents.
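
That incremental scheme would look something like this (a sketch of the idea, not the fork's actual code):

```lua
-- O(1) per edit: adjust the running total only when the word under the
-- cursor crosses the empty/non-empty boundary, instead of rescanning.
local effective_count = 0

local function on_word_edited(old_word, new_word)
    if #old_word == 0 and #new_word > 0 then
        effective_count = effective_count + 1  -- a new non-empty word
    elseif #old_word > 0 and #new_word == 0 then
        effective_count = effective_count - 1  -- word deleted or emptied
    end
end

on_word_edited("", "h")  -- first keystroke of a new word
print(effective_count)   --> 1
```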