Open WardCunningham opened 5 years ago
I've revised the json.rb converter to record in the output more information about decisions it has made. This lets me assess results by selecting subsets with jq. Here, for example, is a successful recovery:
{
"date": "November 3, 2011",
"text": "We aim to make simple things simple and complex things possible...",
"rev": "22",
"page": "AlanKayOnSmalltalk",
"copy": true
}
On complete failure I still produce the page name so that I can apply more detailed diagnostics driven from jq results.
{
"page": "AdelinoRodrigues",
"trouble": true
}
See where I begin diagnosis with jq -r 'select(.trouble)|.page' above.
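The same selection can be sketched in Ruby, the language of json.rb. This is a minimal sketch, assuming the converter output is available as one JSON object per line. That framing and the helper names trouble_pages and flag_counts are illustrative assumptions, not part of json.rb.

```ruby
require 'json'

# Hypothetical helper: mirrors jq -r 'select(.trouble)|.page'.
# Assumes one JSON record per line (an assumption, not confirmed by the source).
def trouble_pages(lines)
  lines.map { |line| JSON.parse(line) }
       .select { |record| record['trouble'] }
       .map { |record| record['page'] }
end

# Hypothetical helper: count how many records carry each decision flag.
def flag_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    record = JSON.parse(line)
    %w[copy trouble].each { |flag| counts[flag] += 1 if record[flag] }
  end
  counts
end

# The two records shown above, written one per line.
records = [
  '{"date": "November 3, 2011", "text": "We aim to make simple things simple and complex things possible...", "rev": "22", "page": "AlanKayOnSmalltalk", "copy": true}',
  '{"page": "AdelinoRodrigues", "trouble": true}'
]

puts trouble_pages(records)
p flag_counts(records)
```

Keeping the page name on failed records is what makes this kind of follow-up selection possible.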
Ward, [Take your time answering or don't answer at all. I will not be offended in the least.] Is there some way to edit pages on C2 right now? I went to your issues to see if there was any quick way I could help you. I found issue 39 and investigated. One of your issues regards a link that is now a bad place to go. I went to the issue to clarify and discovered I had already entered the issue in 2019 (and also asked if there was some way to edit). Issue here: https://github.com/WardCunningham/remodeling/issues/39
As an open source author myself I appreciate that stuff like this can take a long, long time to deal with, and it is a bit soul destroying to have something where you give, and give, and give, until you have nothing left to give, and then you give some more. :) So there is no rush on my part. Take your time. I will still be part of the C2 community as long as you have it. I will check back with issues, review code, etc. at some future time.
Thank you for your selfless contribution to all this. It is truly appreciated.
Cheers!
B.
On Thu, Nov 26, 2020 at 12:13 PM J. Sherman notifications@github.com wrote:
@GodsNightmare approved this pull request.
-- Bob Trower --- From Gmail webmail account. ---
Thank you for your understanding.
A small percentage of pages have character set problems that never bothered the Perl that implemented the original wiki. As I have chosen to work in Ruby and more recently JavaScript I find that I can't touch these files. In a pinch I have written C code with getchar and putchar which can eat through anything. (I'm not using standard io which might be picky.)
In other news, last fall a Portland State University capstone project worked through these problems and others but the pandemic got in the way of the final tech transfer. Ongoing work should start there.
The read-write access to the original content is through federated wiki. You can edit pages and your edits will persist for your own benefit in browser local storage. If you want to share edits you can host a federated wiki instance of your own and save any edits you make there. If an interest group were willing to take ownership of some content, maybe continuous integration pages, or implementations of fizz-buzz, I could find some way to announce this work as a sister project when people read the original.
I can't reach http://c2.fed.wiki.org/ right now, if that is the site.
It's not a big deal. I will return to this at a later date.
Cheers!
B.
Oops. I meant to include a link. Here I searched for wiki and picked a few pages to illustrate.
We add some tools for finding and correcting character code problems. Rather than work on the whole wiki database, we select out troublesome pages into a trouble directory. As we improve our json.* script chain we will produce possibly improved pages in a pages directory. From this we can select out specific remaining problems which we diagnose at the line level and collect in the invalid directory. From this we inspect the octal characters and find substitutions then repeat the process looking for improvements.
bulk convert
This reports progress on individual files using the codes ".", "," and "x" for retrieved text, copy or still trouble. Counts will measure progress.
diagnosis
A typical invalid file shows numbered lines with invalid characters.
Good technique is to flip through the reports working from the shortest entries first.
When a particular line of a particular file is of interest, isolate that line and dump it in octal.
In this case the invalid character is octal 231.
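The diagnosis step can be sketched in Ruby. This is not the command the text refers to, only an illustration under stated assumptions: "invalid" is taken to mean "not valid UTF-8" (the source does not name the target encoding), and the helper names invalid_lines and octal_dump and the sample page are made up.

```ruby
# Assumption: "invalid" is taken to mean "not valid UTF-8"; the source does
# not name the target encoding.

# Hypothetical helper: [line_number, raw_line] for each line with invalid bytes.
def invalid_lines(data)
  data.b.each_line.with_index(1).select do |line, _number|
    !line.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  end.map { |line, number| [number, line] }
end

# Hypothetical helper: render every byte of a line as three octal digits.
def octal_dump(line)
  line.bytes.map { |byte| format('%03o', byte) }.join(' ')
end

page = "plain line\nSmalltalk\x99\n".b # made-up sample containing octal 231

# Shortest lines first, echoing the advice to start with the shortest entries.
invalid_lines(page).sort_by { |_number, line| line.bytesize }.each do |number, line|
  puts "#{number}: #{octal_dump(line)}"
end
```

The page is handled as raw bytes throughout, so the offending byte shows up in the dump without raising an encoding error.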
substitution
Perl is happy to change these characters to something preferable. In this case, the tm is probably a joke and won't be missed.
These commands go into json.pl and check.pl which have a similar structure. We repeat from the beginning and find less work to do. (In this case, way less work to do having worked a few substitutions before repeating.)
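The substitute-and-repeat loop can be sketched in Ruby. This does not reproduce the Perl commands in json.pl or check.pl. It assumes a page counts as fixed once it is valid UTF-8, and the table, helper names and sample page are hypothetical. Only octal 231 comes from the text above.

```ruby
# Hypothetical substitution table: byte value => replacement text.
# Only octal 231 comes from the text above; dropping it is one reading of
# "won't be missed".
SUBSTITUTIONS = { 0o231 => '' }.freeze

# Apply each substitution to the raw bytes of a page.
def substitute(data, table = SUBSTITUTIONS)
  table.reduce(data.b) do |text, (byte, replacement)|
    text.gsub(byte.chr, replacement.b)
  end
end

# Assumption: a page counts as fixed once it is valid UTF-8.
def still_invalid?(data)
  !data.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

pages = { 'Example' => "Smalltalk\x99 rocks\n".b } # made-up page name and text
before = pages.count { |_name, data| still_invalid?(data) }
after  = pages.count { |_name, data| still_invalid?(substitute(data)) }
puts "invalid before: #{before}, after: #{after}"
```

Counting invalid pages before and after each round gives a measure of how much work remains on the next pass.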