NaNoGenMo / 2021

National Novel Generation Month, 2021 edition.

A Pickler for the Nowing Ones #35

Open lizadaly opened 3 years ago

lizadaly commented 3 years ago

A translator and generator to produce text in the style of A Pickle for the Knowing Ones (1802) by noted eccentric Timothy Dexter (1747-1806).

Timothy Dexter was a merchant and "entrepreneur" who married rich and built an unlikely, providential career on dubious business ventures, including selling literal coals to Newcastle.

"A Pickle for the Knowing Ones" is unique in its use of idiosyncratic and self-aggrandizing language. Dexter wrote phonetically, with no punctuation, in turn praising himself, castigating his enemies, railing at his creditors and debtors, and insulting his wife. The local community of Newburyport continued to reproduce his initial pamphlet for decades after his death.

Ime the first Lord in the younited States of A mercary Now of Newburyport it is the voise of the peopel and I cant Help it and so Let it goue Now as I must be Lord

In the second and subsequent editions of his pamphlet he responded to criticisms that he failed to use punctuation with the following epigraph:

fouder mister printer the Nowing ones complane of my book the fust
edition had no stops I put in A Nuf here and thay may peper and solt
itt as they plese

  ,, , , , , , , , , , , , , , ,, ,, , ,, , , , , , , , , , , , , , , ,,
  ,, , , , , , , , , , , , , , ,, ,, , ,, , , , , , , , , , , , , , , ,,
  ;; ; ; ; ; ; ; ; ; ; ; ; ; ; ;; ;; ; ;; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;;
  ;; ; ; ; ; ; ; ; ; ; ; ; ; ; ;; ;; ; ;; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;;
  :: : : : : : : : : : : : : : :: :: : :: : : : : : : : : : : : : : : ::
  :: : : : : : : : : : : : : : :: :: : :: : : : : : : : : : : : : : : ::
  ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??
  ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ?? ? ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??
  !! ! ! ! ! ! ! ! ! ! ! ! ! ! !! !! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!
  !! ! ! ! ! ! ! ! ! ! ! ! ! ! !! !! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!

Project goals

Transform a modern text that is ideologically similar to Dexter's business initiatives into a Dexter-like rendering.

The full source code repository contains the following utilities:

Dictionary generator

This loops through the source text, breaking on word boundaries, and generates an ordered data structure like the following:

[original_word, spellchecked_word, is_spellchecked]

Initially the first two values are the same, and the last is false.
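A minimal sketch of how such a structure could be built, not the repository's actual code. The tokenizing regex, the choice to keep one entry per distinct word, and the name `build_dictionary` are all assumptions made for illustration:

```python
import re

# Hypothetical tokenizer: a "word" is a run of letters, optionally with
# internal apostrophes. The real project may break on word boundaries differently.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def build_dictionary(source_text):
    """Return [original_word, spellchecked_word, is_spellchecked] entries,
    one per distinct word, in order of first appearance (an assumption)."""
    seen = set()
    entries = []
    for word in WORD_RE.findall(source_text):
        if word not in seen:
            seen.add(word)
            # Before the spellcheck pass both spellings match and the flag is False.
            entries.append([word, word, False])
    return entries


entries = build_dictionary("thay may peper and solt itt as they plese")
```

Plain lists keep the structure trivially serializable, which matters once the dictionary has to be saved and reloaded between spellcheck sessions.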

In the spellcheck pass, a text-based UI assists the transcriber by showing a window of context, with a selection of spell-checked options (via autocomplete), or the transcriber can type in a new word:

╭────────────────────────────────────────────────────────────────────────────────╮
│ put in A Nuf here and thay may pepper and solt itt as they plese    ,, ,                                                                   │
╰────────────────────────────────────────────────────────────────────────────────╯
? [solt]: salt

Any words added replace the spellchecked_word value in the dictionary, and flip the is_spellchecked bit. Re-running the program will resume at the last-checked word. On control-C (or at completion), the dictionary is saved.
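The resume-and-save behaviour could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `ask` is a hypothetical callable standing in for the text UI, an empty answer is assumed to mean "accept the word as-is", and JSON is an assumed storage format.

```python
import json
from pathlib import Path


def spellcheck_pass(entries, ask, save_path):
    """Walk the unchecked entries, asking `ask(word)` for a correction.

    `entries` holds [original_word, spellchecked_word, is_spellchecked] rows.
    Entries already flagged as checked are skipped, which is what makes a
    re-run resume where the previous session stopped.
    """
    try:
        for entry in entries:
            if entry[2]:
                continue              # checked on an earlier run
            answer = ask(entry[0])
            if answer:
                entry[1] = answer     # replace the spellchecked_word value
            entry[2] = True           # flip the is_spellchecked bit
    except KeyboardInterrupt:
        pass                          # control-C: stop early, then save below
    # Save on interruption and on normal completion alike.
    Path(save_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Passing the prompt in as a callable keeps the loop testable without a terminal; in an interactive session `ask` would wrap whatever prompt library draws the context window.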

(Several manual passes were later made over the dictionary, both to correct OCR errors in the Gutenberg-derived scan and to restore Dexter errors that the OCR software had accidentally corrected, based on a comparison with my original edition.)

The Pickler

This takes an input text, as a list of strings, remaps all words according to the dictionary generated above, and removes all punctuation. The punctuation is then appended to the end, as Dexter did.

A test suite runs the process on a few samples:

Source:

It is a truth universally acknowledged, that a single man (??) loves punctuation!!

Output:

Itt is a trouth universally acknowledged that a single man leovs punctuation ,(??)!!
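For readers who want to see the idea in code, here is a small sketch that reproduces the sample above. It is not the actual pickler.py: the function names, the rule that the first Dexter spelling wins when several map to one modern word, the capitalization handling, and the decision to treat every non-alphanumeric character as punctuation are all assumptions.

```python
import re

# A token is either a run of letters/digits or one other non-space symbol.
TOKEN_RE = re.compile(r"([^\W_]+)|(\S)")


def build_reverse_map(entries):
    """Map each modern spelling (lowercased) to a Dexter spelling.

    `entries` holds [original_word, spellchecked_word, is_spellchecked] rows.
    """
    reverse = {}
    for original, modern, _checked in entries:
        reverse.setdefault(modern.lower(), original)  # first spelling wins
    return reverse


def pickle_text(lines, reverse):
    """Respell every word, strip the punctuation, and append it at the end."""
    words, stops = [], []
    for line in lines:
        for word, stop in TOKEN_RE.findall(line):
            if stop:
                stops.append(stop)
                continue
            dexter = reverse.get(word.lower(), word)
            if word[0].isupper() and dexter:
                dexter = dexter[0].upper() + dexter[1:]  # keep initial capital
            words.append(dexter)
    return " ".join(words) + " " + "".join(stops)


# Three illustrative entries, using spellings taken from the sample output.
sample_entries = [
    ["itt", "it", True],
    ["trouth", "truth", True],
    ["leovs", "loves", True],
]
source = ["It is a truth universally acknowledged, that a single man (??) loves punctuation!!"]
result = pickle_text(source, build_reverse_map(sample_entries))
```

With these three entries, `result` matches the sample output shown above.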

filings.py

This downloads recent quarterly earnings reports (10-Q filings) from the EDGAR database provided by the US Securities and Exchange Commission using sec-edgar-downloader, parses the reports, then passes the output through the pickler.

EDGAR reports are in a bespoke SGML format with wrapped HTML; this extracts the HTML blob, passes the text nodes to pickler.py, updates them in place, then writes out the transformed HTML.
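The extract-and-rewrite step might look something like the sketch below, which uses only the standard library. It is not the code in filings.py: it assumes the HTML sits inside a `<TEXT>...</TEXT>` wrapper, and `extract_html`, `TextNodeRewriter`, `rewrite_html`, and the `transform` callable (a stand-in for the call into the pickler) are hypothetical names.

```python
import re
from html import escape
from html.parser import HTMLParser


def extract_html(filing_text):
    """Return the body of the first <TEXT>...</TEXT> wrapper (assumed layout)."""
    match = re.search(r"<TEXT>(.*?)</TEXT>", filing_text, re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else filing_text


class TextNodeRewriter(HTMLParser):
    """Re-emit markup unchanged while passing text nodes through `transform`."""

    def __init__(self, transform):
        super().__init__(convert_charrefs=True)
        self.transform = transform
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            self.out.append(data)  # leave code and pure whitespace alone
        else:
            # Entities were decoded by the parser, so re-escape after transforming.
            self.out.append(escape(self.transform(data), quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_html(html_text, transform):
    parser = TextNodeRewriter(transform)
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return "".join(parser.out)
```

For example, `rewrite_html(extract_html(raw_filing), str.upper)` would upper-case every visible text node while leaving tags and attributes untouched; swapping `str.upper` for a pickling function gives the behaviour described above.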

Running the result through w3m's HTML-to-text conversion then produces nicely formatted plain text.

The output

66,369 words derived from three years of Tesla quarterly reports and amendments. Examples:

Between August 10 2018 & September 6 2018 nine purported stockholder class axxons were filed a
ganst Tesla & Elon Musk in connection with Elon Musks August 7 2018 Twitter post that he wos
considering taking Tesla private All of the suits are noue pending in the US District Cort fower
the Northern District of California Although the complaints vary in certain respects thay each
purport to assert claims fower violations of federal secourties laws related to Mr Musks statement
& seek unspecified compensatory damages & other relief on behalf of a purported class of purchasers
of Teslas secourties Plaintiffs filed ther consoalated complaint on Janeuarey 16 2019 & added ass
defendants the members of Teslas board of directors  The now-consolidated purported stockholder
class axon ass staed while the issue of selection of lead counsel ass briefed & argued befor the US
Cort of Appeals fower the Ninth Circuit  We beleuv that the claims have noe merit & intend to
defend a ganst them vigorously Wee are unable to estimate the potential lose or range of lose
associated with these claims
Our production vehicle fleet includes our Model S premium sedan & our Model X SUV which are our
highest-performance vehicles & our Model 3 a lower-priced sedan designed fower the mass markett Wee
continue to enhance our vehicle offerings with enhanced Autopilot options internet connectivity &
free over-the-air software updates to provide additional safety convenience & performance features
In March 2019 wee unveiled Model Y a compact SUV utilizing the Model 3 platform which wee expect to
produce ot hie volumes bi the eand of 2020 In addition wee have several future electric vehicles in
our product pipeline including Tesla Semi a pickup truck & a noue version of the Tesla Roadster
The trading price of our comon stock has bin highly volatile & could continue to be subject to wide
fluctuations in response to various factors sum of which are beyond our control Our comon stock has
experienced an intra-day trading hie of 38746 per share & a low of 23113 per share over the last 52
weeks The stock markett in ginrel & the markett fower technology companies in particular has
experienced extreme price & volume fluctuations that have offen bin unrelated or disproportionate
to the operating performance of thous companies In particular a larg proportion of our comon stock
has bin & mak continue to be traded bi short sellers which mak puts pressure on the supply & demand
fower our comon stock fouder influencing volatility in its markett price Public perception & other
factors outside of our control mak additionally impact the stock price of companies lik us that
garner a disproportionate degree of public attention regardless of actual operating performance In
addition in the past follering periods of volatility in the overall markett & the markett price 27
n particular companys secourties secourties class axon litigation has offen bin instituted a ganst
these companies Moreover stockholder litigation lik this has bin filed a ganst us in the past While
wee are continuing to defend such axxons vigorously aney judgment a ganst us or aney future
stockholder litigation could result in substantial costs & a diversion of our managements attention
& resources

and in closing, all punctuation extracted from the source text:

Segnetoure Page- Letter

((()))),,,,,,,,,,,,,,,,....:;“““””””””“““:....-,,,,,,,,,,,,,,,,)))((((
((((((((()))))))),,,,---..::;;“““””””““;;;:..---,,,,)))))))))(((((((((
(((()))),,,,,,,,,,,,,,,......:’““””””““;/.....,,,,,,,,,,,,,,,)))))((((
((),,,,,,,,,,,,-...........//::::‑’‑_:::///...........-,,,,,,,,,,,))(&
,,,,,,,,---......///////::::::::::::::::::::::///////.....---,,,,,,,,,
,,,,,,--..../////////:::::::::::::::::::::::::::://///////...--,,,,,,,
(()))*,,,,,,,,,,,,,,,-....:;]“““””””””“““[:.....,,,,,,,,,,,,,,,**))(((
(()))***,,,,,,,,,,,,....::[]‘’“““””””““’‘][::....,,,,,,,,,,,,,***))(((
(((())))*,,,,,,,,,,,,,,,,,,,../:]’”“’[//.,,,,,,,,,,,,,,,,,,,**))))((((
((((())))),,,,,,,,,,,---.......’“”””“’:......----,,,,,,,,,,,)))))(((((
(((())))),,,,,,,,,,,,,,.....:;’““””””““;:/.....,,,,,,,,,,,,,,))))(((((
(()),,,,,,,,,,,,-.........//_____‑’‑_____:/..........-,,,,,,,,,,,))((&
,,..///::::_______________________________________________::::://..,,,
,,,,--../::::___________________________________________:::://..--,,,,
,,,,-..//::::____________________________________________::://...-,,,,
,-///::::____________________________________________________:::///.-,
,.///::::___________________________________________________:::::///,,
,,-.///:::::_____________________________________________:::::///..-,,

Full 66,579 "novel": output.txt

More project detail, especially about the source editions used, in the repository README.

hornc commented 3 years ago

Nice idea and execution. I had never heard of this character before. Love the symmetrical punctuation at the end of your 'novel'!

You may already know about this, but there is a scanned copy of a late (1884) edition of the original at the Internet Archive: https://archive.org/details/pickleforknowing00dextrich for anyone else who wants to see the original. The punctuation page is here: https://archive.org/details/pickleforknowing00dextrich/page/31/mode/1up

lizadaly commented 3 years ago

Every edition seems to do the punctuation page differently! Mine looks like this, which is why I went through all the trouble of implementing the stupid symmetry:

[image: a series of punctuation]

hornc commented 3 years ago

@lizadaly, yours looks better, the symmetry was worth it :)