gethryn / doubleL

Perl to restore LL to batch of text files where second L is replaced by space

Single spaces following words that should end in "LL" [fka Don't know what went wrong] #5

Closed GregTheGrate closed 3 years ago

GregTheGrate commented 3 years ago

the pil ow--"sleeping o --- pillow is in the list

GregTheGrate commented 3 years ago

Another example --- stil didn't sound right

I massaged my forehead, which was now throbbing not just from a mistral-induced headache, but from this surreal conversation. Eli's story stil didn't sound right. He'd left something out. "What aren't you telling me?"

GregTheGrate commented 3 years ago

tol booth --- toll is in the list

The Jaguar purred toward the tol booth and slid onto the highway. Office buildings in various stages of construction sprouted like weeds after rain on both sides of the road. Red gashes in the clay soil looked like open sores where the earth had been bul dozed and flattened. Two years ago this had been farmland. Maybe if I'd seen the destruction unfold gradually it would have seemed less brutal.

GregTheGrate commented 3 years ago

had been bul dozed and --- bull is in the list, so it should have been changed. Will add bulldozed.

GregTheGrate commented 3 years ago

stil haunted me

Last night's conversation with Fitz in the tropical darkness, his whispered accusations and revelations in the shadowy recesses of that porch, stil haunted me. For the rest of the evening I'd felt like a sleepwalker, the jet lag clouding my judgment about what was real and what I'd imagined.

GregTheGrate commented 3 years ago

smal corridor

Directly off the main room an arched wrought iron gate led to the wine library with its deep leather chairs, wine barrel end tables, and our growing collection of books on colonial and contemporary wine making. A heavy door that always reminded me of the entrance to a monk's cell led to a smal corridor and the offices.

GregTheGrate commented 3 years ago

wil you

"Over here." Mason pointed under an end table. He stood up and peered at the spot he'd just indicated. "It's too dark. Hand me that lantern, wil you?"

GregTheGrate commented 3 years ago

wel beyond

It probably wasn't the smartest decision in the world to try to hang on to the vineyard when Leland had left us nearly bankrupt. Our new vintner seemed like the kind of guy you'd hire as a bouncer at a night club. Eli was right that Highland House, neglected for years, needed repairs that were wel beyond our bank balance.

GregTheGrate commented 3 years ago

ripening al together. --- have added alltogether, but shouldn't it have corrected al?

GregTheGrate commented 3 years ago

recal --- recall is in the list

What was surprising was that she'd stopped writing for one two-year cycle--or else that volume was missing. I did some figuring, trying to recal what happened twenty years ago.

gethryn commented 3 years ago

Various updates to the script to handle LL at the end of a word (both mid-line and at the start of a line), e.g. alltogether.


# Find the edge case: words ending in "ll" that are followed by one space, not two, before the next word.
my @ends_with_ll = grep { m/l\s$/ } keys %replace;

...

my $regex_ends_with_ll = join "|", map { quotemeta } sort { $b cmp $a } @ends_with_ll;

...

$regex_ends_with_ll = qr/$regex_ends_with_ll/;

...

my @matches_ends_with_ll = $line =~ /(?<=[$before])($regex_ends_with_ll)(?=\w)/g;
my @matches_ends_with_ll_startline = $line =~ /^($regex_ends_with_ll)(?=\w)/g;

my $count = scalar @matches + scalar @matches_startline + 
            scalar @matches_ends_with_ll + scalar @matches_ends_with_ll_startline;

...

# fix any words that matched
$line =~ s/(?<=[$before])($regex_ends_with_ll)(?=\w)/$replace{$1} /g; # edge case ends with ll
$line =~ s/^($regex_ends_with_ll)(?=\w)/$replace{$1} /g; # ends with ll at start of line

...

# Add the matches to the list of all matches for the file
@all_matches = uniq(@all_matches, @matches, @matches_startline, 
                            @matches_ends_with_ll, @matches_ends_with_ll_startline);
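
The fragments above can be put together into a small runnable sketch of the edge-case fix. The contents of %replace and $before are not shown in this thread, so the values below are assumptions chosen to match the reported examples, and restore_ll is a hypothetical helper name, not a function from the script.

```perl
use strict;
use warnings;

# Assumed lookup table: damaged form (the trailing space stands in for the
# lost second "l") => restored word. The real %replace is not shown here.
my %replace = (
    'stil ' => 'still',
    'wil '  => 'will',
    'smal ' => 'small',
    'tol '  => 'toll',
    'al '   => 'all',
);

# Assumed contents of the character class that the script calls $before.
my $before = q{\s"'(};

# Keys ending in "l" plus whitespace, in reverse string order as in the
# snippet above, so a key that extends another key sorts ahead of it.
my @ends_with_ll = grep { m/l\s$/ } keys %replace;
my $alternation  = join "|", map { quotemeta } sort { $b cmp $a } @ends_with_ll;
my $regex_ends_with_ll = qr/$alternation/;

# Hypothetical helper wrapping the two substitutions from the snippet above.
sub restore_ll {
    my ($line) = @_;
    # Mid-line: the key must follow a $before character and precede a word character.
    $line =~ s/(?<=[$before])($regex_ends_with_ll)(?=\w)/$replace{$1} /g;
    # Start of line: same key list, anchored at the beginning.
    $line =~ s/^($regex_ends_with_ll)(?=\w)/$replace{$1} /;
    return $line;
}

print restore_ll(q{Eli's story stil didn't sound right.}), "\n";
# prints: Eli's story still didn't sound right.
print restore_ll(q{stil haunted me}), "\n";
# prints: still haunted me
```

The lookbehind keeps a key such as "al " from firing inside a longer word like "signal", because the character before it must belong to $before.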