LibriVox / librivox-catalog

LibriVox catalog and reader workflow application
https://librivox.org
MIT License
36 stars 17 forks source link

Preserve line/paragraph breaks in description from Template Generator #188

Closed twinkietoes-on closed 3 months ago

twinkietoes-on commented 5 months ago

A reader fills in the Template Generator, adding paragraphs to the description. These show as paragraph breaks in the database entry, but they're lost at cataloging, both on the catalog page and on the data sent to Archive.org. Capture1 Capture2

Current fix is to manually enter <br /> html to the database field. (Paragraph tags cause other problems in the display.)

twinkietoes-on commented 5 months ago

If this involves editing the code that goes to Archive, I'd like to see a couple additional line breaks in there, between the different boilerplate components (top and bottom) and the description, to separate them into paragraphs as well. That's not as critical, but a "might as well, while you're in there" item.

garethsime commented 5 months ago

Getting all the newlines to show in the catalog view is a one-line change, but it presents an interesting question around what to do with descriptions that are already using <br /> tags. For example, there might be entries like this in the management view:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec tincidunt auctor ligula. Donec eu lacinia metus. Vestibulum vel felis vel sapien condimentum ullamcorper eget eget massa. Vestibulum pharetra purus eu erat imperdiet, vitae dignissim massa malesuada. Curabitur sit amet magna quis elit tristique fringilla. <br /><br />

Quisque id lorem leo. Sed iaculis nisl lectus, quis congue leo blandit in. Vivamus facilisis velit ut metus fermentum, ut tristique ex sodales. Proin gravida viverra velit, sed finibus dui mollis a.

Which would then end up being rendered with extra breaks like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec tincidunt auctor ligula. Donec eu lacinia metus. Vestibulum vel felis vel sapien condimentum ullamcorper eget eget massa. Vestibulum pharetra purus eu erat imperdiet, vitae dignissim massa malesuada. Curabitur sit amet magna quis elit tristique fringilla.    

Quisque id lorem leo. Sed iaculis nisl lectus, quis congue leo blandit in. Vivamus facilisis velit ut metus fermentum, ut tristique ex sodales. Proin gravida viverra velit, sed finibus dui mollis a.

My copy of the scrubbed database is probably a year old by now, but it's got several thousand projects that have br tags (both kinds) followed by newline/return characters.

Perhaps we could do something clever where we only do the replacement if there are no br tags in the description, or we could try to collapse together br tags with adjacent whitespace, but that seems like it could lead to confusion about when the br tags are needed and when they aren't.

Thoughts?

garethsime commented 5 months ago

Here's a cool idea -- https://www.darklaunch.com/php-normalize-newlines-line-endings-crlf-cr-lf-unix-windows-mac.html

They normalise all of the line endings and then check that you don't get more than two in a row. It might not be 100% what people intended, but I think it would be pretty close, and it would prevent any of the descriptions from having massive spaces like I was suggesting in the last comment.

twinkietoes-on commented 5 months ago

That sounds like a good solution. Could it go on the test server, to be sure it's doing what we want it to do?

redrun45 commented 4 months ago

Looks like this went straight to live. Seems to work as advertised (edit: on newly cataloged entries) though! I can update the Cast Coder to leave out the <br> tags after each role now. I'll shoot for this weekend on that.

@twinkietoes-on, any other special cases come to mind? This change doesn't affect what's sent to Archive, only the display on our catalog pages. Would you like that one as a separate issue?

notartom commented 4 months ago

Looks like this went straight to live.

That's on me - figured it's better to ask for forgiveness than permission, in this case.

twinkietoes-on commented 4 months ago

@twinkietoes-on, any other special cases come to mind? This change doesn't affect what's sent to Archive, only the display on our catalog pages. Would you like that one as a separate issue?

So if we "train" people to not use html breaks, now it'll show up correctly on the catalog page but not at Archive? That's still a problem, yes - it's either more work for the MC, or for the lazy MC there will just be a wall of text on Archive.

redrun45 commented 4 months ago

Cast coder has been updated: it separates entries with \n newlines rather than with <br> tags. The old version has just been renamed, since that little project doesn't have git to keep its history.

Guess we may as well keep this issue open for the Archive text.

Edited to add: it looks like this also applies retroactively to a bunch of our old projects. Is there a way this could modify newly input text, but leave the existing entries to display as they did before?

garethsime commented 4 months ago

I don't know much about the Internet Archive uploader, but I guess the main bits are:

Interestingly, it looks like the upload controller currently strips out the newlines, but I guess that's just so that it's consistent with how the catalog was rendering before :slightly_smiling_face: We can probably replace it with the same rendering logic we used for the catalog. (See: _normalize_and_deduplicate_newlines_in_html.)

(This is all assuming that the IA header doesn't reject or mangle the HTML and that the length is fine, which I would guess is probably the case if people are already using it and putting manual <br /> tags in.)

I don't really know how to test the IA stuff is working properly, so I might leave this for someone who knows more about how it hangs together or has access to the API to test with or something.

redrun45 commented 4 months ago

Thank you @garethsime, that was extremely helpful! I've switched the controller to calling your helper instead of trim, and I think that will do it. :grin:

@twinkietoes-on, I've also found the template that includes the other spacing elements. I'm adding another break just before and just after the description. :+1:

I'll see if I can put together a unit test to cover this - I've never done one of those before! And then, I'll see if I can figure out credentials and catalog a few dummy projects to be sure. On a dummy account, if I can. :laughing:

Update: I'm not up to mocking the models I'd need to unit test this, but I've learned a bit about the process. Meanwhile, I have made a new Archive account, and sent a couple of dummies to their test collection. I'll get a PR ready next weekend.

twinkietoes-on commented 4 months ago

Sounds great!

On Sunday, March 3, 2024 at 11:46:26 a.m. EST, redrun45 ***@***.***> wrote:  

Thank you @garethsime, that was extremely helpful! I've switched the controller to calling your helper instead of trim, and I think that will do it. 😁

@twinkietoes-on, I've also found the template that includes the other spacing elements. I'm adding another break just before and just after the description. 👍

I'll see if I can put together a unit test to cover this - I've never done one of those before! And then, I'll see if I can figure out credentials and catalog a few dummy projects to be sure. On a dummy account, if I can. 😆

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>