Search broken due to newline in news

CheariX commented 1 month ago

Have you checked that your issue isn't already filed?

[X] I read through FAQ and searched through the past issues, none of which addressed my issue.
[X] Yes, I have checked that this issue isn't already filed.

Bug description

See my details in #2450

How to reproduce the bug

Add markdown link to an inline news article

Edit: Initially, I thought this issue is caused to due markdown. But I was wrong, it occurs if the content has more than one line.

Error messages and logs

No response

What operating system are you using?

Not applicable (e.g. you're using GitHub Pages or other hosting)

Where are you seeing the problem on?

Running locally with Docker

More info

I think my report in a closed PR was overlooked. Therefore I created this issue.

george-gca commented 1 month ago

I believe I fixed this in #2450. Also there isn't a comment from you there.

CheariX commented 1 month ago

The following changes introduced in #2450 breaks search on my setup. ctrl-k is completly empty:

https://github.com/alshedivat/al-folio/blob/d004837e607bf14e593f245211b91f13264c4ac1/_includes/scripts/search.liquid#L90-L94

I used a markdown link like Link to [site](https://example.com) in my inline news, which causes this issue. Instead, if a use HTML like Link to <a href="https://example.com">site</a> the search is working, but the HTML is escaped, which uglifies the search entry.

My suggestion would be to simple stay with the previous code. People can manually add site.title if they want to have a nicer looking title.

Another (untested) solution could be to remove all HTML, using strip_html. However I still think using site.title would be better since we don't know the length of the inline news. I could possible contain multiple paragraphs.

george-gca commented 1 month ago

I don't know if I get this. I tried the following changes:

And had this as a result:

Both in the search and in the page the text is fine. Are you sure you have the latest updates?

CheariX commented 1 month ago

Okay, this error was a bit hard to reproduce. It was not the HTML Link, but a newline in my news. I had an inline news using two lines.

The Browser's Error console quickly displayed the error.

Uncaught SyntaxError: '' string literal contains an unescaped line break localhost:8080:1171:238

When I removed the second line of my news, search worked again.

Additionally, I'm not very happy with the currenty mardown representation. For example, this news entry:

---
layout: post
date: 2024-06-01 10:00:00-0400
inline: true
related_posts: false
---

This news contains a [Link](https://example.com)

Produces:

Update: On upstream/master, I get a rendered Link. I don't know why, but anyways, i think it should not be rendered. The link is not intended to be clicked, and it is also hard to read.

I suggest to use strip_html | strip_newlines | escape for all three cases here (not only for the description).

https://github.com/alshedivat/al-folio/blob/afc56cc9877d08506e83c6568fe6ae8f2867d508/_includes/scripts/search.liquid#L90-L97

Also, I do not like the dependence on the inline attribute since this attribute is specific to the news collection, so I'd remove the inline-check from this code. My suggestion would be to check if page.title exists and, in that case, prefer to use page.title. Otherwise, use {%- assign title = item.content | strip_html | strip_newlines | escape -%}. Since page.title is used at multiple places, this approach seems more general to me.

What do you think?

george-gca commented 1 month ago

Update: On upstream/master, I get a rendered Link. I don't know why, but anyways, i think it should not be rendered. The link is not intended to be clicked, and it is also hard to read.

I agree that it is hard to read, but I am not sure if it should not be clicked. Think about this use case, what should happen if the user search for the news, and it is inline with a link? Should it only be read, or should be possible to click in it? But I don't think we should take the link and expand to the whole line either, because maybe the news has more than one link in it. I'd like to have more opinions on that matter. What do you think @pourmand1376, @alshedivat?

Also, I do not like the dependence on the inline attribute since this attribute is specific to the news collection, so I'd remove the inline-check from this code.

Idk, because if for some reason someone forgets to set page.title it would render a possibly very big content inside the search, while if someone set both the page.title and inline the page.title would simply be ignored. I think the error that it generates with inline is "less extreme" than by using the title like you said.

CheariX commented 1 month ago

Update: On upstream/master, I get a rendered Link. I don't know why, but anyways, i think it should not be rendered. The link is not intended to be clicked, and it is also hard to read.

I agree that it is hard to read, but I am not sure if it should not be clicked. Think about this use case, what should happen if the user search for the news, and it is inline with a link? Should it only be read, or should be possible to click in it? But I don't think we should take the link and expand to the whole line either, because maybe the news has more than one link in it. I'd like to have more opinions on that matter. What do you think @pourmand1376?

In my humble opinion, the ideal behavior would be to visit the news page. If it is an inline news, all other inline news should become invisible through the use of CSS or JavaScript. Similarly, I imagine searching for BibTeX and papers.

Also, I do not like the dependence on the inline attribute since this attribute is specific to the news collection, so I'd remove the inline-check from this code.

Idk, because if for some reason someone forgets to set page.title it would render a possibly very big content inside the search, while if someone set both the page.title and inline the page.title would simply be ignored. I think the error that it generates with inline is "less extreme" than by using the title like you said.

Well, if someone forgets it, would that be a user error? We could also apply something like truncatewords: 50. I just found newline_to_br. Use it for displaying longer entries like this:

If page.title and inline are both set, page.title should take precedence; or at least, this is what I would expect.

george-gca commented 3 weeks ago

Ok I will take a closer look at this.

sim642 commented 1 week ago

I suggest to use strip_html | strip_newlines | escape for all three cases here (not only for the description).

I use markdown links in inline news and also changed to these filters to not have HTML source being shown as search entries.

george-gca commented 6 days ago

Just tested your suggestion @CheariX, and it won't work. If the title is not defined, it is generated automatically from the filename apparently. For example, for the 1st news it generated Announcement_1 even though the title is not defined in the front matter. So I will keep the inline solution.

I will add the strip_html | strip_newlines | escape to the title, it makes sense. Also I will add truncatewords: 10 for all the titles.

alshedivat / al-folio