HTML code tags in Description field

BobHarper1 commented 6 years ago

We have a prospective publisher who is using HTML in some of their Description field entries. At a guess, these were originally entered through a web interface. Some of them include tables as well as paragraph spacing.

If this were imported into GrantNav, the tags would be displayed as raw text in the description on the grant page.

Is there scope for entries with HTML tags to be rendered accordingly? There are possible problems, given that any styles, images etc. could be included (which would look messy), but perhaps there's a way to limit rendering to simple tags.

I thought I would ask here first whether this is easy/sensible, rather than asking the publisher to remove the tags from their dataset.

morchickit commented 6 years ago

Wouldn't it be better practice to ask the publisher to remove it, so that other tools can use the data easily as well? GrantNav is not the only application that will have to deal with it. While the standard doesn't prescribe best practice for descriptions, I think we should not have HTML in them at all.

BobHarper1 commented 6 years ago

Perhaps, but there might be good reason for use in other applications as well, and it could allow users to decide whether they will use the code or not. HTML code is just text after all, so it does meet the criteria of the Standard (a separate issue perhaps).

morchickit commented 6 years ago

I think this is, for now, a publisher issue more than a GrantNav issue. I think we should be consistent with publishers' description texts as much as we can.

In any case, this is not in the scope of GrantNav for a while, so unless Rob or Ben can provide a quick hack, we won't prioritise it.

Bjwebb commented 6 years ago

I'd avoid trying to do this for now.

The challenges are:

1. Displaying their HTML in a way that doesn't look ugly.
2. Making sure publishers can't "break" the rest of the page by closing tags they haven't opened etc.
3. Ensuring publishers can't run JavaScript on our page.
4. Not breaking existing descriptions where publishers use < etc., i.e. how do we determine if it's HTML or not?
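On the fourth challenge, one rough heuristic is to parse the description and look for recognised tag names, rather than just searching for `<`. The sketch below uses only Python's standard library. `KNOWN_TAGS` and `looks_like_html` are hypothetical names for illustration, not anything that exists in GrantNav, and the tag list is an assumption about what a publisher's web form might emit.

```python
from html.parser import HTMLParser

# Assumption: tags we would expect a publisher's web form to produce.
KNOWN_TAGS = {
    "p", "br", "div", "span", "b", "i", "strong", "em",
    "ul", "ol", "li", "table", "tr", "td", "th", "a",
}


class _TagFinder(HTMLParser):
    """Records whether any recognised HTML tag appears in the input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = False

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag names before calling this.
        if tag in KNOWN_TAGS:
            self.found = True

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.found = True


def looks_like_html(description):
    """Return True if the description contains at least one known tag."""
    finder = _TagFinder()
    finder.feed(description)
    finder.close()
    return finder.found
```

A bare `<` followed by a space or a digit is treated as text by the parser, so "grants of < 5000" is not flagged. Text such as `a<b and c>d` would still be misread as a `<b>` tag, so this is a heuristic and not a guarantee.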

drkane commented 5 years ago

Bleach might be helpful if this is looked at again.
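To illustrate the allowlist idea that a library like Bleach implements, here is a rough standard-library sketch: keep a small set of tags with all attributes dropped, escape everything else, discard script and style content, and close anything left open. The names `ALLOWED_TAGS` and `sanitize` are hypothetical and the tag list is an assumption. A maintained sanitising library would be the safer choice in production. This is only meant to show the shape of the approach.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br", "b", "i", "strong", "em", "ul", "ol", "li"}  # assumption
VOID_TAGS = {"br"}                  # tags that never take a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags and their contents


class _AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            # Attributes (style, onclick, ...) are deliberately discarded.
            self.out.append(f"<{tag}>")
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in self.open_tags:
            # Close back to the matching opener; stray closers are ignored.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))


def sanitize(description):
    """Return description with only allowlisted, attribute-free tags."""
    parser = _AllowlistSanitizer()
    parser.feed(description)
    parser.close()
    # Close anything the publisher left open so the page is not broken.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

Because the output only ever contains fixed tag strings and escaped text, a stray `</div>` or an `onclick` attribute cannot reach the page, which speaks to challenges 2 and 3 above.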

drkane commented 3 years ago

Re-upping this issue as it seems to be occurring in more data (or at least the data it's found in is coming up first in GrantNav).

While I think a strategy of working with publishers to not include HTML makes sense, I think we should also strip any HTML tags from text before displaying it in GrantNav.
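As a rough idea of what stripping tags before display could look like, here is a minimal sketch using Python's standard-library HTML parser. `strip_tags` is a hypothetical helper, not an existing GrantNav function, and the choice of which tags become whitespace is an assumption.

```python
from html.parser import HTMLParser

# Assumption: tags that should turn into a space so words don't run together.
BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol", "table", "tr", "td", "th"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; back into text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Text inside script/style elements is dropped, everything else kept.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(description):
    """Return the description's text with tags removed and whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(description)
    extractor.close()
    return " ".join("".join(extractor.parts).split())
```

The result is plain text, so it would still need the template's usual escaping when rendered. Descriptions that use a bare `<` as a "less than" sign pass through unchanged.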

[Screenshot: 360Giving GrantNav, 2021-03-17]

robredpath commented 3 years ago

@drkane I'm always wary of doing things to data in GrantNav - GrantNav's a preview tool for the entire corpus, so if there are issues with HTML tags in feeds, then anyone else working with the data is going to have that problem - which moves the effort slider a bit more over towards the "anyone using the data" side.

That said, it looks bad in GrantNav, and could damage confidence in the overall corpus and the tool, so if it's proving to be hard to get publishers to remove HTML tags, then I can understand wanting to present a tag-stripped version in the UI.

I've added https://github.com/OpenDataServices/cove/issues/1335 for one potential way to encourage publishers not to include HTML tags in their data, and of course this could be a dimension of the data quality work we've got coming up.

Having a tag-stripped version of the description would be useful to any user, so I'd suggest actually doing this in the datastore. We could then use that for the search results, and maybe use the original on the grants page, potentially with a warning about data quality?
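One possible shape for this, sketched under assumptions: keep the publisher's original description untouched and add a derived plain-text field plus a flag that the UI could use for a data quality warning. The field names `description_plain` and `description_has_html`, and the function `add_plain_description`, are hypothetical and are not part of the datastore or the 360Giving standard.

```python
from html.parser import HTMLParser


class _PlainText(HTMLParser):
    """Collects text content and notes whether any tag was seen."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.saw_tag = False

    def handle_starttag(self, tag, attrs):
        self.saw_tag = True
        self.text.append(" ")  # keep adjacent words apart

    def handle_endtag(self, tag):
        self.saw_tag = True
        self.text.append(" ")

    def handle_data(self, data):
        # Simplification: script/style text is not filtered in this sketch.
        self.text.append(data)


def add_plain_description(grant):
    """Return a copy of the grant dict with derived plain-text fields added."""
    parser = _PlainText()
    parser.feed(grant.get("description", ""))
    parser.close()
    enriched = dict(grant)  # the original record is left unmodified
    enriched["description_plain"] = " ".join("".join(parser.text).split())
    enriched["description_has_html"] = parser.saw_tag
    return enriched
```

Search results could then read `description_plain`, while the grant page shows the original `description` and uses `description_has_html` to decide whether to show a warning.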

ThreeSixtyGiving / grantnav

HTML code tags in Description field #448