department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0
97 stars 69 forks source link

/files/ appearing on FE of published stories #16376

Closed maortiz-27-80 closed 9 months ago

maortiz-27-80 commented 9 months ago

Describe the defect

There have been at least two (2) instances reported from the Help Desk where editors have published a Story and on the front end, /files/, appears randomly. /files/ does not show in the Preview for editors. See filename: files_front_end_bug.png

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'https://www.va.gov/salt-lake-city-health-care/programs/telehealth-services/'
  2. Scroll down to 'VA Video Connect (VVC).'
  3. Click on the '+' to expand
  4. See error image.png

AC / Expected behavior

When publishing a story, the text, /files/, should not appear on the front-end.

Additional context

Notes per Steve Wirt:

Team

Please check the team(s) that will do this work.

files_front_end_bug.png

cc: @BerniXiongA6 @ndouglas @EWashb @TroyA6 @stefaniefgray

TroyA6 commented 9 months ago

@xiongjaneg @BerniXiongA6 @stefaniefgray @maortiz-27-80 - There was a work-around at one point: totally deleting the text in the accordion, then recreating that text from scratch. I tried that in this case, and it did not work.

swirtSJW commented 9 months ago

My hunch, but unproven at this time: In both examples so far, there has been a link to a file on another drupal system. A different domain but still matching the traditiional drupal file paths. There is code on the FE that rewrites paths in the VA cms file system to make them match where files are moved to the content build.
The code is here I suspect the code is somehow misfiring on drupal paths that are not VA CMS and somehow we are ending up with an extraneous `/files/' hanging being injected in the wrong location for the wrong reason.

ndouglas commented 9 months ago

I have a different proposed culprit:

In assets.js we have this:

Screenshot 2023-12-12 at 8 29 18 AM

I think this is the root cause: data.match(/^.*\/sites\/.*\/files\//).

That will match _literally any string that contains /sites/ and later contains /files/, then pass it off to convertAssetPath().

For instance:

Screenshot 2023-12-12 at 8 36 11 AM

convertAssetPath() looks like this:

Screenshot 2023-12-12 at 8 31 57 AM

It will first attempt to replace the beginning of the text (if it begins with http etc) with nothing, then return that. It won't find it in the example above. Then it'll check to see if the string begins with http... still doesn't. Then it splits by a question mark, and if that ends with a file extension, returns a different path -- which it normally won't.

Otherwise, it returns /files/${path}, and path may not be defined here, so we'd see /files/.

So, I think the solution is to un@#$% that first regex.

ndouglas commented 9 months ago

I think #^https?://([a-z0-9]+(-[a-z0-9]+)*\.)+cms\.va\.gov/sites/[\w-]+/files/# would work (similar to the regex in convertAssetPath(), but setting a different delimiter so we can remove the annoying backslashes, and replacing a scary greedy wildcard... just for the sake of readability here). That should match anything that looks like a cms.va.gov URL at the beginning but allow normal text to pass through without mutilation.

ndouglas commented 9 months ago

Apparently it was tighter, but the change was reverted: https://github.com/department-of-veterans-affairs/content-build/pull/332 so I'm going to loosen up my pattern a bit.

ndouglas commented 9 months ago

Screenshot 2023-12-12 at 9 15 42 AM

One problem, I think, is that if we do something like this:

Here is a <a href="https://prod.cms.va.gov/sites/default/files/pdf.pdf">link to a PDF</a>.

Then I think we expect that inner link to be replaced. Or perhaps that's being replaced in that code you mentioned, @swirtSJW , which would mean that this here Should Just Work™?

ndouglas commented 9 months ago

Screenshot 2023-12-12 at 9 20 54 AM

Made the regex easier to read. It doesn't really matter if it's a valid domain name... that's not really our problem.

ndouglas commented 9 months ago

Relevant discussions:

https://dsva.slack.com/archives/C0MQ281DJ/p1625685915353300

https://dsva.slack.com/archives/C0MQ281DJ/p1625694464385700

It looks like a change was reverted, but then the underlying issue was never actually resolved. We don't want to reference other Drupals' files with relative paths, which is why I was confused. So I'm going to re-tighten my regex.

ndouglas commented 9 months ago

Screenshot 2023-12-12 at 9 32 40 AM

TroyA6 commented 9 months ago

If this helps @maortiz-27-80 and @ndouglas @stefaniefgray @BerniXiongA6 - The following link breaks the Accordion functionality: https://connectedcare.va.gov/sites/default/files/2023-01/OT_va-telehealth-va-video-connect-feature-fact-sheet.pdf

There is a different link to a different PDF that seems to work: https://www.va.gov/files/2023-10/VVC%20Appointments_SLC_Brochure_DIGITAL508_Reader%20Extended_JD_20230928.pdf

The difference that I see is the location of the file (connectedcare.va.gov vs. www.va.gov/files) Both links are working links outside of the accordion --