NCIOCPL / cgov-digital-platform

The Cancer.gov Digital Communications Platform
GNU General Public License v2.0

Some RSS items have empty length enclosures. #3907

Closed bryanpizzillo closed 1 year ago

bryanpizzillo commented 1 year ago

Issue description

An error appeared in the logs reporting a failure to get the file size of an image used as the enclosure for an RSS feed item. The enclosure element gets length="", which is not correct. Additionally, the MIME type is incorrect when the length is empty: it should be the image's type, not type="application/octet-stream". The root cause is that there is no good, supported way to do this in Drupal; see https://www.drupal.org/project/drupal/issues/3150318.

Error message:

web-3323 Jun 7 03:36:57 web-3323 ncigov01live[20449]: https://www.cancer.gov/|1686109017|php|104.103.70.14|https://www.cancer.gov/PublishedContent/RSS/syndication/rss/ncinewsreleases.rss%7C%7C0%7C%7CWarning: filesize(): stat failed for /mnt/www/html/ncigov01live/docroot/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-09/Healthy%20lifestyle_0.jpg in Drupal\cgov_core\CgovCoreTwigExtensions->getImageData() (line 209 of /mnt/www/html/ncigov01live/docroot/profiles/custom/cgov_site/modules/custom/cgov_core/src/CgovCoreTwigExtensions.php) #0 /mnt/www/html/ncigov01live/docroot/core/includes/bootstrap.inc(347): _drupal_error_handler_real(2, ‘filesize(): sta...‘, ‘/mnt/www/html/n...’, 209)

Feed output for https://www.cancer.gov/PublishedContent/RSS/syndication/rss/ncinewsreleases.rss:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" xml:base="https://www.cancer.gov/">
  <channel>
    <title>NCI News Releases</title>
    <link>https://www.cancer.gov/</link>
    <description>The latest cancer news from the U.S. government's principal agency for cancer research.</description>
    <language>en</language>

    <item>
  <title>NCI’s ComboMATCH initiative will test new drug combinations guided by tumor biology</title>
  <link>https://www.cancer.gov/news-events/press-releases/2023/combomatch-precision-medicine-cancer-initiative</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2023/combomatch-precision-medicine-cancer-initiative</guid>
  <pubDate>Thu, 01 Jun 2023 12:00:00 +0000</pubDate>
  <description>ComboMATCH will consist of numerous phase 2 cancer treatment trials that aim to identify promising drug combinations that can advance to larger, more definitive clinical trials.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-05/Precision%20medicine%20image.jpg" length="" type="application/octet-stream"/>
</item>
<item>
  <title>NCI study outlines opportunities to achieve President Biden’s Cancer Moonshot goal of reducing cancer death rates in the United States</title>
  <link>https://www.cancer.gov/news-events/press-releases/2023/opportunities-to-reduce-cancer-death-rate</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2023/opportunities-to-reduce-cancer-death-rate</guid>
  <pubDate>Mon, 17 Apr 2023 12:00:00 +0000</pubDate>
  <description>A new study has outlined opportunities for achieving President Biden and First Lady Biden’s Cancer Moonshot goal of reducing the cancer death rate by at least 50% over the next 25 years.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-04/DCEG%20Media%20Release_Factoid-Final.jpg" length="" type="application/octet-stream"/>
</item>
<item>
  <title>Pragmatica-Lung Study, a streamlined model for future cancer clinical trials, begins enrolling patients</title>
  <link>https://www.cancer.gov/news-events/press-releases/2023/pragmatica-lung-study-begins-enrollment</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2023/pragmatica-lung-study-begins-enrollment</guid>
  <pubDate>Wed, 12 Apr 2023 12:00:00 +0000</pubDate>
  <description>NCI has helped launch the Pragmatica-Lung Study, a phase 3 randomized clinical trial of a two-drug combination to treat patients with advanced non-small cell lung cancer. The simplified trial design aims to increase accessibility for participants.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-04/Pragmatica-Lung%20image.jpg" length="" type="application/octet-stream"/>
</item>
<item>
  <title>NCI study finds that immunotherapy substantially increases survival of people with lymphomatoid granulomatosis</title>
  <link>https://www.cancer.gov/news-events/press-releases/2023/immunotherapy-lymphomatoid-granulomatosis</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2023/immunotherapy-lymphomatoid-granulomatosis</guid>
  <pubDate>Tue, 04 Apr 2023 12:00:00 +0000</pubDate>
  <description>An NCI study shows that people with low-grade lymphomatoid granulomatosis who are treated with interferon alfa-2b, a type of immunotherapy, can live for decades after diagnosis.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-04/LYG%20scans.jpg" length="" type="application/octet-stream"/>
</item>
<item>
  <title>Cancer Grand Challenges announces global research funding opportunity with nine new challenges</title>
  <link>https://www.cancer.gov/news-events/press-releases/2023/cancer-grand-challenges-new-funding-opportunity</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2023/cancer-grand-challenges-new-funding-opportunity</guid>
  <pubDate>Wed, 08 Mar 2023 12:00:00 +0000</pubDate>
  <description>As part of the Cancer Grand Challenges program, NCI and Cancer Research UK have announced nine new research challenges to tackle profound problems in cancer research.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-06/CGC%20logo.jpg" length="" type="application/octet-stream"/>
</item>
<item>
  <title>NCI clinical trial leads to atezolizumab approval for advanced alveolar soft part sarcoma</title>
  <link>https://www.cancer.gov/news-events/press-releases/2022/nci-trial-atezolizumab-approval-alveolar-soft-part-sarcoma</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2022/nci-trial-atezolizumab-approval-alveolar-soft-part-sarcoma</guid>
  <pubDate>Wed, 28 Dec 2022 12:00:00 +0000</pubDate>
  <description>A clinical trial led by NCI has resulted in FDA approval of the immunotherapy drug atezolizumab (Tecentriq) to treat advanced alveolar soft part sarcoma.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-12/atezolizumab-blocking-PD-L1.png" length="42039" type="image/png"/>
</item>
<item>
  <title>Statement from Monica M. Bertagnolli, M.D., Director, National Cancer Institute, National Institutes of Health</title>
  <link>https://www.cancer.gov/news-events/press-releases/2022/nci-director-monica-bertagnolli-breast-cancer-diagnosis</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2022/nci-director-monica-bertagnolli-breast-cancer-diagnosis</guid>
  <pubDate>Wed, 14 Dec 2022 12:00:00 +0000</pubDate>
  <description>A statement from the National Cancer Institute by NCI Director Monica M. Bertagnolli, M.D., about her recent diagnosis with early breast cancer.  </description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-11/%203%20NCI-Director-Monica-Bertagnolli-official-headshot-November-2022.png" length="" type="application/octet-stream"/>
</item>
<item>
  <title>Annual Report to the Nation: Cancer deaths continue downward trend; modest improvements in survival for pancreatic cancer</title>
  <link>https://www.cancer.gov/news-events/press-releases/2022/annual-report-to-the-nation-2022</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2022/annual-report-to-the-nation-2022</guid>
  <pubDate>Thu, 27 Oct 2022 12:00:00 +0000</pubDate>
  <description>Overall cancer death rates continued to fall among men, women, children, and adolescents and young adults in every major racial and ethnic group in the United States from 2015 to 2019, according to the latest Annual Report to the Nation on the Status of Cancer.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/thumbnail/2022-10/ARN-thumbnail-override.png" length="22911" type="image/png"/>
</item>
<item>
  <title>Monica Bertagnolli begins work as 16th director of the National Cancer Institute</title>
  <link>https://www.cancer.gov/news-events/press-releases/2022/bertagnolli-nci-director</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2022/bertagnolli-nci-director</guid>
  <pubDate>Mon, 03 Oct 2022 12:00:00 +0000</pubDate>
  <description>Monica M. Bertagnolli, M.D., begins her tenure as the 16th director of the National Cancer Institute on October 3, 2022. She previously served as the Richard E. Wilson Professor of Surgery in the field of surgical oncology at Harvard Medical School, a surgeon at Brigham and Women’s Hospital, and a member of the Gastrointestinal Cancer Treatment and Sarcoma Centers at Dana-Farber Cancer Institute.</description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-11/%203%20NCI-Director-Monica-Bertagnolli-official-headshot-November-2022.png" length="" type="application/octet-stream"/>
</item>
<item>
  <title>A healthy lifestyle may help former smokers lower their risk of death from all causes</title>
  <link>https://www.cancer.gov/news-events/press-releases/2022/former-smokers-healthy-lifestyle-risk-of-death</link>
  <guid isPermaLink="true">https://www.cancer.gov/news-events/press-releases/2022/former-smokers-healthy-lifestyle-risk-of-death</guid>
  <pubDate>Thu, 22 Sep 2022 12:00:00 +0000</pubDate>
  <description>A new study finds that former smokers who stick to a healthy lifestyle have a lower risk of dying from all causes, including cancer and heart and lung disease, than those who don’t have healthy habits.  </description>
  <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-09/Healthy%20lifestyle_0.jpg" length="" type="application/octet-stream"/>
</item>

  </channel>
</rss>
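
To list which items are affected, the feed can be scanned for enclosures whose length is empty or non-numeric. The following is a minimal Python sketch, not part of the cgov codebase: the function name `find_bad_enclosures` and the trimmed `SAMPLE_FEED` (two items copied from the output above) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Two items copied from the feed output above, trimmed to the relevant elements.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>NCI News Releases</title>
    <item>
      <title>NCI clinical trial leads to atezolizumab approval for advanced alveolar soft part sarcoma</title>
      <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-12/atezolizumab-blocking-PD-L1.png" length="42039" type="image/png"/>
    </item>
    <item>
      <title>A healthy lifestyle may help former smokers lower their risk of death from all causes</title>
      <enclosure url="https://www.cancer.gov/sites/g/files/xnrzdm211/files/styles/cgov_thumbnail/public/cgov_image/media_image/2022-09/Healthy%20lifestyle_0.jpg" length="" type="application/octet-stream"/>
    </item>
  </channel>
</rss>"""


def find_bad_enclosures(feed_xml):
    """Return (title, url) pairs for enclosures whose length is empty or non-numeric."""
    root = ET.fromstring(feed_xml)
    bad = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # items without an enclosure are not checked here
        length = enclosure.get("length", "")
        if not length.isdigit():
            title = item.findtext("title", default="")
            bad.append((title, enclosure.get("url", "")))
    return bad


for title, url in find_bad_enclosures(SAMPLE_FEED):
    print(f"{title}\n  {url}")
```

In the full feed above, every flagged enclosure URL contains `%20`, while the two enclosures with a valid length have no encoded characters in their file names.
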
blairlearn commented 1 year ago

This error appears to happen because CgovCoreTwigExtensions::getImageData() receives a URL-encoded file name and does not decode it before querying the file system (e.g. Healthy%20lifestyle_0.jpg vs. Healthy lifestyle_0.jpg).

The error can be reproduced locally under that circumstance.

  1. In docroot/profiles/custom/cgov_site/modules/custom/cgov_yaml_content/content/300_press_release.content.yml, at line 185, change the filename for "The BRCA Exchange graphic" from BRCA-exchange-article.jpg to BRCA exchange article.jpg
  2. In the container, run blt cgov:reinstall --no-interaction.
  3. When it's complete, go to http://www.devbox/news-events and note that the image still appears.
  4. Browse to http://www.devbox/PublishedContent/RSS/syndication/rss/ncinewsreleases.rss
  5. The enclosure element for the "BRCA Exchange aggregates data on thousands of BRCA variants to inform understanding of cancer risk" entry will have:
    • An empty length attribute
    • A type attribute of "application/octet-stream"
  6. Log in to http://www.devbox/user/login
  7. Browse to the watchdog log at http://www.devbox/admin/reports/dblog
  8. Note the warning messages:
    • "Warning: filesize(): stat failed for /var/www/docroot/sites/default/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-06/BRCA%20exchange%20article.jpg"
    • "Warning: exif_imagetype(/var/www/docroot/sites/default/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-06/BRCA%20exchange%20article.jpg): Failed to open stream: No such file or directory in Drupal\cgov_core\CgovCoreTwigExtensions->getImageData()"
  9. In the container, cd to /var/www/docroot/sites/default/files/styles/cgov_thumbnail/public/cgov_image/media_image/2023-06/ and execute the command cp 'BRCA exchange article.jpg' BRCA%20exchange%20article.jpg
  10. Run drush cr
  11. Reload http://www.devbox/PublishedContent/RSS/syndication/rss/ncinewsreleases.rss
  12. Note that enclosure element now has:
    • A length attribute of 17221
    • A type attribute of "image/jpeg"
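
The mismatch in the steps above can be shown outside Drupal. This is a rough Python sketch of the mechanism only, not the module's PHP code: the helper `file_size_or_none`, the temporary directory, and the 100-byte placeholder file are all assumptions made for illustration. It looks up a file whose name contains spaces, first with the percent-encoded name and then with the decoded one.

```python
import os
import tempfile
from urllib.parse import unquote


def file_size_or_none(path):
    """Return the file size in bytes, or None if the path cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    # The file on disk has literal spaces in its name.
    real_path = os.path.join(tmp, "BRCA exchange article.jpg")
    with open(real_path, "wb") as fh:
        fh.write(b"\x00" * 100)  # placeholder bytes, not a real image

    # The name taken from the URL still carries %20 escapes.
    encoded_name = "BRCA%20exchange%20article.jpg"

    # Looking up the encoded name fails: no such file exists on disk.
    size_encoded = file_size_or_none(os.path.join(tmp, encoded_name))

    # Decoding the name first finds the real file.
    size_decoded = file_size_or_none(os.path.join(tmp, unquote(encoded_name)))

print("encoded lookup:", size_encoded)
print("decoded lookup:", size_decoded)
```

In PHP, the analogous decoding function would be rawurldecode(). Whether the eventual fix uses it is not stated in this thread.
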
bryanpizzillo commented 1 year ago

The place where this is dying is