alshedivat / al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics
https://alshedivat.github.io/al-folio/
MIT License
11.38k stars 11.29k forks

External blog posts from Substack appear locally but throw an error upon deployment #2116

Closed aqcheng closed 6 months ago

aqcheng commented 10 months ago

Describe the bug: I have a website that had been working fine. I recently moved my blog to Substack and switched to its external RSS feed, but now I get the following error:

/home/runner/work/aqcheng.github.io/aqcheng.github.io/vendor/bundle/ruby/3.0.0/gems/feedjira-3.2.2/lib/feedjira.rb:63:in `parse': No valid parser for XML. (Feedjira::NoParserAvailable)

and the website fails to deploy. Deployment works again when I remove the external feed from _config.yml. Locally, the site displays the blog posts fine. The full error message is attached: deploy errmsg.txt

To reproduce: in _config.yml, add an external RSS feed, e.g.:

external_sources:
   - name: substack.com
     rss_url: https://aqcheng.substack.com/feed

Expected behavior: the GitHub deployment matches the local build.

alshedivat commented 10 months ago

I tried adding your external source to the config and didn't run into any issues building it locally from the master branch. My only guess is that you may need to update Ruby to 3.2.2 (both locally and in your GitHub runner config).

aqcheng commented 10 months ago

How do I update Ruby in my GitHub runner config? Also, I didn't run into any issues building it locally; it's the deployment from GitHub that is problematic.

alshedivat commented 10 months ago

Are you using GitHub Actions for automatic deployment? If so, you just need to update the Ruby version in the .github/workflows/deploy.yml config: https://github.com/alshedivat/al-folio/blob/1d84621f220e157d0701d5cbd69f334cb6730a4c/.github/workflows/deploy.yml#L21-L25

aqcheng commented 10 months ago

I tried both 3.1.4 (the version I have locally) and 3.2.2, but with no luck: same error message. I'm positive the problem is with feedjira, because the site deploys fine when I don't include the external feed. However, the Ruby, feedjira, and Jekyll versions in the GitHub Pages deployment are all identical to those in my local installation.

I am using bundle exec jekyll serve to serve the website locally (i.e. not using Docker), and I believe my website was cloned from v0.6.0. I've customized my website pretty extensively, so I think it would be hard to update it. Do you have any idea what the issue could be?

george-gca commented 10 months ago

Before anything else, try to replicate the GitHub build locally. Do exactly what the deploy action does, specifically:

export JEKYLL_ENV=production
bundle exec jekyll build --lsi

The --lsi flag is only needed if you use the related posts feature. Then serve the site locally with python3 -m http.server -d _site/ and test it at http://localhost:8000/. Building the site with JEKYLL_ENV=production changes some parts of the build process, so this is closer to what the deploy action produces than jekyll serve.

One thing that might be causing the issue is outdated libraries. For instance, in #2073 some users' sites were throwing errors due to deprecated old libraries. Updating them might fix it. Also, as I suggested there, you could take the tip of the master branch and the tip of your own branch and run a diff between the two directories with a tool like meld or winmerge. They make a manual merge a little easier.

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

harmonbhasin commented 6 months ago

I had a similar issue when trying to include my Substack. After printing out the variable xml, it seems that I'm not getting the RSS feed at all, but rather the HTML for a page that says 'Just a moment...', which I assume is Cloudflare blocking GitHub.

I found that others had a similar issue: https://github.com/Athou/commafeed/issues/1138. I'm looking into solutions, but I figured I'd leave a comment.
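
To make that diagnosis repeatable, a small check like the one below could be run on the response body before it reaches the feed parser. This is only a sketch: classify_feed_body is a hypothetical helper, not part of al-folio or feedjira, and the string checks are a rough heuristic rather than real XML validation.

```ruby
# Hypothetical diagnostic helper (not part of al-folio or feedjira).
# Looks only at the start of the body to guess what was returned.
def classify_feed_body(body)
  head = body.to_s.lstrip[0, 512].downcase
  # A challenge or error page is an HTML document.
  return :html if head.start_with?('<!doctype html', '<html')
  # RSS feeds have an <rss> root element, Atom feeds a <feed> root element.
  return :feed if head.include?('<rss') || head.include?('<feed')
  :unknown
end
```

Logging the result (for example, puts classify_feed_body(xml)) in the deployment run would show whether the runner received a feed or an HTML page.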

harmonbhasin commented 6 months ago

I came up with a rough workaround: in external-posts.rb, I added the following line right after the XML is read in:

xml = File.read('./substack.rss') if xml.include?('<!DOCTYPE html>')

where substack.rss is generated by running the shell command curl https://your.substack.com/feed > substack.rss. This won't pick up new Substack posts automatically, since it relies on the saved file, but it's a good enough temporary fix.
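
A slightly more defensive version of the same workaround might look like the sketch below. The helper name feed_xml_or_fallback and its nil return value are assumptions made for illustration; the plugin itself has no such function, and the caller would need to skip the source when nil comes back.

```ruby
# Hypothetical helper; not part of al-folio's external-posts.rb.
# Returns the fetched body unless it looks like an HTML page, in which
# case it falls back to a previously saved copy of the feed.
def feed_xml_or_fallback(xml, fallback_path)
  looks_like_html = xml.to_s.lstrip.downcase.start_with?('<!doctype html', '<html')
  return xml unless looks_like_html
  # Use the saved feed only if it exists, so a missing file does not
  # raise and abort the whole build.
  return File.read(fallback_path) if File.exist?(fallback_path)
  nil
end
```

It would be called as xml = feed_xml_or_fallback(xml, './substack.rss'). The saved copy still has to be refreshed by hand, exactly as with the one-line fix.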

george-gca commented 6 months ago

What if you somehow set the user agent used to fetch this information? Like it is done in here?
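
As a rough illustration of that idea, the sketch below sends an explicit User-Agent header using Ruby's standard Net::HTTP. The helper names and the user agent string are hypothetical, the plugin may fetch feeds with a different HTTP client, and nothing in this thread confirms that Cloudflare would accept such a request from a GitHub runner.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request with an explicit User-Agent.
def build_feed_request(url, user_agent)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  [uri, request]
end

# Hypothetical helper: performs the request and returns the response body.
def fetch_feed(url, user_agent)
  uri, request = build_feed_request(url, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).body
  end
end
```

For example, xml = fetch_feed('https://your.substack.com/feed', 'Mozilla/5.0 (compatible; my-site-builder)'), where the user agent value is just a placeholder to experiment with.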