coderdojo-japan / coderdojo.jp

☯️ CoderDojo Japan (@coderdojo-japan) official website developed by Ruby on Rails with @YassLab team. 💎
https://coderdojo.jp

Deploy error occurs while parsing the RSS feed for the Podcast #1620

Closed yasulab closed 2 months ago

yasulab commented 2 months ago

To do

Background

Is the RSS feed too large...? cc/ @nanophate

RSS::NotWellFormedError: This is not well formed XML entity expansion has grown too large


A deployment for coderdojo-japan failed due to a release phase command in release v3238. To inspect the failure, check your release phase log in the dashboard or run 'heroku releases:output v3238' in the CLI.

If you wish to retry the release, you can use the release retry CLI plugin.

(Login required) https://dashboard.heroku.com/apps/coderdojo-japan/activity/releases/3238

rails aborted!
RSS::NotWellFormedError: This is not well formed XML
entity expansion has grown too large
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:20:in `rescue in _parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:16:in `_parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:183:in `parse'
/app/vendor/ruby-3.1.4/lib/ruby/3.1.0/forwardable.rb:238:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:88:in `parse'
/app/lib/tasks/podcasts.rake:16:in `block (2 levels) in <main>'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block (2 levels) in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/rake_module.rb:59:in `with_application'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:18:in `perform'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/command.rb:50:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands.rb:18:in `<main>'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
Caused by:
entity expansion has grown too large
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:559:in `block in unnormalize'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:551:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/baseparser.rb:551:in `unnormalize'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/parsers/streamparser.rb:39:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.3/lib/rexml/document.rb:402:in `parse_stream'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/rexmlparser.rb:18:in `_parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:183:in `parse'
/app/vendor/ruby-3.1.4/lib/ruby/3.1.0/forwardable.rb:238:in `parse'
/app/vendor/bundle/ruby/3.1.0/gems/rss-0.2.9/lib/rss/parser.rb:88:in `parse'
/app/lib/tasks/podcasts.rake:16:in `block (2 levels) in <main>'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block (2 levels) in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:24:in `block in perform'
/app/vendor/bundle/ruby/3.1.0/gems/rake-13.0.6/lib/rake/rake_module.rb:59:in `with_application'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands/rake/rake_command.rb:18:in `perform'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/command.rb:50:in `invoke'
/app/vendor/bundle/ruby/3.1.0/gems/railties-6.1.7.8/lib/rails/commands.rb:18:in `<main>'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.16.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
Tasks: TOP => podcasts:upsert
(See full trace by running task with --trace)
yasulab commented 2 months ago

The commit from 1619 (a530169b) looks like the cause, so I've reverted it for now :eyes: ✅ (The RSS can't be set dynamically, so it seems fine to treat this as not urgent.)

nanophate commented 2 months ago

Cause

To track down the cause, I searched the rexml changes for the error message and found that the following check had been added. https://github.com/ruby/rexml/compare/v3.3.2...v3.3.4#diff-f8c7cdefc29090ed525a2be70411ce741d4124853cf6425db7d18a6ea3bb9bb3R558-R561

if sum > Security.entity_expansion_text_limit
  raise "entity expansion has grown too large"
end

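With this change, the parser now checks Security.entity_expansion_text_limit, which defaults to 10240, and that appears to be what triggers the error. For reference, the value for the RSS feed we fetch reaches nearly 1,800,000, which exceeds the limit.

Workaround

I confirmed that pinning the limit, e.g. REXML::Security.entity_expansion_text_limit = 2_000_000, makes the task run without problems. Since this approach seems to apply very broadly, I think it would also be fine to wait until the feature in https://github.com/ruby/rexml/issues/192, which lets the scope of the setting be narrowed, is merged before addressing this.

.oO(Our code uses the RSS Parser, so I wonder whether this would need to be handled on the RSS Parser side… 🤔💭 )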

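require "rss"
# Raise REXML's process-wide limit on expanded entity text (default: 10240).
REXML::Security.entity_expansion_text_limit = 2_000_000
FM_RSS = "https://example.com/rss"
# The second argument (false) skips RSS validation.
rss = RSS::Parser.parse(FM_RSS, false)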

https://github.com/coderdojo-japan/coderdojo.jp/blob/a530169b8ae38a68009adf79486921db1270943c/lib/tasks/podcasts.rake#L13-L16
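
Until a narrower setting is available, one way to limit how long the raised value stays in effect is to set it only around the parse call and restore it afterwards. The following is a minimal sketch, not code from the repository: the helper name `with_entity_expansion_text_limit` is hypothetical, and because `REXML::Security.entity_expansion_text_limit` is a process-wide setting, other threads parsing XML at the same moment would still see the raised value.

```ruby
require "rexml/security"

# Hypothetical helper (not part of REXML or coderdojo.jp):
# temporarily raises the global limit, then restores the previous value
# even if the block raises.
def with_entity_expansion_text_limit(limit)
  previous = REXML::Security.entity_expansion_text_limit
  REXML::Security.entity_expansion_text_limit = limit
  yield
ensure
  REXML::Security.entity_expansion_text_limit = previous
end

# Usage sketch: the block stands in for the RSS::Parser.parse call above.
seen_inside = with_entity_expansion_text_limit(2_000_000) do
  REXML::Security.entity_expansion_text_limit
end
puts seen_inside
```

In the rake task, the block would wrap `RSS::Parser.parse(FM_RSS, false)` so that the rest of the application keeps the default limit.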

yasulab commented 2 months ago

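@nanophate Thanks for investigating the cause so quickly!! 😻🆒✨ As I understand it, the current system configuration has no dynamic RSS input that would pose a security problem, so I also think the following approach is good! (≧∇≦)b✨

Since this approach seems to apply very broadly, it would also be fine to wait until the feature in ruby/rexml#192, which lets the scope of the setting be narrowed, is merged before addressing this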

naitoh commented 2 months ago

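@yasulab @nanophate There was a mistake in how the size checked against Security.entity_expansion_text_limit was calculated, so I fixed it in https://github.com/ruby/rexml/pull/195.

The fix is included in https://github.com/ruby/rexml/releases/tag/v3.3.5, so I'd appreciate it if you could try rexml 3.3.5 🙏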

nanophate commented 2 months ago

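@naitoh Thank you for kindly letting us know. Just now, with rexml updated to 3.3.5 in the PR below, I confirmed that the task runs without problems and that the deploy no longer fails...!! 💯 🚀 ✨ The number also went from 1,800,000 down to 4,098, a size appropriate for the page! Thank you again for taking care of this 🙇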

https://github.com/coderdojo-japan/coderdojo.jp/pull/1622/files#diff-89cade48462044ee1b672dc5f4c3ec250fbd29effcd8932096a23c1283c6731fR365

❯ bundle exec rails podcasts:upsert
==== START podcasts:upsert ====

Frame number: 0/42

From: /Users/vivio/.code/coderdojo.jp/vendor/bundle/ruby/3.1.0/gems/rexml-3.3.5/lib/rexml/parsers/baseparser.rb:559 REXML::Parsers::BaseParser#unnormalize:

    554:               if entity_value
    555:                 re = Private::DEFAULT_ENTITIES_PATTERNS[entity_reference] || /&#{entity_reference};/
    556:                 rv.gsub!( re, entity_value )
    557:                 binding.pry
    558:                 
 => 559:                 if rv.bytesize > Security.entity_expansion_text_limit
    560:                   raise "entity expansion has grown too large"
    561:                 end
    562:               else
    563:                 er = DEFAULT_ENTITIES[entity_reference]
    564:                 rv.gsub!( er[0], er[2] ) if er

[1] pry(#<REXML::Parsers::BaseParser>)> rv.bytesize
=> 4098
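
The drop from roughly 1,800,000 to 4,098 can be illustrated with a simplified model of the two checks. This is a sketch under stated assumptions, not REXML's actual code: it assumes the earlier version added the size of the whole text to `sum` once per entity reference (the thread only shows the `sum >` comparison), and the reference count of 440 is a hypothetical value chosen to land near the reported figure. The 3.3.5 side follows the `rv.bytesize` comparison visible in the pry session.

```ruby
# Simplified, hypothetical model of the two checks; not REXML's actual code.
LIMIT = 10_240 # default entity_expansion_text_limit mentioned in the thread

# Assumed earlier behaviour: the whole text size is added once per
# entity reference, so the total grows with the number of references.
def cumulative_size(text_bytesize, entity_reference_count)
  sum = 0
  entity_reference_count.times { sum += text_bytesize }
  sum
end

# Behaviour shown for 3.3.5: the size of the text itself is compared.
def current_size(text_bytesize)
  text_bytesize
end

text_bytesize = 4_098 # rv.bytesize observed in the pry session
references = 440      # hypothetical number of entity references

puts cumulative_size(text_bytesize, references) > LIMIT
puts current_size(text_bytesize) > LIMIT
```

Under these assumptions the same text exceeds the limit with the cumulative calculation but stays well below it when only its own size is compared.
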
yasulab commented 2 months ago

Straight from the source...!!!! 🚜💨✨ Thank you for the thoughtful comment and advice!!!! (>人< )💖