feedreader / pluto

pluto gems - planet feed reader and (static) website generator - auto-build web pages from published web feeds

'error: This is not well formed XML' is fatal #21

Closed mgorny closed 4 years ago

mgorny commented 4 years ago

I'm sorry for reporting so many issues in one day, but I suppose this one might also be worth addressing in the upcoming release.

$ cat hex.ini 
title = Planet Gentoo (test)

[hexxeh]
title = Liam McLoughlin
link = http://hexxeh.net/?cat=5&feed=
feed = http://hexxeh.net/?cat=5&feed=rss2
$ ./bin/pluto b hex.ini 
activerecord-utils/0.4.0 (activerecord/6.0.2) on Ruby 2.7.0 (2019-12-25) [x86_64-linux]
activityutils/0.1.1 on Ruby 2.7.0 (2019-12-25) [x86_64-linux]
pluto/1.3.2 on Ruby 2.7.0 (2019-12-25) [x86_64-linux]
[info] db settings:
[info] {:adapter=>"sqlite3", :database=>"./hex.db"}
-- create_table(:logs)
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/sqlite3/schema_statements.rb:91: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:260: warning: The called method `initialize' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:305: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:201: warning: The called method `primary_key' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:202: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:378: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:429: warning: The called method `new_column_definition' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:229: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:229: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:411: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:412: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_creation.rb:17: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1099: warning: The called method `type_to_sql' is defined here
   -> 0.0163s
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/internal_metadata.rb:41: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:227: warning: The called method `string' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/transactions.rb:212: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/database_statements.rb:274: warning: The called method `transaction' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/persistence.rb:503: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/timestamp.rb:127: warning: The called method `create_or_update' is defined here
-- create_table(:props)
   -> 0.0112s
-- create_table(:activities)
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:229: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
   -> 0.0127s
-- create_table(:sites)
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:229: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
   -> 0.0117s
-- create_table(:subscriptions)
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:424: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:120: warning: The called method `initialize' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:142: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:363: warning: The called method `column' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:786: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1172: warning: The called method `add_index_options' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1199: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1262: warning: The called method `quoted_columns_for_index' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1266: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1254: warning: The called method `add_options_for_index_columns' is defined here
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1256: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:1237: warning: The called method `add_index_sort_order' is defined here
   -> 0.0347s
-- create_table(:feeds)
   -> 0.0149s
-- create_table(:items)
   -> 0.0244s
/home/mgorny/.gem/ruby/2.7.0/gems/activemodel-6.0.2.1/lib/active_model/type/integer.rb:13: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activemodel-6.0.2.1/lib/active_model/type/value.rb:8: warning: The called method `initialize' is defined here
[info] Updating feed subscription >hexxeh< - >http://hexxeh.net/?cat=5&feed=rss2<...
[info] Updating feed subscription >zx2c4< - >http://blog.zx2c4.com/planetgentoo<...
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/relation/delegation.rb:115: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/relation.rb:27: warning: The called method `initialize' is defined here
[info] found cache entry for >http://hexxeh.net/?cat=5&feed=rss2<
[info] OK - fetching feed 'hexxeh' - HTTP status 200 OK
/home/mgorny/.gem/ruby/2.7.0/gems/activerecord-6.0.2.1/lib/active_record/attribute_methods/dirty.rb:102: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/home/mgorny/.gem/ruby/2.7.0/gems/activemodel-6.0.2.1/lib/active_model/attribute_mutation_tracker.rb:45: warning: The called method `changed?' is defined here
[info] Before parsing feed >hexxeh<...

*** error: This is not well formed XML
Invalid attribute name: <;r++)console.log("actionQueue",c(t[r]))}function n(){clearTimeout(w);for(var e,t=0;e=h[t];t++)document["on"+e]=null}function a(e){if(!e.target)return!1;var t=e.target,r=(t.tagName||"").toLowerCase();if(e.metaKey)return!1;if(e.shiftKey&&"a"==r)return!1;if(t.hostname&&!t.hostname.match(g))return!1;if(e.type.match(p)&&s(t))return!1;if("label"==r){var n=t.getAttribute("for");if(n){var a=document.getElementById(n);if(a&&f(a))return!1}else for(var i,o=0;i=t.childNodes[o];o++)if(f(i))return!1}return!0}function i(e,t){t.bucket=e,b[e].push(t)}function o(e){var t={};for(var r in e)t[r]=e[r];return t}function u(e){for(;e&&e!=document.body;){if("A"==e.tagName)return e;e=e.parentNode}}function c(e){var t=[];e.bucket&&t.push("["+e.bucket+"]"),t.push(e.type);var r,n,a=e.target,i=u(a),o="",c=e.timestamp&&e.timestamp-d;return"click"===e.type&&i?(r=i.className.trim().replace(/\s+/g,"."),n=i.id.trim(),o=/[^#]$/.test(i.href)?" ("+i.href+")":"",a='"'+i.innerText.replace(/\n+/g," ").trim()+'"'):(r=a.className.trim().replace(/\s+/g,"."),n=a.id.trim(),a=a.tagName.toLowerCase(),e.keyCode&&(a=String.fromCharCode(e.keyCode)+" : "+a)),t.push(a+o+(n&&"#"+n)+(!n&&r?"."+r:"")),c&&t.push(c),t.join(" ")}function f(e){var t=(e.tagName||"").toLowerCase();return"input"==t&&"checkbox"==e.getAttribute("type")}function s(e){var t=(e.tagName||"").toLowerCase();return"textarea"==t||"input"==t&&"text"==e.getAttribute("type")||"true"==e.getAttribute("contenteditable")}for(var m,d=(new Date).getTime(),l=1e4,g=/^([^\.]+\.)*twitter\.com$/,p=/^key/,h=["click","keydown","keypress","keyup"],v=[],w=null,y=!0,b={captured:[],ignored:[],direct:[],all:[]},k=0;m=h[k];k++)document["on"+m]=e;w=setTimeout(function(){y=!1},l),window.swiftActionQueue={buckets:b,flush:t,logActions:r,wasFlushed:!1}}();
  </script>
Line: 25
Position: 3359
Last 80 unconsumed characters:

The issue here is that the URL is outdated and redirects to a non-feed page that is not valid XML. In this case it's our fault for not cleaning up old entries, but the same thing can happen whenever someone changes their blog without telling us to update the URL, so I think it'd be better to handle it gracefully as well.
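To illustrate what "handle it gracefully" could mean here, the following is a minimal sketch and not pluto's actual code: each feed is parsed inside its own `rescue`, so one broken subscription is logged and skipped while the rest of the build continues. The names `update_feeds`, `parser:` and `log:` are hypothetical and only serve the example.

```ruby
# Hypothetical helper (not part of pluto): parse every feed, but never let
# a single bad feed abort the whole run.
#
# feeds  - Hash of feed name => fetched response body (String)
# parser - any callable that returns a parsed feed or raises on bad input
# log    - any object responding to #puts (defaults to $stderr)
def update_feeds(feeds, parser:, log: $stderr)
  parsed = {}
  feeds.each do |name, body|
    begin
      parsed[name] = parser.call(body)
    rescue StandardError => e
      # Only the first line of the message is logged, so a dumped page
      # body does not flood the output.
      first_line = e.message.to_s.lines.first.to_s.strip
      log.puts "[error] skipping feed '#{name}': #{first_line}"
    end
  end
  parsed
end
```

With this shape, a subscription whose URL now serves an HTML page would produce a single `[error]` line, and the remaining subscriptions would still be updated.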

geraldb commented 4 years ago

No worries. The more the better. Please report them all. Reading the new issue, I think there is currently a "fallback" feed format that is used when the parser cannot detect the format (e.g. rss, atom, json, etc.), so maybe I'll replace the "fallback" with a proper error (instead of letting the format parser crash).
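As a rough sketch of that idea (an assumption about the approach, not the feedparser gem's real implementation), format detection could inspect the start of the response and raise a dedicated error when nothing matches, rather than handing the text to a default parser. `UnknownFeedFormatError` and `sniff_feed_format` are hypothetical names.

```ruby
# Hypothetical error class; not part of the feedparser gem's API.
class UnknownFeedFormatError < StandardError; end

# Guess the feed format from the first non-whitespace characters.
# Returns :json or :xml, and raises instead of falling back to a parser
# that would crash on arbitrary HTML.
def sniff_feed_format(text)
  head = text.to_s.lstrip
  return :json if head.start_with?('{')

  # Optional XML declaration, then an RSS, Atom or RDF root element.
  xml_root = /\A(?:<\?xml[^>]*\?>\s*)?<(?:rss|feed|rdf:RDF)[\s>]/
  return :xml if head.match?(xml_root)

  preview = head[0, 40].gsub(/\s+/, ' ')
  raise UnknownFeedFormatError,
        "unknown feed format (is XML or JSON?), starting with: #{preview}..."
end
```

A caller would rescue `UnknownFeedFormatError`, log the message, and move on to the next subscription. The check is deliberately simple: it ignores byte order marks and leading XML comments, which a real detector would likely need to handle.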

geraldb commented 4 years ago

Please update the feedparser and pluto gems. The latest version now reports / logs an error in this case. Something like:

[error] *** error: unknown feed format (is XML or JSON?) for 'hexxeh' - 
                             http://hexxeh.net/?cat=5&feed=rss2 
                             starting with: <!DOCTYPE html> <html...