buren / wayback_archiver

Ruby gem to send URLs to Wayback Machine
https://rubygems.org/gems/wayback_archiver
MIT License
57 stars, 11 forks

cannot load such file -- rexml/document #44

Closed (shoeper closed 2 years ago)

shoeper commented 3 years ago

When trying to archive https://wiki.algo.informatik.tu-darmstadt.de/ I get the error below. I'm not familiar with Ruby, but it looks to me like the code depends on rexml and that rexml is missing somewhere. I could be wrong, though.

<internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- rexml/document (LoadError)                        
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/sitemap.rb:1:in `<top (required)>'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/sitemapper.rb:4:in `<top (required)>'                                      
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:4:in `<top (required)>'                                   
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:4:in `<top (required)>'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/share/rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:4:in `<top (required)>'
        from /usr/local/bin/wayback_archiver:23:in `load'                      
        from /usr/local/bin/wayback_archiver:23:in `<main>'        
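
For context, one plausible cause, assuming this is a Ruby 3.0 or newer install: since Ruby 3.0, rexml ships as a bundled gem rather than a default gem, so `require 'rexml/document'` can raise LoadError when the gem is not installed separately. The sketch below only checks whether rexml can be loaded in the current environment. The helper name `rexml_available?` and the sample XML are made up for illustration and are not part of wayback_archiver.

```ruby
# Hypothetical helper, not part of wayback_archiver: reports whether
# rexml can be loaded in the current Ruby environment.
def rexml_available?
  require 'rexml/document'
  true
rescue LoadError
  false
end

if rexml_available?
  # Parse a tiny, made-up sitemap to confirm the library actually works.
  doc = REXML::Document.new('<urlset><url><loc>https://example.com/</loc></url></urlset>')
  puts doc.root.elements['url/loc'].text
else
  warn 'rexml is missing; installing it with `gem install rexml` may help'
end
```

If the check reports that rexml is missing, installing the gem manually is the obvious workaround. A gem that requires rexml could presumably also declare it as a runtime dependency in its gemspec so that it gets installed automatically, but whether wayback_archiver does so is not confirmed here.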
shoeper commented 3 years ago

After running gem install rexml, I also ran into this issue with a crawl job:

Traceback (most recent call last):                                                                                                                           
        39: from /usr/local/bin/wayback_archiver:23:in `<main>'                                                                                              
        38: from /usr/local/bin/wayback_archiver:23:in `load'                                                                                                
        37: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `<top (required)>'                                             
        36: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `each'                                                         
        35: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:87:in `block in <top (required)>'                                    
        34: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:61:in `archive'                                                   
        33: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:111:in `crawl'                                                    
        32: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:84:in `crawl'                                             
        31: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:47:in `crawl'                                       
        30: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/spidr.rb:54:in `site'                                                                      
        29: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:274:in `site'                                                                     
        28: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:355:in `start_at'                                                                 
        27: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:373:in `run'                                                                      
        26: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:655:in `visit_page'                                                               
        25: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:589:in `get_page'                                                                 
        24: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:776:in `prepare_request'                                                          
        23: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:595:in `block in get_page'                                                        
        22: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block in visit_page'                                                      
        21: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `each'                                                                     
        20: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block (2 levels) in visit_page'                                           
        19: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:52:in `block (2 levels) in crawl'                  
        18: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:85:in `block in crawl'                                    
        17: from /usr/local/share/gems/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/executor/immediate_executor.rb:29:in `post'                
        16: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:86:in `block (2 levels) in crawl'                         
        15: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:105:in `post_url'                                        
        14: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/adapters/wayback_machine.rb:17:in `call'                             
        13: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:89:in `get'                                              
        12: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:201:in `perform_request'                                  
        11: from /usr/share/ruby/net/http.rb:1483:in `request'
        10: from /usr/share/ruby/net/http.rb:933:in `start'                                                                                                  
         9: from /usr/share/ruby/net/http.rb:1485:in `block in request'
         8: from /usr/share/ruby/net/http.rb:1492:in `request'                                                                                               
         7: from /usr/share/ruby/net/http.rb:1519:in `transport_request'
         6: from /usr/share/ruby/net/http.rb:1519:in `catch'                                                                                                 
         5: from /usr/share/ruby/net/http.rb:1528:in `block in transport_request'                                                                           
         4: from /usr/share/ruby/net/http/response.rb:31:in `read_new'                                                                                       
         3: from /usr/share/ruby/net/http/response.rb:42:in `read_status_line'
         2: from /usr/share/ruby/net/protocol.rb:201:in `readline'                                                                                           
         1: from /usr/share/ruby/net/protocol.rb:191:in `readuntil'
/usr/share/ruby/net/protocol.rb:217:in `rbuf_fill': Net::ReadTimeout with #<TCPSocket:(closed)> (Net::ReadTimeout)
        29: from /usr/local/bin/wayback_archiver:23:in `<main>'
        28: from /usr/local/bin/wayback_archiver:23:in `load'
        27: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `<top (required)>'
        26: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `each'
        25: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:87:in `block in <top (required)>'
        24: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:61:in `archive'
        23: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:111:in `crawl'
        22: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:84:in `crawl'
        21: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:47:in `crawl'
        20: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/spidr.rb:54:in `site'
        19: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:274:in `site'
        18: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:355:in `start_at'
        17: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:373:in `run'
        16: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:655:in `visit_page'
        15: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:589:in `get_page'
        14: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:776:in `prepare_request'
        13: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:595:in `block in get_page'
        12: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block in visit_page'
        11: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `each'
        10: from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block (2 levels) in visit_page'
         9: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:52:in `block (2 levels) in crawl'
         8: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:85:in `block in crawl'
         7: from /usr/local/share/gems/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/executor/immediate_executor.rb:29:in `post'
         6: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:86:in `block (2 levels) in crawl'
         5: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:105:in `post_url'
         4: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/adapters/wayback_machine.rb:17:in `call'
         3: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:89:in `get'
         2: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:199:in `perform_request'
         1: from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:204:in `rescue in perform_request'
/usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:204:in `fetch': key not found: Net::ReadTimeout (KeyError)
Did you mean?  Net::HTTPBadResponse
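
The KeyError above is raised by `fetch` inside a `rescue` block, so it replaces the original Net::ReadTimeout as the error that ends the run. A minimal sketch of how that can happen, assuming the rescued exception class is looked up in a hash with Hash#fetch and no default (the "Did you mean?" hint suggests the hash has a Net::HTTPBadResponse key). The names `ERROR_MAP`, `WrappedError`, `strict_lookup` and `lenient_lookup` are hypothetical and not taken from wayback_archiver's source.

```ruby
require 'net/http' # also loads net/protocol, which defines Net::ReadTimeout

# Hypothetical wrapper error class, for illustration only.
class WrappedError < StandardError; end

# Hypothetical mapping from low-level errors to wrapper classes.
# Net::ReadTimeout is deliberately absent, mirroring the trace above.
ERROR_MAP = { Net::HTTPBadResponse => WrappedError }.freeze

# Hash#fetch without a default raises KeyError for unknown keys.
def strict_lookup(error)
  ERROR_MAP.fetch(error.class)
end

# Supplying a default keeps unknown error classes from raising KeyError.
def lenient_lookup(error)
  ERROR_MAP.fetch(error.class, WrappedError)
end

timeout = Net::ReadTimeout.new
begin
  strict_lookup(timeout)
rescue KeyError => e
  puts "strict lookup failed with #{e.class}"
end
puts "lenient lookup returned #{lenient_lookup(timeout)}"
```

Under this assumption, either adding the missing error class to the mapping or giving `fetch` a fallback would let the timeout be reported as a timeout instead of a KeyError.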
shoeper commented 3 years ago

And another one (similar, but the line numbers differ):

usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:204:in `fetch': key not found: Net::ReadTimeout (KeyError)
Did you mean?  Net::HTTPBadResponse                                                                
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:204:in `rescue in perform_request'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:199:in `perform_request'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:89:in `get'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/adapters/wayback_machine.rb:17:in `call'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:105:in `post_url'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:86:in `block (2 levels) in crawl'
        from /usr/local/share/gems/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/executor/immediate_executor.rb:29:in `post'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:85:in `block in crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:52:in `block (2 levels) in crawl'                                                                            
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block (2 levels) in visit_page'                                                                                                    
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `each'                 
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block in visit_page'   
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:595:in `block in get_page'                                                                                                                 
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:776:in `prepare_request' 
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:589:in `get_page'                                                                                                                          
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:655:in `visit_page'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:373:in `run'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:355:in `start_at'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:274:in `site'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/spidr.rb:54:in `site'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:47:in `crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:84:in `crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:111:in `crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:61:in `archive'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:87:in `block in <top (required)>'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `each'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `<top (required)>'
        from /usr/local/bin/wayback_archiver:23:in `load'
        from /usr/local/bin/wayback_archiver:23:in `<main>'
/usr/share/ruby/net/protocol.rb:219:in `rbuf_fill': Net::ReadTimeout with #<TCPSocket:(closed)> (Net::ReadTimeout)                                                                                                
        from /usr/share/ruby/net/protocol.rb:193:in `readuntil'
        from /usr/share/ruby/net/protocol.rb:203:in `readline'
        from /usr/share/ruby/net/http/response.rb:42:in `read_status_line'
        from /usr/share/ruby/net/http/response.rb:31:in `read_new'
        from /usr/share/ruby/net/http.rb:1557:in `block in transport_request'
        from /usr/share/ruby/net/http.rb:1548:in `catch'
        from /usr/share/ruby/net/http.rb:1548:in `transport_request'
        from /usr/share/ruby/net/http.rb:1521:in `request'
        from /usr/share/ruby/net/http.rb:1514:in `block in request'
        from /usr/share/ruby/net/http.rb:960:in `start'
        from /usr/share/ruby/net/http.rb:1512:in `request'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:201:in `perform_request'                                                                                           
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/request.rb:89:in `get'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/adapters/wayback_machine.rb:17:in `call'                                                                                      
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:105:in `post_url'                                                                                                  
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:86:in `block (2 levels) in crawl'                                                                                  
        from /usr/local/share/gems/gems/concurrent-ruby-1.1.8/lib/concurrent-ruby/concurrent/executor/immediate_executor.rb:29:in `post'                                                                          
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:85:in `block in crawl'                                                                                             
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:52:in `block (2 levels) in crawl'                                                                            
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block (2 levels) in visit_page'                                                                                                    
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `each'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:659:in `block in visit_page'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:595:in `block in get_page'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:776:in `prepare_request'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:589:in `get_page'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:655:in `visit_page'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:373:in `run'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:355:in `start_at'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/agent.rb:274:in `site'
        from /usr/local/share/gems/gems/spidr-0.6.1/lib/spidr/spidr.rb:54:in `site'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/url_collector.rb:47:in `crawl'                                                                                                
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver/archive.rb:84:in `crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:111:in `crawl'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/lib/wayback_archiver.rb:61:in `archive'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:87:in `block in <top (required)>'                                                                                             
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `each'
        from /usr/local/share/gems/gems/wayback_archiver-1.4.0/bin/wayback_archiver:86:in `<top (required)>'
        from /usr/local/bin/wayback_archiver:23:in `load'
        from /usr/local/bin/wayback_archiver:23:in `<main>'