ape-box / crawler


Url scrape not working #1

Open goivemaster opened 7 years ago

goivemaster commented 7 years ago

I ran this from the command line: $ ruby scrape.rb 'http://www.bbc.co.uk'

and received this error:

/home/silver/.rbenv/versions/2.3.1/lib/ruby/2.3.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?): http://www.bbc.co.uk/bbctrust/ (URI::InvalidURIError)

Please can you tell me why this URL does not work? Also, updating the README with some instructions on how to use the program would be useful.

John (MWR)
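
The thread does not say what triggers this error, and the URL in the message looks well-formed. One possibility, offered purely as an assumption, is that the string handed to `URI.parse` carries stray whitespace (for example from a scraped `href` attribute), which Ruby's RFC 3986 parser rejects. The sketch below reproduces that case and shows a defensive wrapper; `safe_parse` is a hypothetical helper name, not something from the crawler's code.

```ruby
require 'uri'

# Hypothetical helper (not part of the crawler): trims surrounding
# whitespace before parsing and returns nil instead of raising when
# the string still cannot be parsed as a URI.
def safe_parse(raw)
  URI.parse(raw.to_s.strip)
rescue URI::InvalidURIError
  nil
end

# Assumed input: the URL from the error message plus a trailing space.
padded = "http://www.bbc.co.uk/bbctrust/ "

begin
  URI.parse(padded)
rescue URI::InvalidURIError => e
  puts "unstripped: #{e.class}"   # prints "unstripped: URI::InvalidURIError"
end

puts safe_parse(padded).path      # prints "/bbctrust/"
```

If the real cause is something else, the `rescue` branch is still a reasonable way to skip a bad link rather than abort the whole crawl.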

ape-box commented 7 years ago

I'll take a look later on.

Just as a note: as you can see from the code, the project is unfinished. Some of the classes are structurally OK and just need more separation of concerns, while others are still in a draft state and need a substantial breakdown.

The "rules" also could be implemented better, just not sure how yet.

The project is also missing a fundamental part to handle proper resource identification...

... I'm still a bit rusty with Ruby, and I lack the time to dedicate to it properly.

(Sent by email in reply to goivemaster's report of 15 Feb 2017 15:52.)

goivemaster commented 7 years ago

We worked out that you need to supply a URI with a file extension to the program. You should probably put this in the README!
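
Until the README covers this, a pre-flight check along the following lines could surface the requirement early. This is only a sketch under assumptions: `check_start_url` and its return symbols are hypothetical names, the example URLs other than the BBC one are placeholders, and nothing here reflects how `scrape.rb` actually validates its argument.

```ruby
require 'uri'

# Hypothetical pre-flight check (not part of the crawler).
# Returns :ok when the URL's path ends in a file extension,
# :no_extension when it does not, and :invalid when the string
# cannot be parsed as a URI at all.
def check_start_url(raw)
  uri = URI.parse(raw.to_s.strip)
  File.extname(uri.path.to_s).empty? ? :no_extension : :ok
rescue URI::InvalidURIError
  :invalid
end

p check_start_url('http://www.bbc.co.uk')           # => :no_extension
p check_start_url('http://example.com/index.html')  # => :ok
p check_start_url('http://exa mple.com/')           # => :invalid
```

A script could print a short usage hint on `:no_extension` rather than letting the run fail later with a less obvious error.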