bramus / mixed-content-scan

Scan your HTTPS-enabled website for Mixed Content
MIT License

cURL Error (3): <url> malformed #34

Closed djcechi closed 8 years ago

djcechi commented 8 years ago

Hi guys,

I can only use the script with the --input parameter, otherwise it does not work (parameter error). When I use the --input parameter I get the error below. What's the problem? The source file is plain text, one domain per line, no quotation marks etc. You can see that the script reads the domain name correctly, as it appears in the source file.

[2016-03-23 13:58:37] MCS.NOTICE: Scanning * [] []
[2016-03-23 13:58:37] MCS.INFO: 00000 - https://www.domena.cz [] []
[2016-03-23 13:58:37] MCS.CRITICAL: cURL Error (3): <url> malformed [] []
[2016-03-23 13:58:37] MCS.INFO: 00001 -  [] []
[2016-03-23 13:58:37] MCS.NOTICE: Scanned 2 pages for Mixed Content [] []

bramus commented 8 years ago

Works fine over here, both with or without trailing /:

$ mixed-content-scan https://www.domena.cz
[2016-04-19 11:54:13] MCS.NOTICE: Scanning https://www.domena.cz/ [] []
[2016-04-19 11:54:15] MCS.INFO: 00000 - https://www.domena.cz/ [] []
[2016-04-19 11:54:15] MCS.NOTICE: Scanned 1 pages for Mixed Content [] []
$ mixed-content-scan https://www.domena.cz/
[2016-04-19 11:54:22] MCS.NOTICE: Scanning https://www.domena.cz/ [] []
[2016-04-19 11:54:23] MCS.INFO: 00000 - https://www.domena.cz/ [] []
[2016-04-19 11:54:23] MCS.NOTICE: Scanned 1 pages for Mixed Content [] []

(Note: the yielded result of 1 scanned page is correct here, as the HTML only contains scripts that need to be evaluated — mixed-content-scan scans the outputted HTML structure, it does not evaluate it)

Feel free to reopen this issue if the problem persists.

djcechi commented 8 years ago

Dear bramus, thanks for the reply. I am facing this problem when using a list of domains as input. Scanning a single page given as a parameter works without a problem.

Example (one domain in the source file): ./mixed-content-scan --input=/home/user/mixed-scan/zadani.txt --output=/home/user/mixed-scan/vysledek --format=ansi --no-check-certificate

[2016-04-25 09:15:15] MCS.NOTICE: Scanning * [] []
[2016-04-25 09:15:15] MCS.INFO: 00000 - https://www.sslmarket.cz [] []
[2016-04-25 09:15:15] MCS.CRITICAL: cURL Error (3): <url> malformed [] []
[2016-04-25 09:15:15] MCS.INFO: 00001 -  [] []
[2016-04-25 09:15:15] MCS.NOTICE: Scanned 2 pages for Mixed Content [] []
[2016-04-25 09:15:44] MCS.NOTICE: Scanning * [] []
[2016-04-25 09:15:45] MCS.INFO: 00000 - www.sslmarket.cz [] []
[2016-04-25 09:15:45] MCS.CRITICAL: cURL Error (3): <url> malformed [] []
[2016-04-25 09:15:45] MCS.INFO: 00001 -  [] []
[2016-04-25 09:15:45] MCS.NOTICE: Scanned 2 pages for Mixed Content [] []

bramus commented 8 years ago

Can you post the contents of the input file?

djcechi commented 8 years ago

I assume the script needs one domain per line.

The input file is simple plain text:

https://www.sslmarket.cz
https://www.sslmarket.sk
https://www.sslmarket.hu

I tried putting the domains on separate lines, but it does not work even with a single domain. Thank you.

bramus commented 8 years ago

Is it possible that your input file contains empty lines?

Here's my output when input.txt contains an empty line at the end (notice the output “Scanned 4 pages for Mixed Content” instead of it having scanned 3 pages):

bramus in ~
$ mixed-content-scan --input=input.txt --no-check-certificate
[2016-04-27 18:43:52] MCS.NOTICE: Scanning * [] []
[2016-04-27 18:43:53] MCS.INFO: 00000 - https://www.sslmarket.cz [] []
[2016-04-27 18:43:54] MCS.INFO: 00001 - https://www.sslmarket.sk [] []
[2016-04-27 18:43:55] MCS.INFO: 00002 - https://www.sslmarket.hu [] []
[2016-04-27 18:43:55] MCS.CRITICAL: cURL Error (3): <url> malformed [] []
[2016-04-27 18:43:55] MCS.INFO: 00003 -  [] []
[2016-04-27 18:43:55] MCS.NOTICE: Scanned 4 pages for Mixed Content [] []
bramus in ~
$ 

And here's the output without an empty line at the end (no extra MCS.INFO entry, and the correct number of pages reported at the end):

bramus in ~
$ mixed-content-scan --input=input.txt --no-check-certificate
[2016-04-27 18:44:25] MCS.NOTICE: Scanning * [] []
[2016-04-27 18:44:27] MCS.INFO: 00000 - https://www.sslmarket.cz [] []
[2016-04-27 18:44:28] MCS.INFO: 00001 - https://www.sslmarket.sk [] []
[2016-04-27 18:44:28] MCS.INFO: 00002 - https://www.sslmarket.hu [] []
[2016-04-27 18:44:28] MCS.NOTICE: Scanned 3 pages for Mixed Content [] []
bramus in ~
$ 

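A quick way to guard against this is to strip blank (or whitespace-only) lines from the input file before scanning. This is just a sketch using standard tools; `input.txt` and `input.clean.txt` are example filenames:

```shell
# Drop empty and whitespace-only lines; write a cleaned copy
# so the original input file stays untouched.
grep -v '^[[:space:]]*$' input.txt > input.clean.txt
mixed-content-scan --input=input.clean.txt --no-check-certificate
```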
djcechi commented 8 years ago

You are right, there were 4 lines. Every editor adds a trailing newline; nano with the -L parameter did the trick. Thank you for this idea.

Is it possible to crawl all links from the URLs in the input file? I want to list domains (homepages) in the input file and have your script crawl each of them (similar to using the script with just one domain and no input file).

Thanks in advance
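One workaround, since the scanner crawls when given a single start URL as a positional argument (as in the examples earlier in this thread), is to loop over the input file and invoke the scanner once per URL. A sketch, assuming `input.txt` holds one URL per line:

```shell
# Run a separate crawl for each start URL in input.txt,
# skipping blank lines so a trailing newline does no harm.
while IFS= read -r url; do
  [ -z "$url" ] && continue
  mixed-content-scan --no-check-certificate "$url"
done < input.txt
```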