digininja / CeWL

CeWL is a Custom Word List Generator
using -d 0, I get no entries #32

Closed stahnirockt closed 4 years ago

stahnirockt commented 6 years ago

When using the preinstalled cewl (version 5.3) on Kali, I can use -d 0 to get results only from the webpage I want. After cloning version 5.4.2 from GitHub, I get no entries with -d 0, only with -d 1, and then I get results only from the "subpages", not from the page I wanted.

digininja commented 6 years ago

Are you saying 5.4.2 isn't getting words off the front page?

stahnirockt commented 6 years ago

Yes, that's what I meant. I've tried a bit more and the problem does not occur on all websites. For example, https://github.com gives the same result in both versions, but https://en.wikipedia.org/wiki/Computer does not provide front-page results in version 5.4.2, although it does in version 5.3. I get the same result with every Wikipedia entry. Any idea what the problem could be? I tried this on macOS and Linux, with the same results.

digininja commented 6 years ago

I've not got a Kali box to try it on, but I'll make sure that the depth feature works as expected on the GitHub master.

Will probably be the next couple of days before I can look at it though. If I don't get back to you by the end of the week give me a nudge.

stahnirockt commented 6 years ago

Today I tried a little further. If I create a website with a link to a Wikipedia entry and run with '-d 1' and '-o', I get the results for the page I wanted.

Also, it seems that commenting out line 706 solved the problem for me.

# The spider doesn't work properly if there isn't a / on the end
if url !~ /\/$/
#   url = "#{url}/"
end

It was also commented out in version 5.3.
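
To make the effect of that line concrete, here is a minimal sketch of the same check applied to the URLs mentioned in this thread. The helper name `add_trailing_slash` is hypothetical and is not part of CeWL; only the regex test and the string interpolation are taken from the snippet above.

```ruby
# Hypothetical helper mirroring the check quoted above; not CeWL's own code.
def add_trailing_slash(url)
  # Append "/" only when the URL does not already end with one.
  url !~ /\/$/ ? "#{url}/" : url
end

puts add_trailing_slash("https://github.com")
puts add_trailing_slash("https://en.wikipedia.org/wiki/Computer")
puts add_trailing_slash("https://example.com/dir/")
```

For a bare host such as https://github.com, the added slash refers to the same resource, which would fit with that site behaving identically in both versions. For a URL whose path names a page, such as the Wikipedia one, the added slash changes the path, so a server may treat it as a different resource.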

digininja commented 6 years ago

That change went in because there was another issue raised that it was stopping spidering working with it there. I'll have to do some proper digging; my guess is that it has something to do with sites that automatically redirect URLs without a trailing slash to the same URL with a slash.
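
One way to test that guess is to compare a requested URL with the `Location` header a server sends back. The sketch below is only an assumption about how such a check could look: `slash_only_redirect?` is a hypothetical helper, not something in CeWL, and it works on plain strings so it can be tried without any network access.

```ruby
require "uri"

# Hypothetical helper: true when `location` points at the same URL as
# `requested` except for a trailing slash added to or removed from the path.
def slash_only_redirect?(requested, location)
  from = URI.parse(requested)
  to = URI.join(requested, location) # also resolves relative Location values
  return false unless from.host == to.host && from.query == to.query

  # Paths must match once a trailing slash is ignored, yet differ as written.
  from.path.chomp("/") == to.path.chomp("/") && from.path != to.path
end

puts slash_only_redirect?("https://example.com/docs", "https://example.com/docs/")
puts slash_only_redirect?("https://example.com/docs", "https://example.com/login")
```

In practice the two arguments would come from the URL being spidered and the `Location` header of a 3xx response to it.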

digininja commented 4 years ago

I assume it is all working now, as I've not had any recent complaints.