Closed adon90 closed 4 years ago
Hi adon90,
Thanks for using uniqurl and giving feedback about it :)
http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view
http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111
The unique content here would be http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111
This script leaves out URLs with duplicate content. In this example, the two URLs return different content: "This is not an valid article" and "39790: Illegal Characters on Various Operating Systems". So the script returns both URLs.
The line http://grouplogic.com:80/news-events/index.cfm?fa=viewRelease&ID=21&prod=2 gets deleted. I have lost the parameter "prod" in this case, because no other URL for that resource contains this parameter.
The script keeps the shortest URL that is provided, because long URLs are more likely to carry useless parameters in the GET request, especially if the URLs come from public resources like waybackurls. It would be very hard, if not impossible, to check which parameters have a further impact on the usage of the website.
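To make the behaviour described above concrete, here is a minimal sketch of "one URL per distinct content, shortest URL wins". It is not uniqurl's actual code: the function name `dedupe_by_content`, the `pages` dictionary of already-fetched bodies, the third URL with its `utm=x` parameter, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def dedupe_by_content(pages):
    """Keep one URL per distinct response body, preferring the shortest URL.

    `pages` maps URL -> response body. The bodies are assumed to be fetched
    already; this sketch does no network access.
    """
    shortest = {}  # content hash -> shortest URL seen so far for that content
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        best = shortest.get(digest)
        if best is None or len(url) < len(best):
            shortest[digest] = url
    return sorted(shortest.values())


# Stand-in bodies based on the two page titles quoted above.
# The third URL is hypothetical and returns the same body as the second.
pages = {
    "http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view":
        "This is not an valid article",
    "http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111":
        "39790: Illegal Characters on Various Operating Systems",
    "http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111&utm=x":
        "39790: Illegal Characters on Various Operating Systems",
}

for url in dedupe_by_content(pages):
    print(url)
```

In this sketch the first two URLs survive because their bodies differ, and the hypothetical third one is dropped in favour of the shorter URL with the same body.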
I hope this answers your question? If not, help me understand ;)
Haven't heard from you in a while so I'll be closing this issue. Let me know if you have further questions/issues :)
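Hello, let's say I have these two URLs:
http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view
http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111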
The unique content here would be http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111 but, for the moment, it keeps them both.
Imagine you run dalfox or another tool afterwards: you don't need http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view to retest the parameter "fuseaction" again.
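The filtering asked for here could be sketched roughly as follows: on the same scheme, host and path, drop a URL when another URL carries a strict superset of its parameter names. This is only an illustration of the request, not something uniqurl does; the function name `drop_param_subsets` is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def drop_param_subsets(urls):
    """Drop URLs whose parameter names are a strict subset of another URL's
    parameter names on the same scheme, host and path."""
    parsed = []
    for url in urls:
        parts = urlsplit(url)
        resource = (parts.scheme, parts.netloc, parts.path)
        names = frozenset(
            name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
        )
        parsed.append((url, resource, names))

    kept = []
    for url, resource, names in parsed:
        # `names < other_names` is a strict-subset test on frozensets.
        covered = any(
            other_resource == resource and names < other_names
            for _, other_resource, other_names in parsed
        )
        if not covered:
            kept.append(url)
    return kept


urls = [
    "http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view",
    "http://grouplogic.com:80/Knowledge/index.cfm?fuseaction=view&docID=111",
]
print(drop_param_subsets(urls))
```

Note that this sketch compares parameter names only and ignores their values, so two URLs with the same names but different values are both kept.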
Another case would have been:
In that case you would need to keep them both.
Another thing is this:
I got this:
I run cat list.txt | uniqurl
And I got this:
The line
http://grouplogic.com:80/news-events/index.cfm?fa=viewRelease&ID=21&prod=2
gets deleted. I have lost the parameter "prod" in this case, because no other URL for that resource contains this parameter.

Regards!
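One way to spot this kind of loss is to compare the parameter names present before and after filtering. The sketch below assumes the input list and the filtered output are both available as Python lists. The helper names `param_names` and `lost_params` are hypothetical, and the shorter URL without "prod" is a made-up stand-in for whatever the tool actually kept.

```python
from urllib.parse import parse_qsl, urlsplit


def param_names(urls):
    """Collect (host, path, parameter name) triples found in `urls`."""
    seen = set()
    for url in urls:
        parts = urlsplit(url)
        for name, _ in parse_qsl(parts.query, keep_blank_values=True):
            seen.add((parts.netloc, parts.path, name))
    return seen


def lost_params(before, after):
    """Return the parameters that appear in `before` but in no URL of `after`."""
    return sorted(param_names(before) - param_names(after))


before = [
    # Hypothetical shorter URL, assumed to be the one that was kept.
    "http://grouplogic.com:80/news-events/index.cfm?fa=viewRelease&ID=21",
    "http://grouplogic.com:80/news-events/index.cfm?fa=viewRelease&ID=21&prod=2",
]
after = before[:1]

print(lost_params(before, after))
# [('grouplogic.com:80', '/news-events/index.cfm', 'prod')]
```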