Open GoogleCodeExporter opened 8 years ago
Hi Sujay, sorry I missed your comment. For some reason Google Code will email
me when you reply but not when you add a new issue.
This project isn't dead, but it's a low priority at the moment. I would like
to pick up where I left off, as I thought it was quite promising.
Original comment by hippy2094
on 19 Aug 2012 at 7:19
Nopes, not a problem. :)
Keep me informed if you pick it up again in future and make a download
available. A reply in this section would be enough. ;)
Original comment by insights...@gmail.com
on 19 Aug 2012 at 8:55
Well, I couldn't resist ;) An alpha for Windows is available on the Downloads
page. I shouldn't try to crawl half the internet!
Original comment by hippy2094
on 20 Aug 2012 at 2:02
I can't find any Alpha effect in it. Worked perfectly here. Posted about it in
my forum.
http://insightsintechnologyforum.tk/Thread-Simple-Sitemap-Creator-Alpha?pid=24
Original comment by insights...@gmail.com
on 20 Aug 2012 at 8:02
I can see from your screenshots that things aren't 100% correct; those entries
starting "/www.blogger.com" are a bit odd.
I also completely forgot to process links that don't have plain text
:D There is also an apparent bug in the tag parsing: your first line
includes the style attribute from the A tag instead of dropping it and placing
the image tag in there.
Original comment by hippy2094
on 21 Aug 2012 at 8:45
You must be right about that. I know very little about all this. All I
understood is that the program is pretty stable but coding may have errors, as
you know better. Something was suspicious for me too, it picked up many more
links from my blog than other sitemap tools do. For your use I have uploaded
the sitemap it created.
http://www.datafilehost.com/download-11654091.html
Original comment by insights...@gmail.com
on 21 Aug 2012 at 2:04
The main issues I see from crawling your site are:
1) It's following https links on different domains
2) It's not stripping anchors from links, so it considers each one a new link
3) It needs to ignore javascript and invalid links.
Thanks for your testing efforts, this has given me something to go on :)
Original comment by hippy2094
on 21 Aug 2012 at 3:05
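For reference, the three issues listed above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code (which isn't shown in this thread); the function name `normalize_link` and the example.com URLs are made up for the example, and it assumes "same site" means an exact hostname match.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def normalize_link(base_url, href):
    """Return an absolute, fragment-free URL on the same host, or None to skip it."""
    # Resolve relative and protocol-relative hrefs against the page URL.
    absolute = urljoin(base_url, href.strip())
    # Issue 2: strip the "#anchor" so the same page is not counted twice.
    absolute, _fragment = urldefrag(absolute)
    parts = urlsplit(absolute)
    # Issue 3: ignore javascript:, file:, mailto: and other non-web schemes.
    if parts.scheme not in ("http", "https"):
        return None
    # Issue 1: do not follow links (http or https) to a different host.
    if parts.hostname != urlsplit(base_url).hostname:
        return None
    return absolute
```

With this sketch, `post.html#more-191` on a page resolves to the plain page URL, while `javascript:void(0)`, `file:/C%3A/...` and links to other hosts come back as `None`. Note that an exact hostname comparison treats `example.com` and `www.example.com` as different sites, which may or may not be what a sitemap tool wants.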
Great !!
Wish to see an 'improved' version ;)
Original comment by insights...@gmail.com
on 21 Aug 2012 at 3:13
Just a little update (I haven't released a new exe yet): numbers 1 and 2 from
my list appear to be fixed, it's just number 3 to go, which is the fun one :/
Original comment by hippy2094
on 30 Aug 2012 at 7:30
Good to know that :-) Take care, don't work too hard ;-)
Original comment by insights...@gmail.com
on 31 Aug 2012 at 7:47
0.1.3 released, should be a major improvement :)
Original comment by hippy2094
on 13 Sep 2012 at 4:12
Sorry for the late response... Will test this today...:)
Original comment by insights...@gmail.com
on 22 Sep 2012 at 8:02
Hi, the following are the sitemaps of my site generated by Simple Sitemap Creator [HTML
file] and www.xml-sitemaps.com [text file].
Original comment by insights...@gmail.com
on 23 Sep 2012 at 9:20
Attachments:
Wow, I really didn't think it had been this long since I last looked at this
application!
I have updated this app. It now includes a Google Sitemaps compatible XML
output as well, which seems much more stable than the HTML output, which still
needs work.
Original comment by hippy2094
on 26 Jan 2013 at 8:04
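For reference, a Google-compatible XML sitemap follows the sitemaps.org 0.9 format: a `urlset` root containing one `url`/`loc` pair per page. The sketch below shows the minimal shape of such a file; the function name `build_sitemap` is hypothetical and this is not how the application itself generates its output.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal sitemap document (as a string) from an iterable of URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Optional elements such as `lastmod` or `changefreq` could be added per entry in the same way; they are left out here to keep the example short.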
Will try it when I get time. :)
Original comment by insights...@gmail.com
on 28 Jan 2013 at 1:09
Actually, ignore the 0.1.4 release. I've completely changed the HTML parsing
routine, which seems a lot better - 0.1.5 will be out in a day or two :)
Original comment by hippy2094
on 29 Jan 2013 at 1:37
0.1.5 is out, hopefully the HTML parsing is much more reliable now.
Original comment by hippy2094
on 30 Jan 2013 at 7:18
Ok, I will test it on the weekend. :)
Original comment by insights...@gmail.com
on 30 Jan 2013 at 7:20
It now works better. :) I found about two unnecessary links there.
Original comment by insights...@gmail.com
on 2 Feb 2013 at 10:37
Attachments:
When I crawl the same site the addthis link that appears in your result doesn't
seem to appear in mine, was that a malformed link that has since been fixed?
The # link is an easy one to fix, source submitted this morning, I'll sort out
a new release after work :)
I'm starting to think we are finally getting past the alpha stage. I would
like to take this opportunity to thank you for your input - perhaps I can find
room in the About dialog for a special mention :)
Original comment by hippy2094
on 4 Feb 2013 at 7:50
Hi,
I also think that alpha stage is over.
Thanks for your kind comments, but till now I haven't done anything apart from
running a scan and attaching the Log :P
Anyway, in my latest scan of the same website, I too didn't find that addthis
link. But I didn't remove any link.
Today I closely watched the latest log and found a few problems. In the created
sitemap, it is able to gather the title of a link in some cases and in some
cases it fails.
Original comment by insights...@gmail.com
on 4 Feb 2013 at 8:25
Attachments:
I did actually notice this myself. I will look into it, and see what I can find
Original comment by hippy2094
on 4 Feb 2013 at 9:13
Righty, that's fixed, I apologise for the excessive hits I've caused to that
domain, it didn't happen on any of my sites. I will be releasing 0.1.6 later
today :)
Original comment by hippy2094
on 5 Feb 2013 at 11:10
Hi,
That's not a problem. Waiting to test the upcoming version! :)
Original comment by insights...@gmail.com
on 5 Feb 2013 at 11:13
Slight delay, but it's ready for download :)
Original comment by hippy2094
on 6 Feb 2013 at 1:37
Hi,
The sitemap of techoffer.in worked perfectly (see attachment). But while creating
the sitemap of insightsintechnology.com, it went weird. It started to hunt for
thousands of links, and found even more when I tried to terminate the application. I
had to close it from Task Manager. Can you please check?
Original comment by insights...@gmail.com
on 7 Feb 2013 at 6:19
Oh my, that's not good :D I'll look into it tomorrow
Original comment by hippy2094
on 7 Feb 2013 at 10:29
426 links retrieved - that sounds a bit better, can you browse this attachment
and tell me if you see anything wrong
Original comment by hippy2094
on 8 Feb 2013 at 11:50
Attachments:
Hi, I find the following links abnormal.
http://www.insightsintechnology.com//addthis.com/bookmark.php?v=300
file:/C%3A/Program%20Files/Files%20Terminator%20Free/Help/Help_It.html
faq:%20http%3A//www.f-secure.com/en/web/labs_global/removal/easy-clean/faq
And also I am not sure if the following links should be present.
http://www.insightsintechnology.com/search/label/Image%20Managers
http://www.insightsintechnology.com/search/label/Advertisements
http://www.insightsintechnology.com/search/label/PDF
http://www.insightsintechnology.com/search/label/Online%20Service
http://www.insightsintechnology.com/search/label/SoundCloud
http://www.insightsintechnology.com/search/label/File%20Hash
http://www.insightsintechnology.com/search/label/Encrypt
http://www.insightsintechnology.com/search/label/Shred
Original comment by insights...@gmail.com
on 8 Feb 2013 at 4:38
Taken from
http://www.insightsintechnology.com/2013/02/aomei-dynamic-disk-manager-pro-giveaway.html (lines 207/208)
<a class="addthis_button"
href="http://www.insightsintechnology.com//addthis.com/bookmark.php?v=300"
addthis:url='http://www.insightsintechnology.com/2013/02/aomei-dynamic-disk-manager-pro-giveaway.html'
addthis:title='AOMEI Dynamic Disk Manager Pro – 10 Licenses Giveaway '><img
src="//cache.addthis.com/cachefly/static/btn/v2/lg-share-en.gif" width="125"
height="16" alt="Bookmark and Share" style="border:0"/></a>
The parser is getting the correct text from the href :)
Why are you unsure about those links being present? They appear to be valid
links.
Original comment by hippy2094
on 8 Feb 2013 at 4:59
http://www.insightsintechnology.com/2012/04/clean-active-malware-infections-with-f.html#more-191 line 207:
<p>See the product homepage of Easy Clean for more information about it. There
is also a <a
href="faq:%20http%3A//www.f-secure.com/en/web/labs_global/removal/easy-clean/faq"
>FAQ</a> on the software that you might like to view.</p>
I'm still searching for the Program Files one
Original comment by hippy2094
on 8 Feb 2013 at 5:04
http://www.insightsintechnology.com/2012/05/files-terminator-free-can-securely.html#more-155 lines 266/267:
<a
href="file:/C%3A/Program%20Files/Files%20Terminator%20Free/Help/Help_It.html"
target="_blank">Italian language</a>
This one and the previous one should have been filtered for not starting with
the domain; the addthis one is technically a valid link. I'm not sure it's the
job of a sitemap builder to worry about 404s?
Original comment by hippy2094
on 8 Feb 2013 at 5:09
Actually, I was unsure because if it is indexing some search labels, there
should be more such links: there are many such labels which it didn't
index. Also, Google never indexes those.
Regarding that 404 error link: why is it getting only this one? There are
dozens of 404s which are converted to 301s with the help of a plugin. :)
Original comment by insights...@gmail.com
on 9 Feb 2013 at 7:07
Ah, it doesn't follow redirects. Is there something in robots.txt that stops
Google following those? Which actually brings up another important question: is
this program a bot that should follow the rules of robots.txt?
Original comment by hippy2094
on 10 Feb 2013 at 9:29
I have checked but no such rules are there in robots.txt :)
Original comment by insights...@gmail.com
on 10 Feb 2013 at 6:40
I guess it's this line in your source
<meta name="robots" content="noindex,follow"/>
"The spider will not look at this page but will crawl through the rest of the
pages on your website." - http://www.metatags.info/meta_name_robots
This line doesn't seem to appear on non-search pages.
Original comment by hippy2094
on 11 Feb 2013 at 9:26
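For reference, detecting a robots meta tag like the one quoted above takes only a small HTML pass. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and how the application should act on the result is an assumption (skip listing a `noindex` page, but still follow its links unless `nofollow` is also present).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are routed here as well by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

A crawler could then leave a page out of the sitemap when `"noindex"` is in the returned set, which would explain why Google does not index the search pages.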
Ok, I will have to check it :)
Original comment by insights...@gmail.com
on 11 Feb 2013 at 10:28
Okey dokey :) Still presents the question of whether or not it should be
following the rules of robots.txt, I'm thinking yes.
Original comment by hippy2094
on 11 Feb 2013 at 10:43
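For reference, honouring robots.txt does not need a hand-written parser in every language. As a sketch only, here is how the check looks with Python's standard `urllib.robotparser`; the helper name `make_robots_checker` and the user agent string `SimpleSitemapCreator` are assumptions for the example, not something the application is known to use.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt, user_agent="SimpleSitemapCreator"):
    """Return a function that says whether a URL may be crawled.

    robots_txt is the already-downloaded text of the site's robots.txt;
    the user agent name is a hypothetical placeholder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)
```

Usage would be along these lines: build the checker once per site, then call it for each candidate link before fetching.

```python
allowed = make_robots_checker("User-agent: *\nDisallow: /search\n")
allowed("http://www.example.com/search/label/PDF")   # blocked by the rule
allowed("http://www.example.com/2013/02/post.html")  # not covered by any rule
```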
New version released :)
Original comment by hippy2094
on 13 Feb 2013 at 6:27
Will test it on Weekends :D
Original comment by insights...@gmail.com
on 13 Feb 2013 at 6:34
Just so you know, there is now a wiki page of known issues :)
http://code.google.com/p/simplesitemapcreator/wiki/currentissues
Original comment by hippy2094
on 17 Feb 2013 at 3:16
Hi,
Sorry for the late response. I just ran a quick test and found a few problems with my site,
techoffer.in.
If you look carefully at the attachment, it is not able to get the name
Page 1 of ----
in any of the cases.
There is no problem with the other cases: "page 2 of", "page 3 of", etc.
Original comment by insights...@gmail.com
on 21 Feb 2013 at 6:55
Attachments:
Very strange, I will look into it. In the meantime, I have another new project
I just released for Windows and OSX that you might be interested in:
http://code.google.com/p/page2png/ let me know what you think :)
Original comment by hippy2094
on 21 Feb 2013 at 9:03
Hi, Thanks for that.
I have posted an issue in your new project. Nice one!
Original comment by insights...@gmail.com
on 21 Feb 2013 at 9:34
Original issue reported on code.google.com by
insights...@gmail.com
on 3 Aug 2012 at 2:32