Closed GoogleCodeExporter closed 9 years ago
Original comment by sjdir...@gmail.com
on 18 Jul 2013 at 4:27
Do you have a test site for this?
Original comment by ilushk...@gmail.com
on 19 Jul 2013 at 2:29
I do not. I asked the person who reported this issue for some test URLs
and heard nothing back. I was going to try it on any zip file I can find
on the net and see whether it is a universal problem or specific to his
files. I suspect that we may need to add a CrawledPage.RawBytes property
that is filled by the IPageRequester the same way
CrawledPage.RawContent is filled. That is only if we determine that
CrawledPage.RawContent doesn't play well with streams for whatever reason.
Original comment by sjdir...@gmail.com
on 19 Jul 2013 at 7:13
Original comment by sjdir...@gmail.com
on 3 Sep 2013 at 1:50
Original comment by sjdir...@gmail.com
on 3 Sep 2013 at 2:52
Added auto encoding and CrawledPage.Content.Bytes, which should allow data to be
written to a file stream without corruption.
Original comment by sjdir...@gmail.com
on 17 Sep 2013 at 2:43
Using v1.2.3.
Like the previous poster, I am attempting to use Abot to pull a series of PDF
files and write them to disk.
When I run Fiddler, it shows that the response for the Abot crawl and the
response for downloading the file in the browser have the same content length.
The same number also appears when I check
CrawledPage.HttpResponse.ContentLength.
However, when I test CrawledPage.Content.Bytes for its length, I get a number
about 80% higher (230,818 vs 410,906 for my test item -
http://www.sbcounty.gov/parcelmaps/0130I1.pdf).
Comparing the direct download with the file written by the Abot-enabled
application, small portions of the two files look identical, but most
sections differ in both length and data composition.
I've tried running the CrawledPage.Content.Bytes array through the other
encodings available through System.Text.Encoding, but that hasn't brought me
any luck.
Any ideas?
Thank you for your assistance in this matter.
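One possible explanation for the size difference, offered as an assumption rather than a confirmed diagnosis of Abot's internals: if binary data is decoded into a string and then encoded back into bytes, every byte sequence that is invalid in the chosen encoding gets replaced, and the result grows. A minimal C# sketch using made-up sample bytes (the class name is hypothetical):

```csharp
using System;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Made-up sample: the ASCII text "%PDF" followed by two bytes
        // (0xFF, 0xFE) that can never appear in valid UTF-8.
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE };

        // Decoding binary data as text substitutes the replacement
        // character U+FFFD for bytes that are not valid UTF-8.
        string asText = Encoding.UTF8.GetString(original);

        // Encoding the text again does not restore the original bytes:
        // each U+FFFD is written out as three bytes.
        byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

        Console.WriteLine("original:      {0} bytes", original.Length);
        Console.WriteLine("round-tripped: {0} bytes", roundTripped.Length);
    }
}
```

The round-tripped array comes out longer than the original. Running already-converted bytes through other encodings cannot undo this, because the original byte values were discarded at the decoding step.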
Original comment by joelw...@gmail.com
on 4 Sep 2014 at 3:41
Thank you for bringing this back up. I fixed the issue with my last check-in, and the
patched version on NuGet is 1.2.3.1031. You should now be able to save the raw
bytes to disk using something like...
File.WriteAllBytes("whatever.pdf", crawledPage.Content.Bytes);
Please let me know if you are still having issues.
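To confirm that a saved file matches what the server sent, the check described earlier in this thread (comparing the byte count with the reported content length) can be sketched as below. DownloadSaver and SaveAndCheck are hypothetical names, not part of Abot, and the comparison assumes the response was not compressed in transit, since a compressed response would report a smaller content length than the decoded body.

```csharp
using System;
using System.IO;

static class DownloadSaver
{
    // Hypothetical helper: writes the raw bytes to disk and reports whether
    // the file size matches the length the server announced.
    public static bool SaveAndCheck(string path, byte[] body, long expectedLength)
    {
        File.WriteAllBytes(path, body);
        long written = new FileInfo(path).Length;

        // A negative expected length means the server did not report one,
        // so there is nothing to compare against.
        return expectedLength < 0 || written == expectedLength;
    }

    static void Main()
    {
        // Stand-in bytes; in a crawler these would be crawledPage.Content.Bytes.
        byte[] body = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xFE };
        string path = Path.Combine(Path.GetTempPath(), "whatever.pdf");

        Console.WriteLine(SaveAndCheck(path, body, body.Length));
    }
}
```

In a crawl handler, the expected length would be the value this thread reads from CrawledPage.HttpResponse.ContentLength.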
Original comment by sjdir...@gmail.com
on 4 Sep 2014 at 6:12
Patch works great. Thanks for the quick response!
Original comment by joelw...@gmail.com
on 4 Sep 2014 at 3:55
Original issue reported on code.google.com by
sjdir...@gmail.com
on 18 Jul 2013 at 4:23