Closed marchellodev closed 2 years ago
Please rebase on top of master
and squash into one or two commits
@marchellodev Can you still work on this? If not, I can rebase and add tests for you
@Skallwar I would really appreciate it! :) I tried to do that a few days ago, but I always stumbled upon some errors. I'm kinda new to git, especially to those sophisticated operations
Thanks again!
Merging #146 (64af90d) into master (1be1f85) will increase coverage by 0.07%. The diff coverage is 52.17%.
@@ Coverage Diff @@
## master #146 +/- ##
==========================================
+ Coverage 62.54% 62.62% +0.07%
==========================================
Files 16 17 +1
Lines 558 610 +52
==========================================
+ Hits 349 382 +33
- Misses 209 228 +19
Impacted Files | Coverage Δ | |
---|---|---|
src/args.rs | 0.00% <ø> (ø) | |
src/disk.rs | 0.00% <ø> (ø) | |
src/scraper.rs | 12.28% <0.00%> (-1.54%) | :arrow_down: |
src/downloader.rs | 72.89% <100.00%> (ø) | |
tests/external.rs | 100.00% <100.00%> (ø) | |
@CohenArthur are you ok with this?
@marchellodev Thanks again, excellent work
Motivation
A lot of modern websites rely on external domains (usually referred to as CDN domains) for their CSS, JS, images, and other resources. Since SuckIT does not yet support downloading data from external domains (except for a bug where `//en.wikipedia.org` is treated as a relative path, which I fixed), it is impossible to properly download big and complex websites (#74).
Also, this patch fixes a panic when trying to parse URLs like `///tools.wmflabs.org/`, which returns an `Empty host` error. I encountered this while trying to download Wikipedia, so I think this PR should also close #69.
Notes
I have almost no experience with Rust, and I haven't yet implemented tests for the changes (I'm not really sure what the best way to do this is). So, please look at the code with extra scrutiny :). However, I have tested it on a few websites, and everything seems to work properly.
Also, `--edepth` (external depth) does not have a shortcut, since `-e` is already used for the exclude pattern. I'm not sure how this parameter should be renamed so that a shortcut can exist.
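To illustrate the scheme-relative URL problem described under Motivation, here is a minimal, std-only sketch. It is not the code from this PR: the function name `resolve_scheme_relative` and its `page_scheme` parameter are hypothetical, and treating `///tools.wmflabs.org/` the same as `//tools.wmflabs.org/` is an assumption about the desired behaviour.

```rust
/// Hypothetical helper (not SuckIT's actual implementation).
///
/// Turns a scheme-relative link such as `//en.wikipedia.org` into an
/// absolute URL by borrowing the scheme of the page it was found on.
/// Returns `None` when the link is not scheme-relative, so the caller
/// can fall back to ordinary relative-path handling.
fn resolve_scheme_relative(link: &str, page_scheme: &str) -> Option<String> {
    // Strip every leading slash and count how many there were.
    let rest = link.trim_start_matches('/');
    let leading_slashes = link.len() - rest.len();

    // Zero or one leading slash means a normal relative or
    // root-relative path, which is not handled here.
    if leading_slashes < 2 {
        return None;
    }

    // Nothing after the slashes means there is no host at all.
    if rest.is_empty() {
        return None;
    }

    // Assumption: extra slashes (as in `///tools.wmflabs.org/`) are
    // tolerated instead of being reported as an empty host.
    Some(format!("{}://{}", page_scheme, rest))
}
```

With this sketch, `resolve_scheme_relative("//en.wikipedia.org", "https")` yields `Some("https://en.wikipedia.org")`, while `/wiki/Rust` yields `None` and stays a path on the current domain.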