Closed sujee closed 1 week ago
CC : @Qiragg
@Qiragg @hmtbr Let's discuss this in the context of the other requirements we are aware of. I am not sure we want to do what @sujee is asking for, but I am open to suggestions on how we can reconcile the various requirements.
@touma-I
crawl function should take download directory location
The data-prep-connector is intended to work without any persistent layer. Adding a download directory to crawl would go against that design.
arguments like depth_limit, download_limit should be made available to callback function on_downloaded
I'm not sure why this is required. These static values can be embedded in the callback function when it is defined. Implementing this would just introduce redundancy into our library.
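To make the "embed the values when defining the callback" suggestion concrete, here is a minimal sketch using a closure instead of global variables. It assumes the callback is called as on_downloaded(url, body, headers); that signature, the make_on_downloaded helper, and the example URLs are illustrative assumptions and not confirmed by this thread. The library's crawl function is not called here; the loop at the end only simulates it.

```python
from typing import Callable, Dict, List


def make_on_downloaded(download_limit: int) -> Callable[[str, bytes, Dict[str, str]], None]:
    """Build a callback that remembers download_limit without using globals.

    make_on_downloaded is a hypothetical helper, not part of the library.
    """
    seen: List[str] = []  # URLs accepted so far, private to this callback

    def on_downloaded(url: str, body: bytes, headers: Dict[str, str]) -> None:
        # Assumed callback signature: (url, body, headers).
        if len(seen) >= download_limit:
            return  # limit reached, ignore further pages
        seen.append(url)
        print(f"{len(seen)}/{download_limit} downloaded: {url} ({len(body)} bytes)")

    # Expose the list so the caller can inspect what was accepted.
    on_downloaded.seen = seen
    return on_downloaded


# Simulate the crawler invoking the callback three times with a limit of two.
callback = make_on_downloaded(download_limit=2)
for page_url in [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]:
    callback(page_url, b"<html></html>", {"Content-Type": "text/html"})
```

The same callback object would then be handed to the crawler in place of a function that reads module-level globals. functools.partial would work equally well if the limit is only read and no per-callback state is needed.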
@sujee @Bytes-Explorer I will be closing this issue with no action. We need to decide at some point if we want to expose the crawl function to the notebook users. I don't think it is a good idea for now.
The crawl function is not one we intended to expose to notebook users. We will revisit this at some future time.
Component
Other
Feature
on_downloaded. Currently on_downloaded checks these arguments using global variables. They should come from the arguments passed in.
Sample code: https://github.com/sujee/data-prep-kit/blob/html-processing-1/examples/notebooks/html-processing/1_download_site.ipynb