Closed sujee closed 2 weeks ago
CC : @Qiragg
@Qiragg @hmtbr is this really a bug, or is @sujee trying to use it in a way it was not intended for?
@touma-I It's not a bug in the connector library. The data-prep-connector has no access to storage, including local directories. If the description is accurate, the fix has to go in your notebook.
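For illustration, a minimal sketch of what that notebook-side fix could look like, assuming the notebook writes downloaded files into a local folder. The helper name `ensure_download_dir` and the folder name are hypothetical; neither is part of the connector.

```python
from pathlib import Path


def ensure_download_dir(path: str) -> Path:
    """Create the download directory (and any missing parents) if needed.

    Hypothetical notebook helper, not part of data-prep-connector.
    """
    directory = Path(path)
    # exist_ok=True makes this safe to call on every notebook run.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Call this before starting the crawl, so the callback has somewhere to write.
# download_dir = ensure_download_dir("input/downloads")  # folder name is an assumption
```

Calling this once before the crawl starts means the callback never writes into a directory that does not exist.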
@touma-I It is not a bug, the connector is working as intended.
We push the responsibility for processing and storing the acquired content to the user. The user has to design that logic and include any error handling it needs, as @sujee encountered in his example.
In a separate issue (https://github.com/IBM/data-prep-kit/issues/777) that @sujee raised, I added an example of how to catch errors that happen inside the user-defined callback function, which is the case we are dealing with here. Since I designed the original example, I felt it necessary to add the error handling as well, so the user can notice errors in the callback function. The errors do not arise from the core connector logic.
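As a rough sketch of the idea (this is not the example from that issue), a callback can be wrapped so that any exception it raises is logged and recorded instead of disappearing. The `(url, body, headers)` callback signature is an assumption here, and `report_callback_errors` is a hypothetical helper name.

```python
import logging
from functools import wraps
from typing import Callable, List, Tuple

logger = logging.getLogger("crawl_callback")

# Assumed callback shape: (url, body, headers) -> None.
Callback = Callable[[str, bytes, dict], None]


def report_callback_errors(
    callback: Callback, failures: List[Tuple[str, Exception]]
) -> Callback:
    """Wrap a callback so its exceptions are logged and collected.

    Hypothetical helper: it does not re-raise, so the crawl can continue.
    """

    @wraps(callback)
    def wrapper(url: str, body: bytes, headers: dict) -> None:
        try:
            callback(url, body, headers)
        except Exception as exc:
            # Make the failure visible instead of letting it pass silently.
            logger.error("Callback failed for %s: %r", url, exc)
            failures.append((url, exc))

    return wrapper
```

After the crawl finishes, a non-empty `failures` list indicates that something went wrong in user code, for example a write into a directory that does not exist.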
Happy to chat further if there are any unanswered questions or confusion.
Closing this issue as the code is working as expected and a new transform is being developed to handle most of the I/O.
Search before asking
Component
Other
What happened + What you expected to happen
When downloading files, if the download directory does not exist, the crawl fails silently.
Recommendations:
crawl function
Reproduction script
https://github.com/sujee/data-prep-kit/blob/html-processing-1/examples/notebooks/html-processing/1_download_site.ipynb
Anything else
data_prep_connector 0.2.2
OS
Ubuntu
Python
3.11.x
Are you willing to submit a PR?