Closed by dpancic 1 year ago
In GitLab by @laureD19 on Nov 23, 2022, 10:29
Together with Martin, we've finalised the manual curation of the 69 items flagged with a URL issue the last time we ran the notebook.
We wanted to run it again to reflect the changes manually introduced in the flags and possibly identify new items with URL problems, but we ran into an error with the URLCheck() and checkURLValues() functions that we are not able to solve on our own.
Here is the screenshot of the error
@cesareconcordia do you think that is something you could look into to help us, please?
In GitLab by @cesareconcordia on Nov 23, 2022, 10:48
I'll look at this in the afternoon, will let you know asap
In GitLab by @cesareconcordia on Nov 30, 2022, 11:12
Unfortunately I cannot reproduce the error... I've added a new control to the CheckURLValues() that should give more information about the error. When you have time:
let me know.
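For illustration only, the kind of extra control mentioned above could look like the following minimal sketch. It is not the code of CheckURLValues(); the names http_status and check_urls and the result fields are hypothetical. The idea is that a failure on one URL is recorded together with its reason instead of aborting the whole run.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def http_status(url, timeout=10):
    """Return the HTTP status code for url, using a HEAD request."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.status


def check_urls(urls, fetch=http_status):
    """Check each URL and record the failure reason instead of raising.

    fetch is injectable so the logic can be tried without network access.
    """
    results = []
    for url in urls:
        try:
            results.append({"url": url, "status": fetch(url), "error": None})
        except HTTPError as exc:
            # The server answered, but with an error code such as 404.
            results.append({"url": url, "status": exc.code, "error": str(exc)})
        except (URLError, ValueError, OSError) as exc:
            # No usable answer: malformed URL, DNS failure, timeout, ...
            results.append({"url": url, "status": None, "error": repr(exc)})
    return results
```

With a structure like this, the rows whose error field is not None show which item and which kind of failure triggered the problem.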
It works partially. After a few more tests, it seems that the datasets category causes the error, but not the other categories. Could you also try it on your side for datasets, @cesareconcordia?
I wrote back to the MP for the other item categories. 11 new URL flags were raised and need manual curation.
An additional question relates to the un-flagging of manually corrected items. From what you were explaining, @cesareconcordia, I thought the setHTTPStatusFlags method would also remove the URL flag from items that have been manually corrected in the meantime, but that doesn't seem to be the case. Do I need to use another function?
notify also @aureon249
Hi @laureD19: the problem you're having checking URLs for dataset items should be fixed now. Please get the code from the repository and run the notebook, and let me know if it still persists.
I'll check the behaviour of setHTTPStatusFlags to understand why it does not remove URL flags.
Hi, I fetched the code from the dev branch and ran the URL validity check (1.1) in the notebook for all five categories on stage. The error mentioned above did not reappear.
Hi @cesareconcordia and @aureon249 !
I've tried out the URL checks and it now works with all categories. Thx!
Regarding the unflagging of items, my 2cts:
setHTTPStatusFlags: in that case, we need to pass a different dataset as the input parameter than the df resulting from the URL checks, since the items to be unflagged are no longer included in that result. It seems to work, but needs more tests.
removePropertyFlag: maybe call it before setting the flags again, but I'm not sure I'm using it correctly. Are there some examples of its use somewhere, @cesareconcordia?
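To make the point about the input dataset concrete, here is a minimal sketch in plain Python. It does not use the library's API: split_flag_updates and the item ids are hypothetical, and the real methods work on DataFrames. It assumes the URL check result lists only the items whose URL currently fails, which is why corrected items have to be found by comparing against the list of items that currently carry a flag.

```python
def split_flag_updates(flagged_ids, broken_ids):
    """Compare currently flagged items with the latest URL check result.

    flagged_ids: ids of items that carry a URL flag right now.
    broken_ids: ids of items whose URL failed in the latest check.
    """
    flagged, broken = set(flagged_ids), set(broken_ids)
    return {
        "unflag": sorted(flagged - broken),  # corrected since the last run
        "flag": sorted(broken - flagged),    # newly broken
        "keep": sorted(flagged & broken),    # still broken, flag stays
    }


# Hypothetical ids, for illustration only.
updates = split_flag_updates(
    flagged_ids=["item-1", "item-2", "item-3"],
    broken_ids=["item-2", "item-4"],
)
print(updates)
```

In this example item-1 and item-3 appear under "unflag" even though neither is present in the check result, which is the case described above.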
In GitLab by @laureD19 on Nov 23, 2022, 10:24
This is an umbrella issue to discuss URL curation, especially the methods developed in the Python library to flag broken URLs and the examples provided in the related notebook.
notify @KlausIllmayer @cesareconcordia @aureon249 @kreetrapper