create_db script should report an error if a crawler is not pushing any data

InternetHealthReport / internet-yellow-pages

A knowledge graph for Internet resources

GNU General Public License v3.0

create_db script should report an error if a crawler is not pushing any data #50

Closed romain-fontugne closed 9 months ago

romain-fontugne commented 1 year ago

Describe the bug: create_db reports no error if a crawler adds no data to the database.

To Reproduce: Hard to reproduce, but recently the AS relationships dataset from BGPKIT was empty.

Expected behavior: I would expect an error message in the create_db log (and email) so that we are informed when this happens.

romain-fontugne commented 1 year ago

Ah, that's what our 'unit_test' methods are doing. Maybe we should call the unit_test methods in create_db?

m-appel commented 1 year ago

The unit_test method also runs the crawler itself, so I assume you mean calling the unit_test method instead of the run method in create_db? That would look a bit weird, I think.

How about adding the count_relation calls from the unit test around the run call in create_db? Although this basically recreates the unit_test function. Maybe we can simply rename the unit_test function to run_with_sanity (replace the assertion...) and create a new empty unit_test function that just raises an exception for now. The current "unit tests" are not really testing much anyway, and we don't call them yet.
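
A minimal sketch of what such a run_with_sanity wrapper could look like. Everything here is hypothetical and for illustration only: the BaseCrawler class, the count_relations helper, and the list used in place of the database are assumptions, not the project's actual code.

```python
import logging


class BaseCrawler:
    """Hypothetical stand-in for a crawler base class (not the real one)."""

    def __init__(self, store):
        # `store` is a plain list standing in for the graph database.
        self.store = store

    def count_relations(self):
        # Assumed helper: number of relationships currently stored.
        return len(self.store)

    def run(self):
        raise NotImplementedError

    def run_with_sanity(self):
        """Run the crawler and report whether it added any relationship."""
        before = self.count_relations()
        self.run()
        added = self.count_relations() - before
        if added <= 0:
            # Log instead of asserting, so create_db can keep going.
            logging.error('%s added no relationships', type(self).__name__)
        return added > 0


class EmptyCrawler(BaseCrawler):
    def run(self):
        pass  # simulates an empty upstream dataset


class WorkingCrawler(BaseCrawler):
    def run(self):
        self.store.append(('AS2497', 'PEERS_WITH', 'AS2500'))
```

With this shape, create_db would call run_with_sanity instead of run, and the old assertion becomes a logged error rather than a crash.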

romain-fontugne commented 1 year ago

Yes, that code needs some refactoring. The unit_test is not used; we rushed that during the preparation for GSoC...

I guess the most efficient approach would be either:

- to count the number of relationships in the create_db script,
- or to have the run function return the number of added relationships, so that create_db can log an error if that number is 0.
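
The second option could be sketched roughly as follows. This assumes each crawler's run is a callable that returns the number of relationships it added; the run_crawlers function and the logger name are made up for illustration and are not taken from the actual create_db script.

```python
import logging

logger = logging.getLogger('create_db')


def run_crawlers(crawlers):
    """Run each crawler and return the names of those that pushed no data.

    `crawlers` maps a crawler name to a callable that returns the number
    of relationships it added (an assumed convention).
    """
    empty = []
    for name, run in crawlers.items():
        added = run()
        if not added:
            # Covers both 0 and a crawler that forgot to return a count.
            logger.error('Crawler %s pushed no data', name)
            empty.append(name)
        else:
            logger.info('Crawler %s added %d relationships', name, added)
    return empty
```

Returning the list of empty crawlers, rather than raising, lets the script finish the build and still report every failing dataset in one log or email.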

mohamedawnallah commented 1 year ago

I'm interested in working on this particular issue. I'd like to understand why we need to report an error if the crawler's data is empty. I'm assuming you're talking specifically about the fetching phase within the create_db script.

romain-fontugne commented 11 months ago

Sorry, I missed that one. Yes, we want to make sure that at each iteration of the for loop the number of nodes or links in the database is increasing.
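
As a rough sketch of that check, assuming a Cypher-capable graph database: the two count queries are standard Cypher, but the `query` callable, the function names, and the shape of the crawler list are assumptions for illustration, not the actual create_db code.

```python
import logging

# Standard Cypher for counting all nodes and all relationships.
COUNT_NODES = 'MATCH (n) RETURN count(n) AS c'
COUNT_LINKS = 'MATCH ()-[r]->() RETURN count(r) AS c'


def graph_size(query):
    """Return (nodes, links); `query` runs a Cypher string and returns an int."""
    return query(COUNT_NODES), query(COUNT_LINKS)


def run_all(crawlers, query):
    """Run (name, run) pairs in order; return names that did not grow the graph."""
    stalled = []
    size = graph_size(query)
    for name, run in crawlers:
        run()
        new_size = graph_size(query)
        # Flag the crawler only if neither nodes nor links increased.
        if new_size[0] <= size[0] and new_size[1] <= size[1]:
            logging.error('Crawler %s added no nodes or links', name)
            stalled.append(name)
        size = new_size
    return stalled
```

Counting in the loop keeps the crawlers untouched, at the cost of two extra count queries per iteration.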

mohamedawnallah commented 9 months ago

Addressed in PR #76.