brave / sugarcoat-pipeline

CLI that implements the SugarCoat pipeline
Mozilla Public License 2.0

Invalid URL error when running sugarcoat pipeline on any public URL #23

Open AndreiCBogdan opened 2 years ago

AndreiCBogdan commented 2 years ago

I have installed the sugarcoat tool and downloaded the PageGraph binary successfully, but I'm having trouble actually running the tool on any URL. I am using the command from the README: npm run sugarcoat-pipeline -- -b <PATH_TO_PAGEGRAPH_BINARY> -u <URL> -t <SECS_TO_RUN_PAGEGRAPH> -l <FILTERLISTS>

This command opens the given URL in Brave without any issues, but the tool later fails with an 'Invalid URL' message in the terminal. Weirdly, I don't think any error is actually thrown.
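For context, 'Invalid URL' is the message Node.js itself attaches to the TypeError raised by its built-in URL constructor when the input cannot be parsed as an absolute URL. Whether that is the source of the message here is an assumption, not something confirmed in this thread. A minimal sketch of the behaviour, where describeUrlFailure is a hypothetical helper and not part of sugarcoat-pipeline:

```javascript
// Hypothetical helper for illustration only; not part of sugarcoat-pipeline.
// Returns null when `input` parses as an absolute URL, otherwise a small
// summary of the error that Node's built-in URL constructor threw.
function describeUrlFailure(input) {
  try {
    new URL(input); // WHATWG URL parser built into Node.js
    return null;
  } catch (err) {
    return { name: err.name, message: err.message };
  }
}

// An absolute URL parses; a bare host name or an empty string does not.
console.log(describeUrlFailure('https://example.com/'));
console.log(describeUrlFailure('example.com'));
console.log(describeUrlFailure(''));
```

If the message does come from this constructor, a caller that catches the error and prints only err.message would show 'Invalid URL' with no stack trace, which would match a run that looks like nothing was thrown.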

I searched the code for 'Invalid URL', and the only place I could find it is buried in a PageGraph GraphML file, which looks impossible to decipher.
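One way to make the GraphML occurrence easier to read is to print a short window of text around each match instead of opening the whole file. The sketch below is only a plain substring search. It assumes nothing about PageGraph's GraphML schema, and findOccurrences is a hypothetical helper name:

```javascript
// Hypothetical helper for illustration only.
// Finds every occurrence of `needle` in `text` and returns its character
// offset plus up to `context` characters of surrounding text on each side.
function findOccurrences(text, needle, context = 60) {
  const hits = [];
  let idx = text.indexOf(needle);
  while (idx !== -1) {
    const start = Math.max(0, idx - context);
    const end = Math.min(text.length, idx + needle.length + context);
    hits.push({ offset: idx, snippet: text.slice(start, end) });
    idx = text.indexOf(needle, idx + needle.length);
  }
  return hits;
}

// Usage against a file on disk (the path is a placeholder):
// const fs = require('fs');
// const graphml = fs.readFileSync('/path/to/file.graphml', 'utf8');
// console.log(findOccurrences(graphml, 'Invalid URL'));
```

The snippets show which element or attribute value surrounds the string, which may hint at whether it is recorded page content or something the crawler wrote itself.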

Device: MacBook Pro (Intel)
OS: macOS Monterey

Any guidance or ideas are much appreciated.

AndreiCBogdan commented 2 years ago

My initial hypothesis was that the PageGraph crawler was not working. However, I have since generated the GraphML files separately and passed them to the sugarcoat pipeline using the -g flag, and I still receive the error.