-
### Requested Feature
Make the sync command, maybe with a flag or by default, scrape the data of songs that have already been downloaded in the syncing directory for download data, making the creatio…
-
https://www.churchofjesuschrist.org/study/general-conference/2020/10/45andersen.p5?lang=eng#p5
In the data, this paragraph is truncated to just the text before the quote.
Same problem with this:
htt…
-
- The code change
![image](https://user-images.githubusercontent.com/30778938/173219121-a87a5e9a-bbd9-4c09-a98c-9051911ba959.png)
- The output
![image](https://user-images.githubusercontent.com/3…
-
ESPN sometimes returns bad data:
http://scores.espn.go.com/ncf/playbyplay?gameId=400547677
Games such as this one sometimes flood the feed with duplicate and out-of-order data.
We should implement a check to verify t…
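
A minimal sketch of what such a check could look like, assuming each scraped play carries a sequence number. The `Play` record, its fields, and `check_plays` are hypothetical names for illustration; the scraper's actual data model is not shown in this issue.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Play:
    # Hypothetical record shape, assumed for this sketch only.
    sequence: int      # assumed to increase as the game progresses
    description: str


def check_plays(plays: Iterable[Play]) -> Tuple[List[Play], List[str]]:
    """Drop exact duplicates, report out-of-order entries, return plays sorted."""
    seen = set()
    problems: List[str] = []
    unique: List[Play] = []
    last_seq = None
    for play in plays:
        if play in seen:
            # Exact repeat of a play we already accepted.
            problems.append(f"duplicate: {play.sequence}")
            continue
        seen.add(play)
        if last_seq is not None and play.sequence < last_seq:
            # Arrived after a play with a higher sequence number.
            problems.append(f"out of order: {play.sequence} after {last_seq}")
        last_seq = play.sequence
        unique.append(play)
    # Restore the expected order once duplicates are removed.
    cleaned = sorted(unique, key=lambda p: p.sequence)
    return cleaned, problems
```

The `problems` list could be logged or used to flag a game for re-scraping; whether to repair the data silently or reject it is a design choice left open here.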
-
Pretty simple thing to do.
-
Hello,
Per the README, downloads are processed a month at a time.
Is there an estimate of the average size of data scraped in these chunks? As well as an estimate of the final total size of the …
dnola updated
3 years ago
-
Love this library! Would you be open to adding an integration with Indexify (https://getindexify.ai) as a destination for the scraped data?
Developers would be able to build complex pipelines on the …
-
Hello,
I have a Flowise workflow that scrapes our entire website (150+ pages) and then saves it to Pinecone. We are currently using the Cheerio Web Scraper node. (it could be Puppeteer, Playwright - it does…
-
Loading a .xml file on an RG353V console for a system appears to take time to load but does not result in any observable change after it is done. Logs include an error message:
```
java.net.Malform…
```
-
What is the estimated size of the dataset?
If the dataset is not yet published, how much progress have you made? Will you be releasing an interim dataset?