-
While studying the database, I noticed that it does not contain a single non-BMP character. Is this a consequence of the method used and, if so, would it be possible to ascertain the presence…
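One quick way to check a dump for non-BMP characters is to look for code points above U+FFFF (U+10000 to U+10FFFF). The sketch below assumes the data can be read into Python `str` values. `non_bmp_chars` and `count_non_bmp` are hypothetical helper names, not part of any tooling that ships with the database.

```python
import re
from typing import Iterable, List

# Matches any code point outside the Basic Multilingual Plane (U+10000 to U+10FFFF).
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def non_bmp_chars(text: str) -> List[str]:
    """Return every non-BMP character in `text`, in order of appearance."""
    return NON_BMP.findall(text)


def count_non_bmp(rows: Iterable[str]) -> int:
    """Count non-BMP characters across an iterable of strings (e.g. rows of a dump)."""
    return sum(len(non_bmp_chars(row)) for row in rows)
```

This only finds characters that actually made it into the data. If they were stripped or replaced earlier in the pipeline, the count will be zero either way, so it cannot by itself tell you *why* they are missing.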
-
Which programming languages should be included in the pretraining dataset?
In contrast to natural languages, multilinguality does not significantly increase the vocabulary size, so I think it could be intere…
-
- [ ] require that we be affirmatively enabled; just having the plugin on the pip path isn't enough
- [ ] DB is configurable
- [ ] support running inside of multiple DBs (...does this make sense? …
-
This one is puzzling me. In my logs for crawling http://nsidc.org/, I have a number of non-identical JSON-LD objects that are getting the same hash. I poked around and figure o…
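The report doesn't say how the crawler computes its hash, so any check here is a guess at the mechanism. One way to tell genuine collisions apart from a hashing bug is to compare against a digest of a canonical JSON serialization. `jsonld_fingerprint` is a hypothetical helper for that comparison, not the crawler's own function.

```python
import hashlib
import json
from typing import Any


def jsonld_fingerprint(obj: Any) -> str:
    """Hypothetical helper: SHA-256 hex digest of a canonical JSON serialization.

    Keys are sorted and separators are fixed, so objects that differ only in
    key order get the same digest, while any difference in keys or values
    yields a different one.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If two objects that look different still get the same fingerprint here, they are identical as JSON. If they differ here but collide in the crawler, the crawler is probably hashing only part of each object. Note this is plain JSON canonicalization, not RDF dataset canonicalization: array order still matters, and semantically equivalent JSON-LD written in different shapes will hash differently.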
-
I have a problem with the way Google's crawler renders my Nuxt SPA website: it renders only the loading indicator and ignores the content.
I don't want to disable the loading animation, but I can't fin…
-
Hello, great tool! One problem, though: sitemaps are limited to 50,000 entries per file.
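The sitemaps.org protocol handles larger sites by splitting URLs across several sitemap files and listing those files in a sitemap index. Below is a minimal sketch of that approach. The file names (`sitemap-N.xml`, `sitemap-index.xml`) and the `build_sitemaps` function are hypothetical and not part of the tool's API. The 50 MB uncompressed size limit is mentioned but not enforced.

```python
from typing import Dict, List
from xml.sax.saxutils import escape

# Per the sitemaps.org protocol, one sitemap file may list at most 50,000 URLs
# (and be at most 50 MB uncompressed; the size limit is not checked here).
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'
# The protocol requires &, <, >, ' and " to be entity-escaped in URLs.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def loc(url: str) -> str:
    """Wrap an escaped URL in a <loc> element."""
    return "<loc>" + escape(url, EXTRA_ENTITIES) + "</loc>"


def build_sitemaps(urls: List[str], base_url: str,
                   size: int = MAX_URLS_PER_SITEMAP) -> Dict[str, str]:
    """Split `urls` into sitemap files of at most `size` entries plus one index.

    Returns a mapping of (hypothetical) file name -> XML text.
    """
    files: Dict[str, str] = {}
    index_entries: List[str] = []
    for n, start in enumerate(range(0, len(urls), size), start=1):
        name = f"sitemap-{n}.xml"
        entries = "".join(f"<url>{loc(u)}</url>" for u in urls[start:start + size])
        files[name] = f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
        # The index points at each chunk by its absolute URL.
        index_entries.append(f"<sitemap>{loc(base_url.rstrip('/') + '/' + name)}</sitemap>")
    files["sitemap-index.xml"] = (
        f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">'
        + "".join(index_entries)
        + "</sitemapindex>"
    )
    return files
```

Search engines are then given only the index file, which references every chunk.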
-
I don't see why there is no field for entering keywords per page.
How come?
-
Requests are filtered before being added to the request storage, so as to discard irrelevant pages.
When crawling large sites, some filtering rules may be added after crawling has started. It usuall…
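The filtering step described above can be sketched as a queue that checks each request against its rules before storing it. `FilteredQueue` and `pattern_rule` are hypothetical names. Re-checking already-queued requests when a rule is added mid-crawl is a design choice made here for illustration and is not necessarily what the tool does.

```python
import re
from collections import deque
from typing import Callable, Deque, List

# A rule returns True when the URL should be discarded as irrelevant.
Rule = Callable[[str], bool]


class FilteredQueue:
    """Hypothetical request storage that applies filter rules before enqueuing."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []
        self.pending: Deque[str] = deque()

    def is_irrelevant(self, url: str) -> bool:
        return any(rule(url) for rule in self.rules)

    def add(self, url: str) -> bool:
        """Enqueue `url` unless a rule discards it; return whether it was kept."""
        if self.is_irrelevant(url):
            return False
        self.pending.append(url)
        return True

    def add_rule(self, rule: Rule) -> int:
        """Register a rule mid-crawl and drop queued URLs it matches.

        Returns the number of already-queued URLs that were dropped.
        """
        self.rules.append(rule)
        before = len(self.pending)
        self.pending = deque(u for u in self.pending if not rule(u))
        return before - len(self.pending)


def pattern_rule(pattern: str) -> Rule:
    """Build a rule that discards URLs matching a regular expression."""
    compiled = re.compile(pattern)
    return lambda url: compiled.search(url) is not None
```

Without the pruning step in `add_rule`, a late rule would only affect URLs discovered after it was added, and everything already in storage would still be crawled.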
-
https://bugzilla.mozilla.org/show_bug.cgi?id=1332273
Mostly setting this up to make sure the Bugzilla bug is tracked. This could (or should) be useful for us, especially for `@document-start` support? …
-
### What version of `astro` are you using?
1.0.0-beta.63
### Are you using an SSR adapter? If so, which one?
None
### What package manager are you using?
pnpm
### What operating syst…