-
It is useful to include the actual content (mostly of scientific articles) as part of the metadata to facilitate LLM and full-text search.
When harvesting a metadata record, identify which links are re…
-
Sometimes we come across an interesting article and want to read it after work, but by the time we get around to it, the article has already been deleted. If th…
-
Instead of downloading the full Wikipedia dump, extracting it, then running a ragel script over the XML file, can we just do it all in memory? Pseudocode: `curl -s http://dumps.wikimedia.org/.../enwik…
-
I'm trying to extract keywords and keyphrases from around 20k abstracts of journal articles. The FAQ mentions that it is recommended to use GPU with KeyBERT. However, I'm unclear how exactly to run th…
-
As Alexsejrs pointed out, the organizer is too manual and not convenient. If you manually put your top priority articles at the top of the list (which is already annoying if you've got a long list), t…
ghost updated
6 years ago
-
JSONPath seems to require all paths to start with '$': https://goessner.net/articles/JsonPath/
However, the json_extract and json_extract_scalar functions treat the leading $ as optional.
Looks like th…
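To make the difference concrete, here is a minimal Python sketch of how a lenient path (no leading `$`) could be rewritten into the strict form that the JSONPath article describes. The helper name `normalize_json_path` is hypothetical and is not taken from any engine's source; it only illustrates the "leading $ is optional" behaviour described above.

```python
def normalize_json_path(path: str) -> str:
    """Return `path` with an explicit leading '$' root (hypothetical helper)."""
    path = path.strip()
    if not path:
        # An empty path is read here as "the root element" (an assumption).
        return "$"
    if path.startswith("$"):
        # Already in the strict form.
        return path
    if path.startswith(("[", ".")):
        # "[0].name" or ".name": only the root marker is missing.
        return "$" + path
    # Bare key such as "store.book": add the root marker and the dot.
    return "$." + path
```

With this sketch, `normalize_json_path("store.book")` and `normalize_json_path("$.store.book")` both yield `$.store.book`, so a strict parser would see the same path either way.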
-
PR https://github.com/mlcommons/training/pull/435 contains a script, `cleanup_scripts/separate_test_set.py`, that is used to randomly extract articles from the training set for use as an evaluation set…
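For readers unfamiliar with this kind of split, the following is a rough Python sketch of randomly holding out articles from a training set. It is not the script from the PR; the function name `split_articles`, its parameters, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_articles(articles, num_eval, seed=0):
    """Randomly move `num_eval` articles into an evaluation set.

    Hypothetical helper: returns (train, eval) lists that keep the
    original relative order of the articles.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    eval_indices = set(rng.sample(range(len(articles)), num_eval))
    train = [a for i, a in enumerate(articles) if i not in eval_indices]
    held_out = [a for i, a in enumerate(articles) if i in eval_indices]
    return train, held_out
```

Seeding the generator means that rerunning the split with the same input and seed selects the same evaluation articles, which matters if results are to be compared across runs.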
-
A few months ago I made a start at extracting content from EPUB files. See the 'extract-epub.js' file in the root of the project.
EPUB files are basically zip files of HTML files, so it's fairly st…
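As a minimal illustration of the "zip file of HTML files" idea, the Python sketch below opens an EPUB with the standard `zipfile` module and returns the markup of each HTML member. It is not the project's `extract-epub.js`; the function name `read_epub_html` and the list of file suffixes are assumptions, and a fuller reader would follow the reading order declared in the package manifest instead of relying on archive order.

```python
import zipfile

# Suffixes treated as HTML content (an assumption for this sketch).
HTML_SUFFIXES = (".html", ".xhtml", ".htm")


def read_epub_html(source):
    """Return {member name: markup} for every HTML file in an EPUB.

    `source` may be a file path or a binary file-like object.
    """
    pages = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(HTML_SUFFIXES):
                raw = archive.read(name)
                # Assume UTF-8; replace undecodable bytes rather than fail.
                pages[name] = raw.decode("utf-8", errors="replace")
    return pages
```

Calling `read_epub_html("book.epub")` (with a hypothetical file name) would give a dictionary whose values can be passed on to whatever HTML content extractor is already in use.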
-
Hey Tanmay, cool project!
Atm the data is from GPT-4o, https://github.com/sarkartanmay393/GeoPulse/blob/11d926a97eace592b28d8489f9014a6ab0536cfd/src/app/api/generate/route.ts#L46-L78, that's cool -…
-
``` sh
git-extract [directory]
```
Extract a directory from the current branch into its own repository and keep the history.
If no directory is given, the current one will be used.
Explained in thi…
adius updated
9 years ago