-
# A Sanity Check on ‘Emergent Properties’ in Large Language Models - Hacking semantics
One of the often-repeated claims about LLMs is that they have ‘emergent properties’. Unfortunately, in most case…
-
I'd like to explore adding property-based tests with fuzzing tools like http://jwilk.net/software/python-afl, https://hypothesis.readthedocs.org/en/latest/, and https://bitbucket.org/ebo/pyfuzz. Here…
-
https://guides.github.com/features/mastering-markdown/
List of languages (written or programming) used in NaNoGenMo 2015.
See [2014's incomplete survey](https://github.com/dariusk/NaNoGenMo-2014/iss…
-
First of all, let me take this opportunity to thank you for this very useful corpus of trade agreements.
I think the text of the FTA between Chile and the US (pta_53) is incomplete. In this reposito…
-
### Feature description
This issue lists the default metadata that would be nice to have:
- [ ] original file path
- [ ] filename
- [ ] complete file extension (for instance, a file named `xyz…
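
As a rough illustration of the first three items, the fields could be derived from a path with Python's standard `pathlib`. This is only a sketch: the helper name `default_metadata`, the dictionary keys, and the sample path are assumptions made for the example, not part of the project.

```python
from pathlib import PurePosixPath


def default_metadata(original_path: str) -> dict:
    """Collect path-derived metadata for one file (hypothetical helper)."""
    path = PurePosixPath(original_path)
    return {
        "original_file_path": str(path),
        "filename": path.name,
        # Join every suffix so multi-part extensions are kept whole.
        "complete_file_extension": "".join(path.suffixes),
    }


# Sample path chosen for illustration only.
meta = default_metadata("data/raw/archive.tar.gz")
print(meta)
```

Joining `path.suffixes` treats every dot-separated part after the first as extension, so a name with dots in its stem (a version number, for instance) would probably need extra handling.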
-
Do you provide the scripts/code that you developed to match the PDFMiner outputs on the documents to the XML representation of the PDF page itself? Thanks
-
The Common Voice project uses CC0 sentences from various sources. There's a dump of them on GitHub: https://github.com/common-voice/common-voice/tree/main/server/data
Unfortunately, most of the `se…
-
**Name:**
Enoch Antwi & James Stewart
**Theme:**
Policy and Judicial Reform
**Brief description of your idea:**
A mobile and analytics platform available to social justice groups and the general pub…
-
`textacy` currently has one small example corpus (the "Bernie and Hillary" corpus containing 3000 speeches and basic metadata from the Congressional Record) and readers for two very large corpora …
-
I understand how difficult it is to split sentences that contain abbreviations, and that adding abbreviations can have pitfalls, as is nicely explained in #2154. However, I have stumbled upon some c…