-
Serious optimizations required during parsing of MarkupElement.
See: CsQuery HTML standard test (5 MB file)
https://raw.githubusercontent.com/jamietre/CsQuery/master/source/CsQuery.Tests/Resources/HT…
-
I'm not sure whether this is a pydantic 1 -> 2 issue, or whether the underlying llama.cpp just doesn't support input for embedding in numerical tokenised form. I should note at this point that I'm ver…
-
- [ ] allow passing a list of integers instead of tokens to the word2vec function
- [ ] see how to remove the embedding of ` `
- [ ] abandon file-based approach
- [ ] speed up for Xptr's like quant…
-
Currently, it looks like Unicode characters that cannot be transformed (e.g. by lowercasing or removing accents) are removed when tokenising a string.
It would be handy if there was an analyzer opt…
-
I've spent a few days trying to work out how to use the CyberSource dotnet SDK to submit a tokenised payment authorisation. I'm able to achieve this with a Flex Token from the Flex Microform javas…
-
```
Simple keyword search: just a conjunction of terms tokenised from literals.
* Could be done using CQL collections: http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_usin…
```
-
If we tokenise frames of a video with a VQGAN, we can autoregressively predict the next token using our current language model. More specifically, using our current context of 2 million tokens, we cou…
-
Expanding out the whole thing from the (sometimes) compressed mass of `P."moo";:?A=1:MO.2` into separate lines. Would need to handle line numbering cleverly.
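A rough sketch of the expansion step, assuming the compressed line is a sequence of BASIC statements separated by `:` and that colons inside double-quoted strings must be left alone. The names `split_statements` and `expand` are made up for illustration, and the numbering scheme (start at 10, step 10) is only a placeholder. This does not handle `REM`, `DATA`, or `IF ... THEN` lines, where splitting would change the meaning, and it does not rewrite `GOTO`/`GOSUB` targets, which is the "clever" part of renumbering.

```python
def split_statements(line):
    """Split one line into statements on ':' outside string literals."""
    statements = []
    current = []
    in_string = False
    for ch in line:
        if ch == '"':
            # A doubled quote ("") inside a string toggles twice, so the
            # state ends up unchanged, which is what we want.
            in_string = not in_string
        if ch == ":" and not in_string:
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))
    # Drop empty statements produced by stray or trailing colons.
    return [s for s in statements if s.strip()]


def expand(line, start=10, step=10):
    """Give each statement its own line number (placeholder scheme)."""
    return [
        f"{start + i * step} {stmt}"
        for i, stmt in enumerate(split_statements(line))
    ]


if __name__ == "__main__":
    for out in expand('P."moo";:?A=1:MO.2'):
        print(out)
```

For the example line above this yields three lines: `10 P."moo";`, `20 ?A=1` and `30 MO.2`.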
-
Create a standalone usable version of the `` component currently living in the internet-header package (https://github.com/swisspost/design-system/tree/main/packages/internet-header/src/components/pos…
-
### Description
I'd like to have multiple `.env` files, with support for loading them in order of precedence. For example, assume a project defines
* `.env`
* `.env.local`
If local exists, that file…
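
One way the requested behaviour could look, as a minimal sketch rather than a proposal for the actual API: files are read in the order given, later files override earlier ones, and files that do not exist are skipped. That `.env.local` should win over `.env` is an assumption here, and `parse_env_file` and `load_env_files` are hypothetical names. The parser only understands plain `KEY=VALUE` lines and `#` comments.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and simple surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_files(paths):
    """Merge files in order: later files win, missing files are skipped."""
    merged = {}
    for path in paths:
        if Path(path).is_file():
            merged.update(parse_env_file(path))
    return merged


if __name__ == "__main__":
    # Assumed precedence: .env.local overrides .env when it exists.
    print(load_env_files([".env", ".env.local"]))
```

Passing the list explicitly keeps the precedence visible at the call site, so a different order (or extra files) needs no new option.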