Open mpgreg opened 1 year ago
https://github.com/astronomer/ask-astro/blob/c45487c7f12a9424dbe885580c687e35e30b7de4/airflow/include/data/schema.json#L54
Without an explicit tokenization scheme, ingest defaults to `word`, as per https://weaviate.io/developers/weaviate/config-refs/schema#property-tokenization. This splits snake_case configuration parameters and environment variables, because the underscore is treated like whitespace.
Example as per https://github.com/weaviate/weaviate/blob/764935fe4b576c87750d6a16ea20fd6c349b20b8/adapters/repos/db/helpers/tokenizer.go#L67
    // The tokenize* helpers are the ones defined in the tokenizer.go linked above.
    func main() {
        in := "THIS is my_env_variable"
        fmt.Print("\nwhitespace")
        fmt.Print(tokenizeWhitespace(in))
        fmt.Print("\nlowercase")
        fmt.Print(tokenizeLowercase(in))
        fmt.Print("\nword")
        fmt.Print(tokenizeWord(in))
        fmt.Print("\nwildcards")
        fmt.Print(tokenizeWordWithWildcards(in))
    }
Results in...
    whitespace[THIS is my_env_variable]
    lowercase[this is my_env_variable]
    word[this is my env variable]
    wildcards[this is my env variable]
To avoid splitting snake_case words and losing the case of camelCase parameters, we need to switch to `whitespace`.
@sunank200 — is this issue still relevant?