Closed lintool closed 2 months ago
Attention: Patch coverage is 66.66667%
with 2 lines
in your changes missing coverage. Please review.
Project coverage is 67.18%. Comparing base (
98e4866
) to head (6185321
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This works, alternatively we could just set the truncation flag to "DO NOT TRUNCATE"
options.put("truncation", "DO_NOT_TRUNCATE");
I've confirmed that all these affected regressions pass:
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-ar-aca > logs/log.miracl-v1.0-ar-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-bn-aca > logs/log.miracl-v1.0-bn-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-en-aca > logs/log.miracl-v1.0-en-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-es-aca > logs/log.miracl-v1.0-es-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-fa-aca > logs/log.miracl-v1.0-fa-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-fi-aca > logs/log.miracl-v1.0-fi-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-fr-aca > logs/log.miracl-v1.0-fr-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-hi-aca > logs/log.miracl-v1.0-hi-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-id-aca > logs/log.miracl-v1.0-id-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-ja-aca > logs/log.miracl-v1.0-ja-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-ko-aca > logs/log.miracl-v1.0-ko-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-ru-aca > logs/log.miracl-v1.0-ru-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-sw-aca > logs/log.miracl-v1.0-sw-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-te-aca > logs/log.miracl-v1.0-te-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-th-aca > logs/log.miracl-v1.0-th-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression miracl-v1.0-zh-aca > logs/log.miracl-v1.0-zh-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-ar-aca > logs/log.mrtydi-v1.1-ar-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-bn-aca > logs/log.mrtydi-v1.1-bn-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-en-aca > logs/log.mrtydi-v1.1-en-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-fi-aca > logs/log.mrtydi-v1.1-fi-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-id-aca > logs/log.mrtydi-v1.1-id-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-ja-aca > logs/log.mrtydi-v1.1-ja-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-ko-aca > logs/log.mrtydi-v1.1-ko-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-ru-aca > logs/log.mrtydi-v1.1-ru-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-sw-aca > logs/log.mrtydi-v1.1-sw-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-te-aca > logs/log.mrtydi-v1.1-te-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression mrtydi-v1.1-th-aca > logs/log.mrtydi-v1.1-th-aca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-v1-passage.wp-ca > logs/log.msmarco-v1-passage.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-v1-doc.wp-ca > logs/log.msmarco-v1-doc.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-v1-doc-segmented.wp-ca > logs/log.msmarco-v1-doc-segmented.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.wp-ca > logs/log.dl19-passage.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc.wp-ca > logs/log.dl19-doc.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc-segmented.wp-ca > logs/log.dl19-doc-segmented.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl20-passage.wp-ca > logs/log.dl20-passage.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl20-doc.wp-ca > logs/log.dl20-doc.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl20-doc-segmented.wp-ca > logs/log.dl20-doc-segmented.wp-ca.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-v1-passage.wp-hgf > logs/log.msmarco-v1-passage.wp-hgf.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression msmarco-v1-doc.wp-hgf > logs/log.msmarco-v1-doc.wp-hgf.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl19-passage.wp-hgf > logs/log.dl19-passage.wp-hgf.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl19-doc.wp-hgf > logs/log.dl19-doc.wp-hgf.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl20-passage.wp-hgf > logs/log.dl20-passage.wp-hgf.txt 2>&1
python src/main/python/run_regression.py --index --verify --search --regression dl20-doc.wp-hgf > logs/log.dl20-doc.wp-hgf.txt 2>&1
Another take for #2535
After digging through source code: https://github.com/deepjavalibrary/djl/blob/master/extensions/tokenizers/src/main/java/ai/djl/huggingface/tokenizers/HuggingFaceTokenizer.java
I was able to override the tokenize arbitrarily long sequences.