Solution: Manually corrected the wrong offsets in PMID 21904390, since the errors do not seem to follow any pattern that could be fixed programmatically.
Checkbox
[x] Confirm that this PR is linked to the dataset issue.
[x] Confirm dataloader script works with datasets.load_dataset function.
[x] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio_hub <dataset_name> [--data_dir /path/to/local/data] --test_local.
This is OPTIONAL for public datasets, as we can test these without access to the data files.
Closes #873 - Wrong entity offsets in the tmvar_v3 datasets
The wrong offsets in PMID 21904390 are already present in the source file: https://ftp.ncbi.nlm.nih.gov/pub/lu/tmVar3/tmVar3Corpus.txt
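For reviewers who want to reproduce the check, the snippet below is a minimal sketch of how offset mismatches of this kind can be detected. It is not the code used in this PR. It assumes the usual PubTator layout (a `PMID|t|title` line, a `PMID|a|abstract` line, then tab-separated annotation lines of the form PMID, start, end, mention, ...) and assumes offsets are counted over the title and abstract joined by a single separator character. The function name `find_offset_mismatches` and the sample document are hypothetical and synthetic; they are not taken from tmVar3Corpus.txt.

```python
from typing import List, Tuple


def find_offset_mismatches(pubtator_doc: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, mention, text_at_span) for every annotation whose
    [start, end) span does not reproduce the annotated mention.

    Assumption: offsets index into title + one separator character + abstract.
    """
    passages = {"t": "", "a": ""}
    annotations = []
    for line in pubtator_doc.splitlines():
        if not line.strip():
            continue
        parts = line.split("|", 2)
        # Passage lines look like "<PMID>|t|<title>" or "<PMID>|a|<abstract>".
        if len(parts) == 3 and parts[0].isdigit() and parts[1] in passages:
            passages[parts[1]] = parts[2]
            continue
        # Annotation lines: PMID, start, end, mention, then type/identifier columns.
        fields = line.split("\t")
        if len(fields) >= 4:
            annotations.append((int(fields[1]), int(fields[2]), fields[3]))

    text = passages["t"] + " " + passages["a"]
    return [
        (start, end, mention, text[start:end])
        for start, end, mention in annotations
        if text[start:end] != mention
    ]


# Synthetic example document (hypothetical PMID and text, for illustration only).
SAMPLE_DOC = "\n".join(
    [
        "1|t|A novel BRCA1 mutation",
        "1|a|We report c.123A>G in two patients.",
        "\t".join(["1", "8", "13", "BRCA1", "Gene", "672"]),
        # Deliberately shifted by two characters to simulate a wrong offset.
        "\t".join(["1", "35", "43", "c.123A>G", "DNAMutation", "c|SUB|A|123|G"]),
    ]
)

if __name__ == "__main__":
    for start, end, mention, found in find_offset_mismatches(SAMPLE_DOC):
        print(f"[{start}:{end}] expected {mention!r}, found {found!r}")
```

Running a check like this over each document in the corpus lists the annotations whose spans disagree with the text, which is a quick way to confirm that the affected entries are limited to a single PMID before correcting them by hand.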