yushengsu-thu closed this issue 2 months ago.
Oh, I think it's because NumPy 2.x is incompatible with the NumPy 1.x API. I'm cutting a quick fix and a new release (dolma 1.0.12) momentarily.
@soldni, thanks for your reply.
I found that this issue comes from the data processed in Step 0: Obtain Wikipedia, because of the package that step uses, wikiextractor.
For now, I have found a workaround: downgrade Python from 3.12 to 3.11 and pin the packages to the following versions:
Python 3.11.9
numpy 1.26.3
wikiextractor 3.0.6
dolma 1.0.11
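Before re-running anything, it can help to confirm that the active environment really matches these pins. The script below is a minimal sketch, not part of dolma: the helper name `check_pins` is hypothetical, and it uses strict equality on the versions reported above, which is deliberately conservative (the underlying requirement is probably just NumPy below 2.0).

```python
"""Check that the local environment matches the workaround versions.

Hypothetical helper, not part of dolma. The pinned versions are the ones
reported to work in this thread.
"""
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions reported to work (together with Python 3.11).
PINS = {
    "numpy": "1.26.3",
    "wikiextractor": "3.0.6",
    "dolma": "1.0.11",
}


def check_pins(pins):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 11):
        problems.append(f"python: expected 3.11, found {major}.{minor}")
    for name, expected in pins.items():
        try:
            found = version(name)  # installed distribution version
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if found != expected:
            problems.append(f"{name}: expected {expected}, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    if issues:
        for issue in issues:
            print(issue)
    else:
        print("environment matches the pinned versions")
```

Running it with the same interpreter that runs `scripts/make_wikipedia.py` matters, since a different virtualenv can silently pull in NumPy 2.x.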
Then re-run Step 0: Obtain Wikipedia:
python scripts/make_wikipedia.py \
--output wikipedia \
--date 20231001 \
--lang simple \
--processes 16
and use the resulting data to run Step 1: Run Taggers. That mitigates this issue.
Hello @soldni, I have one more question. When I execute Step 1: Run Taggers, I encounter the following issue:
My env:
Is this issue coming from the processed data (I used scripts/make_wikipedia.py), wikipedia/v0/documents/wiki_00.gz, or from the dolma codebase? Do you have any suggestions for mitigating or solving it?
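One way to narrow this down is to check whether the processed file itself is well formed before blaming the taggers. The sketch below assumes the file is gzip-compressed JSON Lines with one document per line containing at least `id` and `text` fields (the Dolma document format); the function name `inspect_documents` and the script name `check_docs.py` are hypothetical.

```python
"""Sanity-check a Dolma-style documents file before running taggers.

Assumes gzip-compressed JSON Lines, one document per line, each with at
least "id" and "text". Hypothetical helper, not part of dolma.
"""
import gzip
import json
import sys

REQUIRED_FIELDS = ("id", "text")


def inspect_documents(path, limit=None):
    """Return (n_ok, errors): number of valid documents and a list of problems."""
    n_ok = 0
    errors = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if limit is not None and lineno > limit:
                break
            if not line.strip():
                continue  # tolerate blank lines
            try:
                doc = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(doc, dict):
                errors.append(f"line {lineno}: not a JSON object")
                continue
            missing = [k for k in REQUIRED_FIELDS if k not in doc]
            if missing:
                errors.append(f"line {lineno}: missing {missing}")
            elif not isinstance(doc["text"], str):
                errors.append(f"line {lineno}: 'text' is not a string")
            else:
                n_ok += 1
    return n_ok, errors


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_docs.py wikipedia/v0/documents/wiki_00.gz")
    else:
        ok, errs = inspect_documents(sys.argv[1], limit=1000)
        print(f"{ok} valid documents, {len(errs)} problems")
        for err in errs[:20]:
            print(err)
```

If this reports no problems, the file structure is probably fine and the error is more likely on the tagger side; a corrupt or truncated gzip file would instead raise an exception while reading.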