Since upgrading dolma to version 1.0.0 I only get the attributes from the last tagger in the list.
I think the problem is here:
https://github.com/allenai/dolma/blob/a74b78ac531e06adb61bf70986c8d2a3ef38e9d7/python/dolma/core/runtime.py#L198-L200

`tagger_output.path` is the same for all the taggers in the list, but `attributes_by_stream[tagger_output.path]` is reset to an empty dictionary on each pass of the loop over the taggers, so only the attributes from the last tagger in the list survive.
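To illustrate the pattern, here is a minimal Python sketch. It is not the actual code in `runtime.py`. The `TaggerOutput` class, the `collect_buggy`/`collect_fixed` helpers and the data shapes are hypothetical. The sketch only shows why resetting the per-path dictionary inside the loop keeps just the last tagger when all taggers share one path. It also shows one possible fix (`dict.setdefault`), assuming the reset was only meant to initialize each path once.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical stand-in for a tagger's output location (not dolma's real class).
@dataclass
class TaggerOutput:
    path: str
    name: str


def collect_buggy(
    outputs: List[TaggerOutput], attrs: Dict[str, List[int]]
) -> Dict[str, Dict[str, List[int]]]:
    attributes_by_stream: Dict[str, Dict[str, List[int]]] = {}
    for tagger_output in outputs:
        # Reset on every iteration: earlier taggers that share this path are discarded.
        attributes_by_stream[tagger_output.path] = {}
        attributes_by_stream[tagger_output.path][tagger_output.name] = attrs[tagger_output.name]
    return attributes_by_stream


def collect_fixed(
    outputs: List[TaggerOutput], attrs: Dict[str, List[int]]
) -> Dict[str, Dict[str, List[int]]]:
    attributes_by_stream: Dict[str, Dict[str, List[int]]] = {}
    for tagger_output in outputs:
        # Create the per-path dict only the first time this path is seen.
        stream = attributes_by_stream.setdefault(tagger_output.path, {})
        stream[tagger_output.name] = attrs[tagger_output.name]
    return attributes_by_stream


if __name__ == "__main__":
    # Two hypothetical taggers writing to the same output path.
    outputs = [
        TaggerOutput("attributes/out.jsonl", "tagger_a"),
        TaggerOutput("attributes/out.jsonl", "tagger_b"),
    ]
    attrs = {"tagger_a": [1], "tagger_b": [2]}
    print(collect_buggy(outputs, attrs))  # {'attributes/out.jsonl': {'tagger_b': [2]}}
    print(collect_fixed(outputs, attrs))  # {'attributes/out.jsonl': {'tagger_a': [1], 'tagger_b': [2]}}
```

If the reset in `runtime.py` serves another purpose, such as clearing state between documents, it would need to move outside the per-tagger loop rather than be removed.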
This bug is not present in version 0.9.4.
I would submit a pull request, but I am not sure what these three lines are supposed to fix.