Hi!
As far as I understand (and from trying the code), the current implementation assumes that the input dump file contains a single revision per pageID.
The full-history dump files contain all revisions of each page, and when one of these is given as input, the code produces one long block of text without splitting it into separate revisions.
Is there a simple way to "force" the code to take the different revisions per pageID into account?
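For reference, the kind of splitting I have in mind looks roughly like the sketch below. It assumes the standard MediaWiki XML export shape (one `<page>` element holding several `<revision>` children) and omits the XML namespace that real dumps carry (e.g. `http://www.mediawiki.org/xml/export-0.10/`), so it is only an illustration, not a drop-in fix:

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Tiny sample in the shape of a full-history dump:
# a single <page> containing several <revision> elements.
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <id>42</id>
    <revision><id>1</id><text>first version</text></revision>
    <revision><id>2</id><text>second version</text></revision>
  </page>
</mediawiki>"""


def revisions_per_page(xml_file):
    """Yield (page_id, revision_id, text) triples, one per revision.

    Streams the dump page by page with iterparse, so the whole file
    never has to fit in memory.
    """
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "page":
            page_id = elem.findtext("id")  # direct child <id> of <page>
            for rev in elem.findall("revision"):
                yield page_id, rev.findtext("id"), rev.findtext("text")
            elem.clear()  # free the subtree before moving to the next page


triples = list(revisions_per_page(StringIO(SAMPLE)))
```

With the sample above, `triples` holds two entries for pageID 42, one per revision, instead of a single merged text.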
Thank you!