ColdMillenium / jwpl

Automatically exported from code.google.com/p/jwpl

[RevisionMachine] IndexGenerator should produce data files instead of SQL dumps #74

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Analogous to issue 72, IndexGenerator should also be able to produce data files 
instead of SQL dumps.

Original issue reported on code.google.com by oliver.ferschke on 20 Jan 2012 at 5:06

GoogleCodeExporter commented 9 years ago
Solved in r514

IndexGenerator now produces data files if the property "outputDatafile" is set 
to true in the configuration file.

The data can be imported into tables with the following schemata:
CREATE TABLE index_articleID_rc_ts (
ArticleID INTEGER UNSIGNED NOT NULL, 
FullRevisionPKs MEDIUMTEXT NOT NULL, 
RevisionCounter MEDIUMTEXT NOT NULL, 
FirstAppearance BIGINT NOT NULL, 
LastAppearance BIGINT NOT NULL, 
PRIMARY KEY(ArticleID)
) ENGINE = MyISAM DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

CREATE TABLE index_revisionID (
RevisionID INTEGER UNSIGNED NOT NULL, 
RevisionPK INTEGER UNSIGNED NOT NULL, 
FullRevisionPK INTEGER UNSIGNED NOT NULL, 
PRIMARY KEY(RevisionID)
) ENGINE = MyISAM DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

CREATE TABLE index_chronological (
ArticleID INTEGER UNSIGNED NOT NULL, 
Mapping MEDIUMTEXT NOT NULL, 
ReverseMapping MEDIUMTEXT NOT NULL, 
PRIMARY KEY(ArticleID)
) ENGINE = MyISAM DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

The data files can be imported with:
load data local infile '/path/to/file/articleIndex.csv'
into table index_articleID_rc_ts
fields terminated by ','
optionally enclosed by '"'
lines terminated by ';'
(ArticleID,FullRevisionPKs,RevisionCounter,FirstAppearance,LastAppearance);

load data local infile '/path/to/file/wiki_data/enwiki-20120104/revisionIndex.csv'
into table index_revisionID
fields terminated by ','
optionally enclosed by '"'
lines terminated by ';'
(RevisionID,RevisionPK,FullRevisionPK);

load data local infile '/path/to/file/enwiki-20120104/articleIndex.csv'
into table index_chronological
fields terminated by ','
optionally enclosed by '"'
lines terminated by ';'
(ArticleID,Mapping,ReverseMapping);
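
The load statements above imply a data-file layout: fields separated by ',', optionally enclosed in '"', and records terminated by ';'. As a rough sketch (this is not part of JWPL), a few lines of Python could be used to check column counts in a file before importing it. The function names and the sample values are made up for illustration, and the sketch assumes there are no escape characters and no line breaks between records.

```python
def parse_records(text, field_sep=",", quote='"', record_sep=";"):
    """Split text into records (lists of fields), honouring optional quoting.

    Assumption: no escape characters; a quote character only toggles
    the "inside quotes" state and is not kept in the field value.
    """
    records, fields, buf = [], [], []
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
        elif ch == field_sep and not in_quotes:
            # End of a field
            fields.append("".join(buf))
            buf = []
        elif ch == record_sep and not in_quotes:
            # End of the last field and of the record
            fields.append("".join(buf))
            buf = []
            records.append(fields)
            fields = []
        else:
            buf.append(ch)
    # Tolerate a final record that lacks a trailing terminator
    if buf or fields:
        fields.append("".join(buf))
        records.append(fields)
    return records


def bad_records(records, expected_columns):
    """Return the positions of records whose field count is unexpected."""
    return [i for i, rec in enumerate(records) if len(rec) != expected_columns]


# Hypothetical content shaped like index_articleID_rc_ts (5 columns)
sample = '12,"1 5 9","3 2 4",100,200;13,"2","1",300,400;'
records = parse_records(sample)
print(records[0])
print(bad_records(records, expected_columns=5))
```

For a real file, the text would be read from disk first; for large dumps a streaming variant would be preferable to loading everything into memory.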

Original comment by oliver.ferschke on 24 Jan 2012 at 1:45

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 24 Jan 2012 at 1:46

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 16 Feb 2012 at 1:20