benchflow / data-transformers

Spark scripts utilised to transform data to the BenchFlow internal formats

Documentation for cutting initial N processes #75

Open Cerfoglg opened 8 years ago

Cerfoglg commented 8 years ago

Since we want to exclude a number of the initial processes from a run of a WfMS benchmark, we allow cutting the first N of each process before storing to Cassandra.

This is done by ordering processes by start time in ascending order, finding the start time of the Nth process, and marking every process that started at or before that time to be ignored. The mark is currently a boolean value in our Cassandra schema called "to_ignore".

The same is done for the constructs: every construct belonging to one of the cut processes is marked in the same way.
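
As a rough illustration of the steps described above, here is a minimal sketch in plain Python rather than the actual Spark script. Only the "to_ignore" name comes from the description; the function name `mark_initial`, the row layout, and the `process_id`, `construct_id` and `start_time` fields are assumptions made for the example. It also treats all rows as instances of a single process, so the per-process grouping is left out.

```python
def mark_initial(processes, constructs, n):
    """Flag the first n processes (by start time) and their constructs.

    Hypothetical helper: rows are plain dicts, not the real Cassandra schema.
    """
    ordered = sorted(processes, key=lambda p: p["start_time"])
    if n > 0 and ordered:
        # Start time of the Nth process (or the last one if fewer than n exist).
        cutoff = ordered[min(n, len(ordered)) - 1]["start_time"]
    else:
        cutoff = None  # nothing to cut

    ignored_ids = set()
    for p in processes:
        # "under or with that time": at or before the cutoff.
        p["to_ignore"] = cutoff is not None and p["start_time"] <= cutoff
        if p["to_ignore"]:
            ignored_ids.add(p["process_id"])

    # Constructs inherit the flag of the process they belong to.
    for c in constructs:
        c["to_ignore"] = c["process_id"] in ignored_ids
    return processes, constructs


# Assumed example data; start times are arbitrary integers.
processes = [
    {"process_id": "p1", "start_time": 10},
    {"process_id": "p2", "start_time": 20},
    {"process_id": "p3", "start_time": 20},
    {"process_id": "p4", "start_time": 30},
]
constructs = [
    {"construct_id": "c1", "process_id": "p1"},
    {"construct_id": "c2", "process_id": "p3"},
    {"construct_id": "c3", "process_id": "p4"},
]
mark_initial(processes, constructs, n=2)
```

Because the cut is defined by time rather than by count, processes that share the Nth start time are all marked. In this example N is 2 but p1, p2 and p3 end up flagged, since p3 starts at the same time as p2.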