CMU-SAFARI / Sibyl

Source code for the software implementation of Sibyl proposed in our ISCA 2022 paper: Gagandeep Singh et al., "Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems using Online Reinforcement Learning" at https://people.inf.ethz.ch/omutlu/pub/Sibyl_RL-based-data-placement-in-hybrid-storage-systems_isca22.pdf
MIT License

Error when reading new VBA #6

Open zjsea opened 1 year ago

zjsea commented 1 year ago

I am trying to run Sibyl with msr-cambridge1-sample.csv (http://iotta.snia.org/traces/block-io/388). I replaced the storage drivers with dummy code that calls time.sleep() to simulate latency.

There is an error in the read() function in hybridstorage.py when a VBA is read for the first time (line 110 of msr-cambridge1-sample.csv). The read() function only handles the case where the VBA is already in self._mapping_table.index, which means the VBA must have been written by Sibyl earlier.

https://github.com/CMU-SAFARI/Sibyl/blob/ab7199f0b7f75710ca6b56870e0f5bd4a33f17eb/sibyl/hybridstorage.py#L274

If the VBA is not in self._mapping_table.index, the latency defaults to 0. This causes a zero division error when computing the reward because self._current_perf is 0.

https://github.com/CMU-SAFARI/Sibyl/blob/ab7199f0b7f75710ca6b56870e0f5bd4a33f17eb/sibyl/hybridstorageenvironment.py#L184
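
A stripped-down sketch of what I believe is happening. The names below are stand-ins, not the actual Sibyl code, and I am assuming the reward is inversely proportional to the measured latency:

```python
# Hypothetical stand-ins; not Sibyl's actual classes or attributes.
mapping_table = {0x10: "fast"}                # VBA -> device, filled in on write
device_latency = {"fast": 0.5, "slow": 2.0}   # dummy latencies, arbitrary units


def read(vba):
    """Mimic the reported behaviour: latency stays 0 for an unmapped VBA."""
    latency = 0.0
    if vba in mapping_table:
        latency = device_latency[mapping_table[vba]]
    return latency


def reward(latency):
    # Assumed reward shape: inversely proportional to the latency.
    return 1.0 / latency


def try_reward(vba):
    """Return the reward, or None if computing it divides by zero."""
    try:
        return reward(read(vba))
    except ZeroDivisionError:
        return None


print(try_reward(0x10))  # VBA written earlier: reward is computed
print(try_reward(0x20))  # VBA never written: latency 0 triggers the error path
```

The second call is the case hit at line 110 of the trace: the lookup misses, the latency is never updated, and the division fails.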

Please advise how read requests for a new VBA should be handled. The latency should not default to 0. Thank you.

saarthdeshpande commented 1 month ago

I faced the same issue. The following helped: the MSR traces contain entries where there is either:

  1. a "Read" operation before a "Write" operation, or
  2. a "Read" operation but the trace has no "Write" operation.

I think the authors might have changed the traces to remove such entries. The code works without any issues after removing the above two types of entries from the traces.

Here is the code to remove them, in case you need it:

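import glob
import os

import pandas as pd
from tqdm import tqdm

os.makedirs("fixed_msr", exist_ok=True)  # to_csv does not create the output directory

for file in tqdm(glob.glob("MSR-Cambridge/*.csv")):
    # The traces are assumed to have no header row; without header=None the
    # first request would be consumed as column names.
    # Columns: 0 timestamp, 1 hostname, 2 disk, 3 type, 4 offset, 5 size, 6 response time
    df = pd.read_csv(file, header=None)

    # Operation types seen at each offset, in order of first appearance
    grouped_vals = df.groupby(4)[3].unique()
    # Offsets that are never written, or whose first operation is a Read
    filtered_vals = [offset for offset, ops in grouped_vals.items() if 'Write' not in ops or ops[0] == 'Read']

    # Drop every request to those offsets, then keep (offset, size, type) in that order
    df = df[~df[4].isin(filtered_vals)]
    df = df[[4, 5, 3]]
    df.to_csv(os.path.join("fixed_msr", os.path.basename(file)), header=False, index=False)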
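
Note that the script above drops every request to an offset whose first operation is a Read, including any later Writes and Reads to it. If you would rather drop only the Reads that arrive before the first Write, a single pass that remembers the written offsets might be enough. This is only a sketch: `drop_unwritten_reads` is a name I made up, it assumes the MSR column order (type in column 3, offset in column 4), and like the script above it matches exact offsets and ignores request sizes.

```python
import csv
import io


def drop_unwritten_reads(rows, type_col=3, offset_col=4):
    """Yield rows in order, skipping non-Write requests to offsets not yet written."""
    written = set()
    for row in rows:
        op = row[type_col].strip()
        offset = row[offset_col]
        if op == "Write":
            written.add(offset)
            yield row
        elif offset in written:
            yield row
        # Otherwise: a Read to an offset with no earlier Write, so skip it.


# Tiny made-up trace in the assumed MSR layout:
# timestamp, hostname, disk, type, offset, size, response time
sample = """\
1,hm,0,Read,100,4096,10
2,hm,0,Write,100,4096,10
3,hm,0,Read,100,4096,10
4,hm,0,Read,200,4096,10
5,hm,0,Write,300,4096,10
6,hm,0,Read,300,4096,10
"""

kept = list(drop_unwritten_reads(csv.reader(io.StringIO(sample))))
for row in kept:
    print(",".join(row))
```

On the sample, the requests with timestamps 1 and 4 are dropped (a Read before the first Write, and a Read to an offset that is never written), while the later Read to offset 100 is kept. I have not checked whether this matches how the authors preprocessed their traces.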