KnowledgeDiscovery / rca_baselines

Code for "LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis" paper
https://lemma-rca.github.io/

Requesting README #1

Closed: nlokeshiisc closed this issue 3 months ago

nlokeshiisc commented 3 months ago

Hi, thanks very much for releasing this useful dataset. I am having trouble reproducing the numbers in the paper (https://arxiv.org/pdf/2406.05375).

It would be really helpful if the authors could release a README with instructions on the following:

  1. Which version of the HuggingFace dataset to download.
  2. The semantics/interpretation of the preprocessed datasets.
  3. The folder structure in which to host the downloaded dataset.
  4. For (say) the metric-only baselines in Table 4, how to run the REASON method.
  5. How the results are saved.

Waiting eagerly for a response :-)

Thanks again!

KnowledgeDiscovery commented 3 months ago

Hi Lokesh,

Thank you for your interest in our dataset. We have updated the README with FastPC as an example to help guide you through the data preprocessing and evaluation process. I hope it addresses all your questions!

nlokeshiisc commented 3 months ago

Thank you for the prompt response, Chen :-)

I have a few additional questions regarding the dataset structure. I apologize for any inconvenience caused by the back and forth, but I believe this will be beneficial for the wider community.

I have downloaded the preprocessed dataset from Hugging Face. However, I encountered an error when using the load_dataset API. Downloading with git works, so it might be helpful to update the README to reflect this.
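For anyone hitting the same load_dataset error, the git route can be sketched roughly as below. This is a minimal sketch and not the authors' documented procedure: the repository ID ORG/DATASET_NAME and the target folder data/lemma_rca are placeholders (the thread does not name either), build_clone_command and clone_dataset are hypothetical helpers, and it assumes git and Git LFS are installed so that large files are fetched rather than left as pointer files.

```python
import subprocess
from pathlib import Path

# Dataset repositories on the Hugging Face Hub are reachable as git remotes under this prefix.
HF_DATASETS_BASE = "https://huggingface.co/datasets"


def build_clone_command(repo_id: str, target_dir: str) -> list[str]:
    """Return the git command that clones a Hub dataset repo (hypothetical helper)."""
    # A repo ID is expected to look like "owner/name".
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return ["git", "clone", f"{HF_DATASETS_BASE}/{repo_id}", target_dir]


def clone_dataset(repo_id: str, target_dir: str) -> Path:
    """Clone the dataset repo unless the target folder already exists."""
    target = Path(target_dir)
    if target.exists():
        return target  # assume an earlier download; do not clone again
    subprocess.run(build_clone_command(repo_id, str(target)), check=True)
    return target


if __name__ == "__main__":
    # Placeholder values: replace with the real dataset repo ID and a local folder.
    clone_dataset("ORG/DATASET_NAME", "data/lemma_rca")
```

Building the command separately from running it makes it easy to print and check the URL before starting a large download.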

Suppose I want to run the Fast-PC algorithm solely on the metric data, could you please clarify the following:

  1. The unzipped data for issue 1203 contains the following numpy objects:

     1203_case

  2. Do we still need to follow steps 2-5 to preprocess the datasets using the provided code?

  3. I attempted to run Baseline/offline/FastPC/test_FastPC_pod_metric.py but encountered an import error: from gnn_dag import GNNCI. I couldn't find gnn_dag in the repository.

  4. Since the README pertains to issue 1203, could you please update the test_FastPC_metric_pod.py file with the appropriate POD_METRIC_FILE, label, and path_dirs (as per the folder structure shown in the image in Point 1)?

  5. I see that with every issue, you have included a PPT file explaining the scenario, but I am having a hard time parsing the PPT. Is there documentation on the dataset, a written explanation of each issue, etc., somewhere? Apologies if I have missed it; I kindly request you to point me to it.
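To make the folder-structure questions in points 1 and 4 concrete, a short script along these lines can list what an unzipped case folder actually contains. It is only a sketch under assumptions: list_case_files and describe_npy are hypothetical helpers, the folder is assumed to be named 1203_case and to sit in the current working directory, and the arrays are assumed to be ordinary .npy files that load without pickling.

```python
from pathlib import Path


def list_case_files(case_dir: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under case_dir, sorted by path."""
    root = Path(case_dir)
    if not root.is_dir():
        raise NotADirectoryError(case_dir)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def describe_npy(path: str) -> tuple[tuple[int, ...], str]:
    """Return (shape, dtype) of a .npy file. Requires NumPy."""
    import numpy as np  # imported lazily so the listing above works without NumPy

    # allow_pickle stays False (the default); object arrays would need True,
    # which should only be used on files from a trusted source.
    arr = np.load(path, allow_pickle=False)
    return arr.shape, str(arr.dtype)


if __name__ == "__main__":
    # Assumed location of the unzipped data for issue 1203.
    for rel_path, size in list_case_files("1203_case"):
        print(f"{size:>12}  {rel_path}")
```

Pasting the printed listing into the issue would give the maintainers an exact picture of the layout when setting POD_METRIC_FILE and path_dirs.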

Thank you very much for your patience and assistance!

KnowledgeDiscovery commented 2 months ago

No, you don't need to follow steps 2-5 if you use the preprocessed data. We've updated the code to address your concerns, so please try again. For more details on the fault scenarios, please refer to our paper. Thank you!