Context Preserving Data

evangelos-bitsikas commented 3 months ago

The script tokenizer_and_sim_matrix.py is used to reproduce Figures 3 and 4 from the paper. This script requires the following files:

cp_corpus_4G.txt
cp_corpus_5G.txt

However, neither the methodology nor the code for producing these files is provided. Without it, there is no clear way to verify the results shown in Figures 3 and 4.

Masfiqur-Mim commented 3 months ago

Hello @evangelos-bitsikas,

Thank you for raising the issue. We have added a Preprocessor.ipynb notebook, which processes the raw data (also added recently) and produces cp_corpus_4G.txt and cp_corpus_5G.txt. Please pull the latest changes. You can then run the cells sequentially to generate the files.

Note that one full run of the Preprocessor.ipynb notebook produces the cp_corpus file for only one network type (either 4G or 5G). The network type is set in the second cell of the notebook, with either NET_TYPE = '4G' or NET_TYPE = '5G'. You therefore need to run the notebook twice in total.

In short, please change NET_TYPE in the second cell before the second run so that you get both cp corpora. If you have any questions, please let us know.
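If editing the cell by hand twice is inconvenient, the switch can be scripted. The following is only a rough sketch, not part of the repository: `set_net_type` is a hypothetical helper, and it assumes the second cell contains a plain assignment of the form NET_TYPE = '4G' or NET_TYPE = '5G', as described above. It works on the notebook's JSON structure (a dict with a "cells" list) and returns a modified copy.

```python
import copy
import re

# Matches an assignment such as: NET_TYPE = '4G'  (single or double quotes).
NET_TYPE_LINE = re.compile(
    r"^(\s*NET_TYPE\s*=\s*)(['\"])(?:4G|5G)\2", re.MULTILINE
)


def set_net_type(notebook, net_type, cell_index=1):
    """Return a copy of a notebook dict with NET_TYPE set to net_type.

    Hypothetical helper. cell_index=1 assumes the assignment lives in
    the second cell, as stated in the comment above.
    """
    if net_type not in ("4G", "5G"):
        raise ValueError("net_type must be '4G' or '5G'")

    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    cell = nb["cells"][cell_index]

    # The .ipynb format stores a cell's source as a string or a list of lines.
    source = cell["source"]
    as_list = isinstance(source, list)
    text = "".join(source) if as_list else source

    new_text, count = NET_TYPE_LINE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{net_type}{m.group(2)}", text
    )
    if count == 0:
        raise ValueError("no NET_TYPE assignment found in that cell")

    cell["source"] = new_text.splitlines(keepends=True) if as_list else new_text
    return nb
```

To use it, load Preprocessor.ipynb with `json.load`, call `set_net_type(nb, "5G")`, write the result to a separate copy with `json.dump`, and run that copy's cells sequentially as usual.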

evangelos-bitsikas commented 2 months ago

@Masfiqur-Mim I appreciate your response.

CellularLint / cellularlint-codes
