EvoTestOps / LogLead

LogLead stands for Log Loader, Enhancer, and Anomaly Detector.
MIT License

Test PyPI/pip packaging #29

Closed bakhtos closed 1 month ago

mmantyla commented 3 months ago

I got it working. In addition to sudo apt install gcc, which we did in Oulu, I had to run sudo apt-get install g++. After that, building the wheel went smoothly. I think it builds the wheel from source because no compatible binaries are found.

What remains is to ensure that other people with Ubuntu on WSL2 could also run it smoothly. Perhaps a setup.sh where those commands are listed?

mmantyla commented 3 months ago

This still failed:

/LogLead/demo/parser_benchmark$ python ano_detection.py

After the following it worked:

pip install python-dotenv
pip install pyyaml
pip install jinja2

Can you add these to the pip package?

mmantyla commented 3 months ago

The parsing speed run works only up to Polars 0.20.21:

LogLead/demo/parser_benchmark$ python parsing_speed.py

Starting from 0.20.22 up until the most recent 0.20.31 there is a crash while loading Nezha-Shop. Based on the crash report this is most likely a bug in Polars. So for now let's force the Polars version to 0.20.21. Once we get this pip packaging done we can open a bug report in Polars.
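
A minimal sketch of a runtime guard for the Polars pin, assuming 0.20.21 is the last good version as reported here. The function names are hypothetical and not part of LogLead; the version parsing is deliberately simplified and only looks at the first three numeric components.

```python
import re
from importlib import metadata

# Last Polars version reported to work in this thread (assumption taken from the comment above).
LAST_GOOD_POLARS = (0, 20, 21)


def parse_version(text):
    """Return up to the first three numeric components of a version string as a tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def polars_is_known_good(version_text):
    """True if the given version is not newer than the last known-good release."""
    return parse_version(version_text) <= LAST_GOOD_POLARS


def check_installed_polars():
    """Describe whether the installed Polars falls inside the known-good range."""
    try:
        installed = metadata.version("polars")
    except metadata.PackageNotFoundError:
        return "polars is not installed"
    if polars_is_known_good(installed):
        return f"polars {installed} is within the known-good range"
    return f"polars {installed} is newer than 0.20.21; loading Nezha-Shop may crash"


if __name__ == "__main__":
    print(check_installed_polars())
```

Such a check only reports the problem early; the actual fix is still the pinned version in the package metadata.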

Starting from 0.20.22 it looks like this:

Processing dataset: Nezha-Shop
Loader: <class 'loglead.loaders.nezha.NezhaLoader'>, args:{'filename': '/home/mmantyla/Datasets/nezha/', 'system': 'WebShop'}
thread '<unnamed>' panicked at crates/polars-core/src/frame/mod.rs:957:36:
should not fail: SchemaMismatch(ErrString("type String is incompatible with expected type Int64"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):

Edit: Also Profilence data did not load with 0.20.22 but it gave a different error:

polars.exceptions.NoDataError: 'csv scan' failed
The reason: empty CSV:

mmantyla commented 3 months ago

And so as not to forget: there was the Drain configuration file issue. Perhaps you fixed it already.

WARNING:drain3.template_miner_config:config file not found: /home/mmantyla/anaconda3/envs/LL_test_pip2/lib/python3.12/site-packages/loglead/parsers/drain3/drain3.ini
WARNING:drain3.template_miner_config:config file not found: /home/mmantyla/anaconda3/envs/LL_test_pip2/lib/python3.12/site-packages/loglead/parsers/drain3/drain3_no_masking.ini

That is all from my side.

mmantyla commented 3 months ago

Version numbers in pyproject.toml should be fixed to the minimum unless there is a reason to have a specific version. Like these should be fixed

'regex==2023.10.3',
'drain3==0.9.11',
'tipping==0.1.3',
"scikit-learn==1.2.2"

In the main branch there is now an environment_no_df.yml that does this as well.

bakhtos commented 3 months ago

@mmantyla @jnyyssol

bakhtos commented 3 months ago

@mmantyla About drain3 config issue:

Update the version of LogLead from test.pypi.org to version 0.0.5:

 python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.5

Then check the Drain configuration again; the config files should now be part of the package.
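
One way to verify this without running a parser is to look for the two .ini files inside the installed package. The sketch below assumes the layout shown in the warnings above (loglead/parsers/drain3/); the helper names are hypothetical.

```python
from importlib import util
from pathlib import Path

# File names taken from the drain3 warnings earlier in this thread.
CONFIG_NAMES = ("drain3.ini", "drain3_no_masking.ini")


def missing_drain3_configs(package_root):
    """Return the config file names that are absent under <package_root>/parsers/drain3."""
    config_dir = Path(package_root) / "parsers" / "drain3"
    return [name for name in CONFIG_NAMES if not (config_dir / name).is_file()]


def installed_loglead_root():
    """Return the directory of the installed loglead package, or None if it is not installed."""
    spec = util.find_spec("loglead")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    root = installed_loglead_root()
    if root is None:
        print("loglead is not installed in this environment")
    else:
        missing = missing_drain3_configs(root)
        print("all config files present" if not missing else f"missing: {missing}")
```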

mmantyla commented 3 months ago

It is easiest if we have one normal version and then deep-learning as separate.

jinja2 comes from the pandas df.to_latex() call, as documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_latex.html Yet there is no dependency from pandas to jinja2.

jnyyssol commented 3 months ago

  • As a sub-point, please run python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead[bert] in the same environment to also install tensorflow and transformers, to see if the versions are compatible, and run some tests that use BertEmbeddings.

With this I got the following error when trying to use the embeddings:

WARNING:root:Could not import BertEmbeddings because of: Failed to import transformers.models.albert.modeling_tf_albert because of the following error (look up to see its traceback): Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with 'pip install tf-keras'.

After doing that, I got ImportError: AlbertTokenizer requires the SentencePiece library but it was not found in your environment.

Having installed tf-keras and sentencepiece, it runs but gets stuck in the code (or just takes really long). I tried the old environment.yml file and it runs straight out of the box but gets stuck as well. The environment.yml has noticeably older versions than what the package installs.
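
A small sketch for spotting these gaps up front, before BertEmbeddings is imported. The module list is my assumption of what the bert extra needs based on the errors above (tf-keras is imported as tf_keras); missing_modules is a hypothetical helper, not part of LogLead.

```python
from importlib import util

# Import names assumed from the errors above; adjust if the extra changes.
BERT_MODULES = ("tensorflow", "transformers", "tf_keras", "sentencepiece")


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(BERT_MODULES)
    if missing:
        print("Missing for BertEmbeddings:", ", ".join(missing))
    else:
        print("All modules needed for BertEmbeddings were found")
```

This only checks that the modules can be located; it does not catch version conflicts such as the Keras 3 one.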

mmantyla commented 3 months ago

TF-keras is certainly not the fastest. I do not remember the details anymore, but it eventually completed this run back in November: https://github.com/EvoTestOps/LogLead/blob/main/demo/saner_2024_paper/t4_ano_detect_comparison.py The file has commented-out options to reduce the dataframe. Perhaps try those.

If that does not help, then @yuqwang and @jnyyssol, can you have a joint session where you try to figure this out? @yuqwang, perhaps you have learned new tricks for making DL faster.

bakhtos commented 3 months ago

@mmantyla @jnyyssol

Test PyPI now has LogLead 0.0.6, which also pins the tf dependencies and should solve the drain3 issue:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.6

mmantyla commented 3 months ago

I updated environment.yml to not have fixed build numbers. On my computer the following works

conda env create -f environment.yml
conda activate ll_with_dl
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.7

However, there is still a pinned requirement, "scikit-learn==1.2.2", that causes an unnecessary uninstall when pip installing LogLead on top of the conda ll_with_dl environment:

Installing collected packages: tipping, python-dotenv, jinja2, scikit-learn, loglead
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.4.2
    Uninstalling scikit-learn-1.4.2:
      Successfully uninstalled scikit-learn-1.4.2
Successfully installed jinja2-3.1.4 loglead-0.0.7 python-dotenv-1.0.1 scikit-learn-1.2.2 tipping-0.1.3

bakhtos commented 3 months ago

@mmantyla The dependency for scikit-learn is now specified as scikit-learn==1.2.2 per your request above (this is also the version in the old environment). In the new environment file you have scikit-learn>=1.3.0, so conda first installed a more recent version. Does loglead work with recent versions of scikit-learn?

mmantyla commented 3 months ago

The word "not" was missing in my comment above. It should be:

'regex>=2023.10.3',
'drain3>=0.9.11',
'tipping>=0.1.3',
"scikit-learn>=1.2.2"

Only Polars has the bug and needs a fixed version. In my local environment conda has installed scikit-learn 1.4.2 and it works fine.
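
To illustrate why the exact pin triggered the uninstall while a minimum version would not, here is a simplified sketch. It is a hand-rolled stand-in for pip's real specifier matching and handles only plain numeric versions with == and >=; the helper names are hypothetical.

```python
import re


def version_tuple(text):
    """Parse a plain numeric version such as '1.4.2', ignoring trailing zero components."""
    parts = [int(p) for p in re.findall(r"\d+", text)]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(installed, requirement):
    """Check an installed version against a single '==X' or '>=X' requirement."""
    match = re.fullmatch(r"\s*(==|>=)\s*([\d.]+)\s*", requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    operator, wanted = match.groups()
    if operator == "==":
        return version_tuple(installed) == version_tuple(wanted)
    return version_tuple(installed) >= version_tuple(wanted)


if __name__ == "__main__":
    # scikit-learn 1.4.2 is what conda had installed in the log above.
    print(satisfies("1.4.2", "==1.2.2"))  # exact pin: existing install is rejected
    print(satisfies("1.4.2", ">=1.2.2"))  # minimum version: existing install is kept
```

With ==1.2.2 the existing 1.4.2 install does not match, so pip replaces it; with >=1.2.2 it matches and is left alone.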