Closed bakhtos closed 1 month ago
This still failed:
/LogLead/demo/parser_benchmark$ python ano_detection.py
After the following:
pip install python-dotenv
pip install pyyaml
pip install jinja2
it worked. Can you add these to the pip package?
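A quick way to see which of these are absent from a fresh environment is to test whether their modules can be found. This is only a sketch: the mapping of pip names to import names (python-dotenv to dotenv, pyyaml to yaml) is standard, but the helper `missing_packages` is hypothetical and not part of LogLead.

```python
import importlib.util

# pip name -> import name for the three packages installed above.
DEMO_MODULES = {"python-dotenv": "dotenv", "pyyaml": "yaml", "jinja2": "jinja2"}


def missing_packages(modules=DEMO_MODULES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All demo dependencies are importable.")
```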
Parsing speed run works only up to Polars = 0.20.21
LogLead/demo/parser_benchmark$ python parsing_speed.py
Starting from 0.20.22 up until the most recent 0.20.31 there is a crash while loading Nezha-shop. Based on the crash report this is most likely a bug in Polars. So for now let's force the Polars version to 0.20.21. Once we get this pip packaging done we can open a bug report in Polars.
Starting from 0.20.22 it looks like this:
Processing dataset: Nezha-Shop
Loader: <class 'loglead.loaders.nezha.NezhaLoader'>, args:{'filename': '/home/mmantyla/Datasets/nezha/', 'system': 'WebShop'}
thread '<unnamed>' panicked at crates/polars-core/src/frame/mod.rs:957:36:
should not fail: SchemaMismatch(ErrString("type String is incompatible with expected type Int64"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
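Until the version is pinned in the package, a script could warn when the installed Polars is newer than the last version that worked in these runs. A minimal sketch, assuming plain numeric version strings such as 0.20.21 (pre-release suffixes are not handled); the names `parse_version` and `is_known_good` are made up for illustration.

```python
from importlib import metadata

LAST_KNOWN_GOOD = (0, 20, 21)  # last version that loaded Nezha-Shop in the runs above


def parse_version(text):
    """Turn '0.20.21' into (0, 20, 21); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_known_good(version_text, limit=LAST_KNOWN_GOOD):
    """True if the version is at or below the last known good one."""
    return parse_version(version_text) <= limit


if __name__ == "__main__":
    try:
        installed = metadata.version("polars")
    except metadata.PackageNotFoundError:
        print("polars is not installed")
    else:
        status = "OK" if is_known_good(installed) else "newer than the last known good version"
        print(f"polars {installed}: {status}")
```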
Edit: Also Profilence data did not load with 0.20.22 but it gave a different error:
polars.exceptions.NoDataError: 'csv scan' failed
The reason: an empty CSV file.
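One possible workaround is to filter out zero-byte files before handing them to the CSV scan. This is a sketch using only the standard library; how the Profilence loader collects its file list is not shown in this thread, so `non_empty_files` is a hypothetical helper rather than existing loader code.

```python
import os
import tempfile


def non_empty_files(paths):
    """Keep only paths that exist and contain at least one byte."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) > 0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty.csv")
        full = os.path.join(tmp, "full.csv")
        open(empty, "w").close()  # zero-byte file
        with open(full, "w") as handle:
            handle.write("a,b\n1,2\n")
        print(non_empty_files([empty, full]))  # only full.csv remains
```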
And so that we do not forget: there was the Drain configuration file issue. Perhaps you fixed it already.
WARNING:drain3.template_miner_config:config file not found: /home/mmantyla/anaconda3/envs/LL_test_pip2/lib/python3.12/site-packages/loglead/parsers/drain3/drain3.ini
WARNING:drain3.template_miner_config:config file not found: /home/mmantyla/anaconda3/envs/LL_test_pip2/lib/python3.12/site-packages/loglead/parsers/drain3/drain3_no_masking.ini
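To verify an installed package, one can check that both .ini files named in the warnings are present in the drain3 directory. A small sketch: the file names come from the warnings above, while `missing_configs` is a hypothetical helper and the directory has to be supplied by the caller.

```python
from pathlib import Path

# File names taken from the warnings above.
DRAIN_CONFIGS = ("drain3.ini", "drain3_no_masking.ini")


def missing_configs(config_dir, names=DRAIN_CONFIGS):
    """Return the config file names that are absent from config_dir."""
    base = Path(config_dir)
    return [name for name in names if not (base / name).is_file()]
```

Calling it with the loglead/parsers/drain3 directory inside site-packages should return an empty list once the config files ship with the package.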
That is all from my side.
Version numbers in pyproject.toml should be fixed to the minimum unless there is a reason to have a specific version. Like these should be fixed
'regex==2023.10.3',
'drain3==0.9.11',
'tipping==0.1.3',
"scikit-learn==1.2.2"
In main branch there is now environment_no_df.yml that does this as well.
@mmantyla @jnyyssol
gcc and g++: we can make some kind of 'Known issues' section in the documentation suggesting to install these, but as I was saying on the plenary - most Linux distributions have them by default, and I am surprised Windows/WSL does not.
pip install LogLead[test] or pip install LogLead[demo], since nothing in the loglead itself depends on python-dotenv or pyyaml (and users might choose other ways to pass paths). But if you want, we can put everything except tensorflow stuff in the general dependencies.
As a sub-point, please run
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead[bert]
in the same environment to install also tensorflow and transformers to see if the versions are compatible and run some tests that use BertEmbeddings.
jinja2? It is not imported in LogLead/demo/parser_benchmark/ano_detection.py.
@mmantyla
About drain3 config issue: Update the version of LogLead from test.pypi.org to version 0.0.5:
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.5
And check the drain stuff again; the config files should now be part of the package.
It is easiest if we have one normal version and then deep-learning as separate.
jinja2 comes from the pandas df.to_latex() call, as documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_latex.html Yet there is no dependency from pandas to jinja2.
- As a sub-point, please run
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead[bert]
in the same environment to install also tensorflow and transformers to see if the versions are compatible and run some tests that use BertEmbeddings.
With this I got the following error, when trying to use the embeddings:
WARNING:root:Could not import BertEmbeddings because of: Failed to import transformers.models.albert.modeling_tf_albert because of the following error (look up to see its traceback): Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with 'pip install tf-keras'.
After doing that, I got ImportError: AlbertTokenizer requires the SentencePiece library but it was not found in your environment.
Having installed tf-keras and sentencepiece, it runs but gets stuck in the code (or just takes really long). I tried the old environment.yml file and it runs straight out of the box but gets stuck as well. The environment.yml has noticeably older versions than what the package installs.
TF-keras is certainly not the fastest. I do not remember the details anymore, but it eventually completed this run back in November: https://github.com/EvoTestOps/LogLead/blob/main/demo/saner_2024_paper/t4_ano_detect_comparison.py The file has commented-out options to reduce the dataframe. Perhaps try those.
If that does not help, then @yuqwang and @jnyyssol, can you have a joint session where you try to figure this out? @yuqwang, perhaps you have learned new tricks for making DL faster.
@mmantyla @jnyyssol
Test PyPI now has LogLead 0.0.6, which also pins the tf dependencies and should solve the drain3 issue:
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.6
I updated environment.yml to not have fixed build numbers. On my computer the following works:
conda env create -f environment.yml
conda activate ll_with_dl
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ loglead==0.0.7
However, there is still one pinned version,
"scikit-learn==1.2.2",
which causes unnecessary uninstalls when pip installing LogLead on top of the conda ll_with_dl environment:
Installing collected packages: tipping, python-dotenv, jinja2, scikit-learn, loglead
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.4.2
Uninstalling scikit-learn-1.4.2:
Successfully uninstalled scikit-learn-1.4.2
Successfully installed jinja2-3.1.4 loglead-0.0.7 python-dotenv-1.0.1 scikit-learn-1.2.2 tipping-0.1.3
@mmantyla The dependency for scikit-learn is now specified as scikit-learn==1.2.2 per your request above (this is also the version in old environment), but in the new environment file you have scikit-learn>=1.3.0, so it first installed a more recent version. Does loglead work with recent versions of scikit-learn?
The word "not" was missing in the above. It should be:
'regex>=2023.10.3',
'drain3>=0.9.11',
'tipping>=0.1.3',
"scikit-learn>=1.2.2"
Only Polars has the bug and we need a fixed version. In my local version conda has installed scikit-learn 1.4.2 and it works fine.
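The difference between the two pins can be illustrated with a toy check. This is a simplified sketch, not pip's resolver: it only understands `==` and `>=` with purely numeric versions, and `satisfies` is a hypothetical helper.

```python
def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, requirement):
    """Check 'installed' against a requirement such as '>=1.2.2' or '==1.2.2'."""
    for operator in (">=", "=="):
        if requirement.startswith(operator):
            wanted = parse_version(requirement[len(operator):])
            have = parse_version(installed)
            return have >= wanted if operator == ">=" else have == wanted
    raise ValueError(f"unsupported requirement: {requirement}")


print(satisfies("1.4.2", "==1.2.2"))  # False: pip has to replace 1.4.2
print(satisfies("1.4.2", ">=1.2.2"))  # True: the conda-installed version is kept
```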
I got it working. In addition to
sudo apt install gcc
that we did in Oulu, I had to run the following:
sudo apt-get install g++
After that, building the wheel went smoothly. I think it is building because no compatible binaries are found.
What remains is to ensure that other people with Ubuntu on WSL2 can also run it smoothly. Perhaps a setup.sh where those commands are listed?
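As an alternative or complement to a setup.sh, a small pre-install check could tell users which compilers are missing. A sketch using only the standard library; `missing_tools` is a hypothetical helper, and the suggested apt commands are the ones quoted above.

```python
import shutil


def missing_tools(tools=("gcc", "g++")):
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
        print("On Ubuntu: sudo apt install gcc / sudo apt-get install g++")
    else:
        print("gcc and g++ are available.")
```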