CannyLab / vdtk

Visual Description Dataset Analysis Toolkit
MIT License

zombie processes of vdtk #3

Closed bardout closed 1 year ago

bardout commented 1 year ago

Thanks for this useful toolkit. I noticed a reproducible issue with concept-leave-one-out: the command was killed after about 6 minutes on each of 3 attempts. After the first attempt I was unsure whether the server was too loaded and/or the dataset a bit too large, so I retried the command with no other load at all. It appears that concept-leave-one-out does not work properly on 2 datasets on which the other commands worked, and there is no error message or easy check.

env

I installed the package in a conda env (Python 3.9.13) on a 64-core machine running Ubuntu 20.04.1 (SMP). The 'pip install vdtk' did not pull in the dependencies, so I installed them manually per pyproject.toml.

Present in conda list: "pandas >= 1.5.1", "pytest", "nltk >= 3.6.5", "numpy >= 1.21.4", "matplotlib >= 3.5.0", "fuzzywuzzy >= 0.18.0", "fuzzysearch >= 0.7.3", "jdk4py >= 17.0.3.0", "rich >= 10.14.0", "mpire >= 2.3.1", "click >= 8.0.3", "embeddings >= 0.0.8", "mauve-text >= 0.3.0", "regex >= 2022.10.31", "rouge-score >= 0.1.2", "tqdm >= 4.62.3"

Present in pip list: "POT >= 0.8", "ftfy >= 6.1.1", "sentence-transformers >= 2.1.0", "bert-score >= 0.3.12", "spacy >= 3.2.0"

I also ran:

python3 -m spacy download en
pip install levenshtein # remove _Warning: Using slow pure-python SequenceMatcher_ in fuzzywuzzy.
conda install mypy  isort
conda install  flake8-black # =>  "black",  "flake8",
pip install  tensorflow==2.12.0 #  "tf-slim >= 1.1.0"
conda install  sentencepiece   #>= 0.1.97
conda install  scipy  #>= 1.9.3

run

I used several commands (vocab-stats, caption-stats, semantic-variance, concept-overlap) to get metrics on small or medium Remote Sensing datasets with captions, and they gave plausible results.

I then tried concept-leave-one-out on the RSITMD dataset (23,715 captions), which worked, and on RSICD (24,333 captions), where the main (interactive) process was killed unexpectedly during _Evaluating..._ (a very long step). I also noticed many vdtk processes spawned by this command:

$ vdtk concept-leave-one-out $METRICS/dataset_RSITMD_vdtk.json

                                     Concept Set Leave-One-Out (Exact)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Concept Set  ┃ % Matches ┃ BLEU@1        ┃ BLEU@2        ┃ BLEU@3        ┃ BLEU@4        ┃ ROUGE-L       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Places365    │ 85.49%    │ 0.82 +/- 0.13 │ 0.67 +/- 0.16 │ 0.57 +/- 0.19 │ 0.47 +/- 0.23 │ 0.69 +/- 0.14 │
│ MS-COCO      │ 31.29%    │ 0.75 +/- 0.14 │ 0.58 +/- 0.17 │ 0.47 +/- 0.20 │ 0.37 +/- 0.24 │ 0.60 +/- 0.15 │
│ ImageNet-1K  │ 69.37%    │ 0.77 +/- 0.14 │ 0.62 +/- 0.17 │ 0.51 +/- 0.20 │ 0.41 +/- 0.25 │ 0.65 +/- 0.15 │
│ Kinetics-400 │ 0.44%     │ 0.63 +/- 0.16 │ 0.39 +/- 0.14 │ 0.24 +/- 0.15 │ 0.15 +/- 0.13 │ 0.44 +/- 0.10 │
│ Kinetics-600 │ 0.72%     │ 0.59 +/- 0.16 │ 0.36 +/- 0.19 │ 0.24 +/- 0.19 │ 0.14 +/- 0.18 │ 0.43 +/- 0.13 │
└──────────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┘

$ vdtk  concept-leave-one-out   $METRICS/dataset_rsicd_v2_vdtk.json
⠴ Evaluating... ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3% 0:00:34 0:06:43Killed

$ top
3846485 yves.ba+  20   0   24,5g   1,4g   1072 S   0,3   1,1  10:05.57 vdtk
3846489 yves.ba+  20   0   23,9g 813164   1952 S   0,3   0,6   6:53.97 vdtk
3846491 yves.ba+  20   0   24,2g   1,1g   1248 S   0,3   0,9   8:42.48 vdtk
3846492 yves.ba+  20   0   24,4g   1,4g   1984 S   0,3   1,1   8:55.44 vdtk
3846493 yves.ba+  20   0   24,3g   1,2g    356 S   0,3   1,0   9:51.84 vdtk
3846495 yves.ba+  20   0   24,2g   1,1g   1448 S   0,3   0,9   9:13.14 vdtk

I was surprised that they did not disappear when I logged out of the session in which I had run vdtk, nor when I killed the Jupyter notebook in which I had also called the vdtk function vocab_stats.

partial solution

The sub-processes do not react to SIGTERM, but they can be killed with SIGKILL (9):

$ ps -x
3846483 ?        Sl     8:33 /opt/home/yves.bardout/anaconda3/envs/gis/bin/python3.9 /opt/home/yves.bardout/anaconda3/envs/gis/bin/vdtk concept-leave-one-out /opt/home/yves.bardout/espace_de_travail/wp2/metrics/tls_ac.66_captions_ITR_vdtk.json
...
$ ps -x|grep vdtk|wc -l
129

$ killall vdtk        # SIGTERM has no effect
$ killall -s 9 vdtk
$ ps -x|grep vdtk|wc -l
1
$ top
%Cpu(s):  0,1 us,  0,1 sy,  0,0 ni, 99,8 id, 
$ vdtk  concept-leave-one-out   $METRICS/tls_ac.66_captions_ITR_vdtk.json
⠸ Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:03:43 -:--:--

⠋ Evaluating... ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2% 0:06:36 0:00:40Killed
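
The SIGTERM-then-SIGKILL sequence above can be wrapped around the command so that leftover workers are cleaned up automatically. What follows is only a minimal sketch, not part of vdtk: `run_and_reap` and `signal_group` are hypothetical helper names, it is POSIX-only, and it assumes the workers stay in the process group of the main vdtk process (the usual case for forked children that do not start their own session).

```python
import os
import signal
import subprocess
import sys


def signal_group(pgid, sig):
    """Send sig to every process in group pgid; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_and_reap(cmd, grace=5.0):
    """Run cmd in its own process group and kill the whole group afterwards."""
    # start_new_session=True makes the child a session and group leader,
    # so its process-group id is equal to its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait()
    finally:
        # Reached on normal exit, on Ctrl-C, and when the child was killed.
        if proc.poll() is None:
            # Interrupted while the child still runs: ask politely first.
            signal_group(proc.pid, signal.SIGTERM)
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                pass
        # Anything still alive in the group (e.g. workers that ignore
        # SIGTERM, as reported above) is removed with SIGKILL.
        signal_group(proc.pid, signal.SIGKILL)
        proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_reap(sys.argv[1:]))
```

Saved under a hypothetical name such as reap.py, it would be called as `python reap.py vdtk concept-leave-one-out <dataset>.json`. Because the cleanup sits in a `finally` block, it also runs when the main process itself dies with "Killed", which is the case where the workers were left behind.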

DavidMChan commented 1 year ago

This is probably a bug in mpire, the upstream library that manages our multiprocessing (https://github.com/sybrenjansen/mpire). It happens because we have no good way of informing the subprocesses that the root process is no longer alive, so they stay open, listening on their queues. The best options are to kill them independently when necessary, or to stop the main process gracefully with a keyboard interrupt.
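
For illustration, one generic way for a worker to notice that the root process is gone is to poll its parent PID, which changes when the parent exits and the worker is re-parented. This is a sketch of that general technique, not mpire's or vdtk's actual behaviour; `parent_is_gone` and `watch_parent` are hypothetical names, and it assumes a POSIX system.

```python
import os
import threading
import time


def parent_is_gone(original_ppid):
    """Return True once this process has been re-parented (its parent exited)."""
    return os.getppid() != original_ppid


def watch_parent(original_ppid, poll_seconds=1.0, on_orphan=None):
    """Poll in a daemon thread and call on_orphan when the parent disappears.

    original_ppid should be recorded by the parent (os.getpid()) before the
    worker is started, so a parent that dies early is still detected.
    """
    if on_orphan is None:
        # Hard exit by default: a worker blocked on a queue cannot unwind.
        on_orphan = lambda: os._exit(1)

    def _loop():
        while not parent_is_gone(original_ppid):
            time.sleep(poll_seconds)
        on_orphan()

    thread = threading.Thread(target=_loop, daemon=True)
    thread.start()
    return thread
```

A worker would call `watch_parent(root_pid)` once at start-up, with `root_pid` passed in from the main process. Whether this can be hooked into the pool used here is an open question.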

This is a very expensive command, esp. for big datasets, since it requires computing all of the pairwise scores between captions in the dataset.
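
To give a rough sense of scale, the snippet below counts the unordered caption pairs for the two dataset sizes reported above. It assumes every caption is compared with every other caption exactly once; the actual number of comparisons vdtk performs may differ. `pair_count` is a name made up for this example.

```python
import math


def pair_count(n_captions):
    """Number of unordered pairs among n_captions items: n * (n - 1) / 2."""
    return math.comb(n_captions, 2)


for name, n in (("RSITMD", 23715), ("RSICD", 24333)):
    print(f"{name}: {pair_count(n):,} caption pairs")
# RSITMD: 281,188,755 caption pairs
# RSICD: 296,035,278 caption pairs
```

Under that assumption, both datasets are in the range of hundreds of millions of pairs.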