aertslab / pycisTopic

pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.

RunTime error in Mallet returned non-zero exit status 134 #152

Closed ashleighthomas closed 2 months ago

ashleighthomas commented 2 months ago

Describe the bug I am following the tutorial (including the tutorial data) detailed here: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html I am at the step of running the models:

from pycisTopic.lda_models import run_cgs_models_mallet
# Configure path Mallet
mallet_path="Mallet-202108/bin/mallet"
# Run models
models=run_cgs_models_mallet(
    cistopic_obj,
    n_topics=[2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
    n_cpu=12,
    n_iter=500,
    random_state=555,
    alpha=50,
    alpha_by_topic=True,
    eta=0.1,
    eta_by_topic=False,
    tmp_path="/scratch/leuven/330/vsc33053/ray_spill/mallet/tutorial",
    save_path="/scratch/leuven/330/vsc33053/ray_spill/mallet/tutorial",
    mallet_path=mallet_path,
)

This call produces several errors, the most important of which appears to be the Mallet command returning non-zero exit status 134. The log shows a SIGSEGV in the Java runtime, and I think this may mean I am running out of memory. Does this make sense from the pycisTopic perspective? I was also wondering whether the last RuntimeError indicates that Java is not installed. I have seen that Java is required, and when I looked at the packages installed in my conda env, it wasn't there. I then ran conda install bioconda::java-jdk, which didn't solve the problem (it produces error 2, shown below in the error output).
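For reference, a shell-style exit status above 128 usually encodes a signal number as 128 + N, so 134 corresponds to signal 6 (SIGABRT on Linux). That matches the "Aborted (core dumped)" line printed by the bin/mallet wrapper script: the JVM caught the SIGSEGV, wrote its crash report, and then aborted. The sketch below decodes such a status. The helper name describe_exit_status is hypothetical and not part of pycisTopic or Mallet, and it assumes the POSIX 128 + N convention.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, POSIX convention assumed)."""
    if status > 128:
        # Shells report "terminated by signal N" as 128 + N.
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


if __name__ == "__main__":
    # 134 is the status reported for the failing train-topics command.
    print(describe_exit_status(134))
```

Decoding the status only tells you how the process ended, not why. A SIGSEGV inside the JVM is not by itself evidence of running out of memory, so the hs_err_pid log named in the error output is the place to look for the actual cause.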

To Reproduce Run the make model steps of the pycisTopic tutorial.

Error output Error 1 (before installing java)

2024-07-24 11:01:38,520 cisTopic     INFO     Formatting input to corpus
2024-07-24 11:01:38,744 cisTopic     INFO     Running model with 2 topics
2024-07-24 11:01:38,744 LDAMalletWrapper INFO     Serializing temporary corpus to /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt
2024-07-24 11:01:48,432 LDAMalletWrapper INFO     Converting temporary corpus to MALLET format with: Mallet-202108/bin/mallet import-file --preserve-case --keep-sequence --token-regex \S+ --input /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt --output /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet
2024-07-24 11:02:01,084 LDAMalletWrapper INFO     Training MALLET LDA with: Mallet-202108/bin/mallet train-topics --input /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet --num-topics 2 --alpha 50 --beta 0.1 --optimize-interval 0 --num-threads 12 --output-state /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/554f85_state.mallet.gz --output-doc-topics /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/554f85_doctopics.txt --output-topic-keys /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/554f85_topickeys.txt --num-iterations 500 --inferencer-filename /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/554f85_inferencer.mallet --doc-topics-threshold 0.0 --random-seed 555
2024-07-24 11:15:53,469 LDAMalletWrapper INFO     loading assigned topics from /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/554f85_state.mallet.gz
2024-07-24 11:16:01,925 cisTopic     INFO     Model with 2 topics done!
2024-07-24 11:16:01,925 cisTopic     INFO     Saving model with 2 topics at /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial
2024-07-24 11:16:01,971 cisTopic     INFO     Running model with 5 topics
2024-07-24 11:16:01,971 LDAMalletWrapper INFO     Serializing temporary corpus to /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt
2024-07-24 11:16:11,868 LDAMalletWrapper INFO     Converting temporary corpus to MALLET format with: Mallet-202108/bin/mallet import-file --preserve-case --keep-sequence --token-regex \S+ --input /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt --output /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet
2024-07-24 11:16:25,064 LDAMalletWrapper INFO     Training MALLET LDA with: Mallet-202108/bin/mallet train-topics --input /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet --num-topics 5 --alpha 50 --beta 0.1 --optimize-interval 0 --num-threads 12 --output-state /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_state.mallet.gz --output-doc-topics /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_doctopics.txt --output-topic-keys /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_topickeys.txt --num-iterations 500 --inferencer-filename /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_inferencer.mallet --doc-topics-threshold 0.0 --random-seed 555
Traceback (most recent call last):
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 593, in train
    subprocess.check_output(args=cmd, shell=False, stderr=subprocess.STDOUT)
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['Mallet-202108/bin/mallet', 'train-topics', '--input', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet', '--num-topics', '5', '--alpha', '50', '--beta', '0.1', '--optimize-interval', '0', '--num-threads', '12', '--output-state', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_state.mallet.gz', '--output-doc-topics', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_doctopics.txt', '--output-topic-keys', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_topickeys.txt', '--num-iterations', '500', '--inferencer-filename', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_inferencer.mallet', '--doc-topics-threshold', '0.0', '--random-seed', '555']' returned non-zero exit status 134.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/qc.py", line 128, in <module>
    models=run_cgs_models_mallet(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 809, in run_cgs_models_mallet
    model_list = [
                 ^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 810, in <listcomp>
    run_cgs_model_mallet(
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 919, in run_cgs_model_mallet
    model = LDAMallet(
            ^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 470, in __init__
    self.train(corpus, reuse_corpus)
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 595, in train
    raise RuntimeError(
RuntimeError: command '['Mallet-202108/bin/mallet', 'train-topics', '--input', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet', '--num-topics', '5', '--alpha', '50', '--beta', '0.1', '--optimize-interval', '0', '--num-threads', '12', '--output-state', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_state.mallet.gz', '--output-doc-topics', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_doctopics.txt', '--output-topic-keys', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_topickeys.txt', '--num-iterations', '500', '--inferencer-filename', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/f207eb_inferencer.mallet', '--doc-topics-threshold', '0.0', '--random-seed', '555']' return with error (code 134): b'Mallet LDA: 5 topics, 3 topic bits, 111 topic mask\nData loaded.\nmax tokens: 182472\ntotal tokens: 46095355\n<10> LL/token: -13.89521\n<20> LL/token: -13.35815\n<30> LL/token: -13.1008\n<40> LL/token: -13.00914\n\n0\t10\t210796 354096 275324 298293 251637 102262 140413 117704 338996 190369 142444 352766 68567 252482 426321 254444 251730 119465 374897 31851 \n1\t10\t208465 117636 409877 187823 67442 418515 209611 273400 311414 139860 227433 94178 27798 30079 273059 100524 228515 189033 96892 323949 \n2\t10\t190388 64577 189754 98494 375199 411704 69285 141835 248808 326707 412805 372661 297695 166030 97195 140557 64751 337120 118412 368793 \n3\t10\t229071 229109 397461 251026 385128 165742 352051 395805 399936 397454 64341 397751 25844 425534 397090 370133 295070 164955 351490 26358 \n4\t10\t65197 64175 164693 119073 395222 94457 272555 352231 387203 66312 385159 26135 428138 250578 210068 254352 385081 299899 209204 410018 \n\n<50> LL/token: -12.97045\n<60> LL/token: -12.9461\n<70> LL/token: -12.9284\n<80> LL/token: -12.91713\n<90> LL/token: -12.90954\n\n0\t10\t69629 98287 140413 32902 189425 
31694 31625 33598 116989 275324 120025 191156 427620 277116 253227 396262 275082 99772 211938 98531 \n1\t10\t278221 208465 209611 63644 95715 409877 164754 27798 228515 323491 250257 117636 409783 163602 67405 351448 165703 187823 376545 396857 \n2\t10\t30079 396169 27269 209857 326707 411704 275299 411877 163571 354320 369297 69229 339393 211460 163556 418699 143174 425696 119721 142954 \n3\t10\t229071 229109 399936 397461 397751 251026 26358 165742 395805 418515 30587 164955 117551 385128 295070 397090 370133 397454 384425 25844 \n4\t10\t65197 117532 164693 165352 395222 385128 395805 428138 249087 64577 28038 209204 396957 31189 96740 227602 299899 208976 67798 384677 \n\n<100> LL/token: -12.90623\n<110> LL/token: -12.90399\n<120> LL/token: -12.90275\n<130> LL/token: -12.90138\n<140> LL/token: -12.90049\n\n0\t10\t116989 69629 140413 99481 98287 64646 31755 31694 32902 298651 339985 211624 31625 33598 210882 210770 120025 253227 324808 368803 \n1\t10\t208465 95715 27798 209611 409877 323491 164754 228515 386593 409783 396857 273861 63644 296422 250589 411330 323358 352251 323100 384418 \n2\t10\t30079 27269 251084 369297 69229 143174 354320 396169 31910 140524 211460 166748 397802 370041 252487 163571 211016 351635 166533 68424 \n3\t10\t229109 229071 295070 399936 397461 26358 385128 165742 397751 64341 425534 351490 251026 395805 352051 165867 164955 368662 418515 139282 \n4\t10\t164693 117532 396276 299899 227602 426167 397308 251026 395222 368559 142034 428138 385128 272555 229071 28877 396009 141083 322771 250229 \n\n<150> LL/token: -12.89921\n<160> LL/token: -12.89755\n<170> LL/token: -12.89568\n<180> LL/token: -12.89383\n<190> LL/token: -12.89142\n\n0\t10\t98517 186538 69629 64646 116989 210770 164584 144068 34091 387203 299122 386563 276932 99481 98287 276499 120007 140413 385081 339985 \n1\t10\t208465 409877 95715 409783 273861 353759 228515 27798 411330 98494 323358 322875 323100 65028 384418 145229 164754 352251 278221 192413 \n2\t10\t30079 211460 31910 
166748 166533 396169 190135 251084 351635 68994 340789 211024 142839 398147 140524 32380 166513 325106 275437 426692 \n3\t10\t229109 397461 229071 26358 385128 418515 425534 397751 165742 351490 251026 352051 295070 399936 30587 397454 368662 395805 395764 164955 \n4\t10\t31189 97406 395805 428138 396276 164693 274102 251026 395222 271444 165352 395197 299899 369303 322771 25844 352231 272555 272547 188031 \n\n<200> LL/token: -12.88926\n<210> LL/token: -12.88715\n<220> LL/token: -12.8856\n<230> LL/token: -12.88372\n<240> LL/token: -12.88248\n\n0\t10\t98517 387203 385081 34091 186538 69629 227458 232605 410881 116989 250631 164584 27920 386563 210770 276932 67661 208987 25253 368803 \n1\t10\t208465 250589 409877 164754 273861 27798 398492 323358 353759 95715 145229 192413 27974 396857 95308 352251 65028 409783 63644 228515 \n2\t10\t396169 211460 30079 31910 351635 210098 140524 211024 398147 385531 275437 190135 229479 325106 278379 428007 232324 94201 427080 397802 \n3\t10\t397461 229109 229071 26358 351490 418515 165742 397751 295070 425534 164955 251026 385128 397454 395805 96814 368662 187641 395764 399936 \n4\t10\t64577 251026 395805 385128 397493 117532 163713 254352 369303 428138 164693 274102 189087 142034 96740 25259 396276 250229 396009 251503 \n\n<250> LL/token: -12.88139\n<260> LL/token: -12.88044\n<270> LL/token: -12.87974\n<280> LL/token: -12.87917\n#\n# A fatal error has been detected by the Java Runtime Environment:\n#\n#  SIGSEGV (0xb) at pc=0x00007f4abfc394bc, pid=4057057, tid=4057130\n#\n# JRE version: OpenJDK Runtime Environment (11.0.23+9) (build 11.0.23+9-post-Ubuntu-1ubuntu120.04.2)\n# Java VM: OpenJDK 64-Bit Server VM (11.0.23+9-post-Ubuntu-1ubuntu120.04.2, mixed mode, tiered, g1 gc, linux-amd64)\n# Problematic frame:\n# J 1001 c2 cc.mallet.topics.WorkerCallable.sampleTopicsForOneDoc(Lcc/mallet/types/FeatureSequence;Lcc/mallet/types/FeatureSequence;Z)I (1806 bytes) @ 0x00007f4abfc394bc [0x00007f4abfc38240+0x000000000000127c]\n#\n# Core dump 
will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/core.4057057)\n#\n# An error report file with more information is saved as:\n# /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/hs_err_pid4057057.log\nCould not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled\n#\n# If you would like to submit a bug report, please visit:\n#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts\n#\nMallet-202108/bin/mallet: line 60: 4057057 Aborted                 (core dumped) java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"\n'

Error 2 (after installing java):

CistopicObject from project 10x_multiome_brain with n_cells × n_regions = 2084 × 436231
2024-07-24 14:42:46,186 cisTopic     INFO     Formatting input to corpus
2024-07-24 14:42:46,405 cisTopic     INFO     Running model with 2 topics
2024-07-24 14:42:46,405 LDAMalletWrapper INFO     Serializing temporary corpus to /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt
2024-07-24 14:42:56,171 LDAMalletWrapper INFO     Converting temporary corpus to MALLET format with: Mallet-202108/bin/mallet import-file --preserve-case --keep-sequence --token-regex \S+ --input /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt --output /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet
Traceback (most recent call last):
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 535, in convert_input
    subprocess.check_output(args=cmd, shell=False, stderr=subprocess.STDOUT)
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['Mallet-202108/bin/mallet', 'import-file', '--preserve-case', '--keep-sequence', '--token-regex', '\\S+', '--input', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt', '--output', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/qc.py", line 128, in <module>
    models=run_cgs_models_mallet(#gives error SIGSEGV /home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/hs_err_pid4057057.log
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 809, in run_cgs_models_mallet
    model_list = [
                 ^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 810, in <listcomp>
    run_cgs_model_mallet(
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 919, in run_cgs_model_mallet
    model = LDAMallet(
            ^^^^^^^^^^
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 470, in __init__
    self.train(corpus, reuse_corpus)
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 555, in train
    self.convert_input(corpus)
  File "/opt/anaconda3/envs/scenicPlusJuly/lib/python3.11/site-packages/pycisTopic/lda_models.py", line 537, in convert_input
    raise RuntimeError(
RuntimeError: command '['Mallet-202108/bin/mallet', 'import-file', '--preserve-case', '--keep-sequence', '--token-regex', '\\S+', '--input', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.txt', '--output', '/home/BCCRC.CA/athomas/Documents/scenicPlusJuly2024/rayDir/mallet/tutorial/corpus.mallet']' return with error (code 1): b'Exception in thread "main" java.lang.UnsupportedClassVersionError: cc/mallet/classify/tui/Csv2Vectors : Unsupported major.minor version 52.0\n\tat java.lang.ClassLoader.defineClass1(Native Method)\n\tat java.lang.ClassLoader.defineClass(ClassLoader.java:800)\n\tat java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)\n\tat java.net.URLClassLoader.defineClass(URLClassLoader.java:449)\n\tat java.net.URLClassLoader.access$100(URLClassLoader.java:71)\n\tat java.net.URLClassLoader$1.run(URLClassLoader.java:361)\n\tat java.net.URLClassLoader$1.run(URLClassLoader.java:355)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat java.net.URLClassLoader.findClass(URLClassLoader.java:354)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:425)\n\tat sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)\n\tat java.lang.ClassLoader.loadClass(ClassLoader.java:358)\n\tat sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)\n'

Expected behavior I expect the model creation step to run

Version (please complete the following information):

Additional context I am trying to figure out if this problem is due to not having enough available memory to run the program, or if it is a problem in configuration (i.e. not having java)

ashleighthomas commented 2 months ago

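OK, I found from the "Unsupported major.minor version 52.0" message in error 2 that I needed Java 8 (class file version 52 corresponds to Java 8, so the Java that was being picked up was older than that). Within my conda environment I ran conda install openjdk=8, which fixed the issue, and now all of the model creation steps run.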
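For anyone hitting the same UnsupportedClassVersionError, the check can be done before starting a long run. The sketch below is only an illustration: the function names are hypothetical and not part of pycisTopic. It relies on two facts: class file major version N corresponds to Java N - 44 (for Java 5 and later), and `java -version` prints its version string to stderr, using the 1.x form for releases up to Java 8.

```python
import re
import subprocess


def class_major_to_java(class_major: int) -> int:
    """Map a class file major version to a Java release (valid for Java 5 and later)."""
    return class_major - 44


def parse_java_major(version_output: str) -> int:
    """Extract the Java release number from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first, second = match.group(1), match.group(2)
    # Releases up to Java 8 report themselves as 1.x, for example 1.8.0_412.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` and return the release of the Java found on PATH."""
    result = subprocess.run(
        [java_cmd, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the version banner goes to stderr
        text=True,
        check=True,
    )
    return parse_java_major(result.stdout)


if __name__ == "__main__":
    required = class_major_to_java(52)  # 52.0 is the version named in error 2
    found = installed_java_major()
    print(f"Mallet classes need Java {required} or newer; found Java {found}")
```

Running this inside the activated conda environment shows which Java the bin/mallet script would pick up, which may differ from the system Java.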