adbailey4 / yeast_rrna_modification_detection

MIT License

Inference pipeline with example data not running #6

Closed: wcvchan closed this issue 1 year ago

wcvchan commented 1 year ago

Hi Andrew,

I am trying to run inference_pipeline.py with the minimal example 20210415_R941_mutant451.fastq. I am using an HPC cluster that has Apptainer instead of Docker. I extracted the minimal example data to my working directory, then pulled the container from Docker Hub to my home directory:

apptainer pull docker://ucscbailey/yeast_rrna:latest

I then opened the container and ran inference_pipeline.py interactively:

apptainer shell ~/yeast_rrna_latest.sif

inference_pipeline.py --fastq 20210415_R941_mutant451.fastq --fast5 20210415_R941_mutant451/20210415_0552_MN20528_AGG125_7a2113f4/fast5/ --reference /Shared/Reference/genomes/index/yeast_25S_18S.fa --path_to_bin /root/signalAlign/bin --threads 2 --name min_example --output_dir .

It returned the following error. Note that it reports "[multithread_signal_alignment_samples] min_example generated 0 output_files" (related to #4?):

Running SignalAlign
[SignalAlignment.run] NOTICE: Creating forward and backward fasta files.
[multithread_signal_alignment_samples] Running SignalAlign on sample: min_example
[multithread_signal_alignment_samples] min_example generated 0 output_files

#  signalAlign - finished alignments

#  signalAlign - finished alignments

[signalAlign] Complete
Running Time = 34.80782834812999 seconds
Running sa2bed
There are no valid .tsv files: Assertion '!all_tsvs.empty()' failed in file '/root/embed_fast5/src/SignalAlignToBed.cpp' line 166
terminate called after throwing an instance of 'AssertionFailureException'
  what():  There are no valid .tsv files: Assertion '!all_tsvs.empty()' failed in file '/root/embed_fast5/src/SignalAlignToBed.cpp' line 166
Traceback (most recent call last):
  File "/opt/venv/bin/inference_pipeline.py", line 240, in <module>
    ret, time = (time_it(main))
  File "/opt/venv/lib/python3.7/site-packages/py3helpers/utils.py", line 169, in time_it
    something = func(*args)
  File "/opt/venv/bin/inference_pipeline.py", line 235, in main
    check_call(f"embed_main sa2bed -d {outpath}/tempFiles_alignment/{name}/ -a {run_config_dict['ambig_model']} "
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['embed_main', 'sa2bed', '-d', './signalalign_output/tempFiles_alignment/min_example/', '-a', './signalalign_output/small_variants.model', '-o', './signalalign_output/variant_calls/min_example.bed', '-t', '2', '-c', 'B', '--overwrite', '--rna']' died with <Signals.SIGABRT: 6>.
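The sa2bed abort is a downstream symptom: `embed_main sa2bed` asserts that the `-d` directory contains at least one .tsv file, and signalAlign wrote none ("generated 0 output_files"). Below is a minimal pre-check sketch. It assumes the per-read .tsv files sit directly in that directory (the later --debug log writes files such as `<read_id>.sm.forward.tsv` into `tempFiles_alignment/<name>/`). The function names are hypothetical and not part of the pipeline:

```python
from pathlib import Path


def find_alignment_tsvs(alignment_dir):
    """Return the .tsv files directly inside alignment_dir, sorted by name.

    Assumption: sa2bed reads the .tsv files found in the directory passed
    with -d (e.g. ./signalalign_output/tempFiles_alignment/<name>/).
    """
    path = Path(alignment_dir)
    if not path.is_dir():
        return []
    return sorted(p for p in path.glob("*.tsv") if p.is_file())


def check_before_sa2bed(alignment_dir):
    """Fail early with a readable message instead of sa2bed's SIGABRT.

    Returns the number of .tsv files found; raises if there are none.
    """
    tsvs = find_alignment_tsvs(alignment_dir)
    if not tsvs:
        raise RuntimeError(
            f"No .tsv files in {alignment_dir}: signalAlign produced no "
            "alignments, so sa2bed would abort. Re-run with --debug to see why."
        )
    return len(tsvs)
```

Calling `check_before_sa2bed("./signalalign_output/tempFiles_alignment/min_example/")` before the sa2bed step would turn the SIGABRT into an explicit message pointing back at the alignment stage.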

When I ran it with --debug, it returned the following:

Running SignalAlign
[SignalAlignment.run] NOTICE: Creating forward and backward fasta files.
[multithread_signal_alignment_samples] Running SignalAlign on sample: min_example
[multithread_signal_alignment] indexing reference /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/forward.min_example.yeast_25S_18S.fa
[multithread_signal_alignment] indexing reference /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/backward.min_example.yeast_25S_18S.fa
[multithread_signal_alignment] running signal_alignment on 0 fast5s with 1 worker
Path to signalMachine does not exist and is not in PATH: /root/signalAlign/bin/signalMachine
[SignalAlignment.run] INFO: Starting on /fastdata/rrna_mod_detection/signalalign_output/split_fast5s/10/0000b1e5-68fa-4369-9e10-3d6c2c31fb2b.fast5
[NanoporeRead:open] is this an rna read?: True
Traceback (most recent call last):
  File "/opt/venv/bin/runSignalAlign.py", line 323, in <module>
    main()
  File "/opt/venv/bin/runSignalAlign.py", line 192, in main
    debug=config_args.debug)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 1045, in multithread_signal_alignment_samples
    sample.filter_read_generator)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 806, in multithread_signal_alignment
    success = alignment.run()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 215, in run
    overwrite=self.overwrite, debug=self.debug)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 99, in __init__
    self.Initialize()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 147, in Initialize
    ok &= self._initialize()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 193, in _initialize
    oned_root_address = self.generate_new_event_table()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 288, in generate_new_event_table
    rna=self.rna, overwrite=self.overwrite)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/event_detection.py", line 245, in load_from_raw2
    "path_to_bin must exist"
AssertionError: path_to_bin must exist
Traceback (most recent call last):
  File "/opt/venv/bin/inference_pipeline.py", line 240, in <module>
    ret, time = (time_it(main))
  File "/opt/venv/lib/python3.7/site-packages/py3helpers/utils.py", line 169, in time_it
    something = func(*args)
  File "/opt/venv/bin/inference_pipeline.py", line 230, in main
    check_call(f"runSignalAlign.py run --config {config_path}".split())
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['runSignalAlign.py', 'run', '--config', './signalalign_output/sa_run_config.json']' returned non-zero exit status 1.
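The --debug trace shows the actual cause: signalMachine was not found at /root/signalAlign/bin/signalMachine, so `load_from_raw2` fails its "path_to_bin must exist" assertion. The following is a hypothetical pre-flight check (not part of the pipeline) that could be run inside the container before inference_pipeline.py. It assumes signalMachine must sit directly inside `--path_to_bin`, as the log message suggests:

```python
import os
import shutil


def check_path_to_bin(path_to_bin, binaries=("signalMachine",)):
    """Return a list of problems with --path_to_bin; an empty list means OK.

    Assumption: the pipeline expects each binary directly inside path_to_bin
    (the debug log looks for /root/signalAlign/bin/signalMachine).
    """
    problems = []
    if not os.path.isdir(path_to_bin):
        # Also triggered when the directory exists but cannot be traversed,
        # e.g. a path under /root when running as an unprivileged user.
        problems.append(f"{path_to_bin} is not an accessible directory")
        return problems
    for name in binaries:
        exe = os.path.join(path_to_bin, name)
        if not (os.path.isfile(exe) and os.access(exe, os.X_OK)):
            # Mention a copy on PATH, if any, to help spot a wrong --path_to_bin.
            found = shutil.which(name)
            hint = f" (but found on PATH at {found})" if found else ""
            problems.append(f"{exe} is missing or not executable{hint}")
    return problems
```

If `check_path_to_bin("/root/signalAlign/bin")` returns a non-empty list inside the container, the directory is either missing or not accessible to the current user.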

Thanks in advance! 🙂

adbailey4 commented 1 year ago

This looks like a path-finding issue... I haven't used Apptainer before, so I don't know how it might change paths compared to a Docker container.

I can take a closer look on Friday to see if I can figure out what might be the problem. Thanks for your interest in this project!

wcvchan commented 1 year ago

I seem to have solved the problem of root access in the container, but the pipeline still doesn't run to completion.

I made a copy of the container with the --sandbox option, then launched a shell with --fakeroot. I can now access /root within the container and see /root/rrna_scripts and /root/signalAlign:

apptainer build --sandbox copy_yeast_rrna.sif yeast_rrna_latest.sif
apptainer shell --fakeroot copy_yeast_rrna.sif/
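The same run can also be scripted without an interactive shell. The sketch below assumes that `apptainer exec --fakeroot` behaves like the `shell --fakeroot` session above on your cluster (fakeroot may need to be enabled by the administrators). It uses the inference_pipeline.py flags from the first command in this thread, leaves --debug off, and the helper name is hypothetical:

```python
import shlex


def build_apptainer_command(container, fastq, fast5_dir, reference, name,
                            path_to_bin="/root/signalAlign/bin",
                            output_dir=".", threads=2, fakeroot=True):
    """Build an argv list for a non-interactive run; nothing is executed.

    Assumption: `apptainer exec [--fakeroot] <container> <cmd...>` stands in
    for `apptainer shell` followed by a manual inference_pipeline.py call.
    """
    cmd = ["apptainer", "exec"]
    if fakeroot:
        cmd.append("--fakeroot")
    cmd += [
        container,
        "inference_pipeline.py",
        "--fastq", fastq,
        "--fast5", fast5_dir,
        "--reference", reference,
        "--path_to_bin", path_to_bin,
        "--threads", str(threads),
        "--name", name,
        "--output_dir", output_dir,
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_apptainer_command(
        "copy_yeast_rrna.sif/",
        "20210415_R941_mutant451.fastq",
        "20210415_R941_mutant451/20210415_0552_MN20528_AGG125_7a2113f4/fast5/",
        "yeast_25S_18S.fa",
        "min_data",
    )
    # shlex.quote (rather than shlex.join) keeps this compatible with Python 3.7.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`. Building an argv list rather than a single string avoids quoting problems with paths.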

I then ran inference_pipeline.py again with --debug:

inference_pipeline.py --fastq 20210415_R941_mutant451.fastq --fast5 20210415_R941_mutant451/20210415_0552_MN20528_AGG125_7a2113f4/fast5/ --/Shared/Common/Reference/genomes/index/yeast_25S_18S.fa --path_to_bin /root/signalAlign/bin/ --output_dir . --threads 2 --name min_data --debug

The pipeline seemed to run fine at first, but then it terminated:

[SignalAlignment.run] INFO: Starting on /fastdata/rrna_mod_detection/signalalign_output/split_fast5s/8/03176282-a92e-4fd4-8b4c-cdb2e4b60cd2.fast5
[NanoporeRead:open] is this an rna read?: True
KEY:PASSED: 03176282-a92e-4fd4-8b4c-cdb2e4b60cd2
[NanoporeRead:generate_new_event_table] INFO generated event table at /Analyses/Basecall_1D_000
[NanoporeRead._initialize] oned_root_address /Analyses/Basecall_1D_000
[SignalAlignment.run] NOTICE: template model /fastdata/rrna_mod_detection/signalalign_output/yeast_rrna_ivt_wt_trained_071521.model complement model None
[SignalAlignment.run] running command: /root/signalAlign/bin/signalMachine  -s 0  -q /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/min_data/tempFiles_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2/temp_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2.npRead -T /fastdata/rrna_mod_detection/signalalign_output/yeast_rrna_ivt_wt_trained_071521.model -D 0.1  -p /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/min_data/tempFiles_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2/temp_cigar_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2.txt -u /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/min_data/03176282-a92e-4fd4-8b4c-cdb2e4b60cd2.sm.forward.tsv -L 03176282-a92e-4fd4-8b4c-cdb2e4b60cd2 -n RDN18-1 -f /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/forward.min_data.yeast_25S_18S.fa  -b /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/backward.min_data.yeast_25S_18S.fa  -g 150 --rna -a ./signalalign_output/small_variants.model
[SignalAlignment.run]    03176282-a92e-4fd4-8b4c-cdb2e4b60cd2: [signalMachine]NOTICE: Using guide alignments from /fastdata/rrna_mod_detection/signalalign_output/tempFiles_alignment/min_data/tempFiles_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2/temp_cigar_03176282-a92e-4fd4-8b4c-cdb2e4b60cd2.txt
[SignalAlignment.run]    03176282-a92e-4fd4-8b4c-cdb2e4b60cd2: signalAlign - starting template alignment
[SignalAlignment.run]    03176282-a92e-4fd4-8b4c-cdb2e4b60cd2: signalAlign - SUCCESS: finished alignment of query 03176282-a92e-4fd4-8b4c-cdb2e4b60cd2, exiting
[SignalAlignment.run]    03176282-a92e-4fd4-8b4c-cdb2e4b60cd2: 03176282-a92e-4fd4-8b4c-cdb2e4b60cd2 183 3220(49.913035)
[SignalAlignment.run] INFO: Starting on /fastdata/rrna_mod_detection/signalalign_output/split_fast5s/6/031b07da-a8d3-4fa3-bcaa-717867ffc936.fast5
[NanoporeRead:open] is this an rna read?: True
KEY:PASSED: 031b07da-a8d3-4fa3-bcaa-717867ffc936
[NanoporeRead:generate_new_event_table] INFO generated event table at /Analyses/Basecall_1D_000
[NanoporeRead._initialize] oned_root_address /Analyses/Basecall_1D_000
Traceback (most recent call last):
  File "/opt/venv/bin/runSignalAlign.py", line 323, in <module>
    main()
  File "/opt/venv/bin/runSignalAlign.py", line 192, in main
    debug=config_args.debug)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 1045, in multithread_signal_alignment_samples
    sample.filter_read_generator)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 806, in multithread_signal_alignment
    success = alignment.run()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/signalAlignment.py", line 252, in run
    ok = npRead.Write(out_file=fH)
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 445, in Write
    ok = self.assert_events_and_event_map()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 433, in assert_events_and_event_map
    oneD_event_map_check = self.init_event_map()
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 344, in init_event_map
    len(self.template_strand_event_map)))
  File "/opt/venv/lib/python3.7/site-packages/signalalign/nanoporeRead.py", line 107, in check
    assert statement, f"KEY:FAILED:{self.read_label}: {message}"
AssertionError: KEY:FAILED:031b07da-a8d3-4fa3-bcaa-717867ffc936: Read and event map lengths do not match 1863 != 1862
Traceback (most recent call last):
  File "/opt/venv/bin/inference_pipeline.py", line 240, in <module>
    ret, time = (time_it(main))
  File "/opt/venv/lib/python3.7/site-packages/py3helpers/utils.py", line 169, in time_it
    something = func(*args)
  File "/opt/venv/bin/inference_pipeline.py", line 230, in main
    check_call(f"runSignalAlign.py run --config {config_path}".split())
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['runSignalAlign.py', 'run', '--config', './signalalign_output/sa_run_config.json']' returned non-zero exit status 1.
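The KEY:PASSED and KEY:FAILED markers make it possible to count how many reads hit errors in a saved log. The sketch below assumes the two formats visible above. Note that read 031b07da-a8d3-4fa3-bcaa-717867ffc936 logs KEY:PASSED and then KEY:FAILED, so PASSED appears to mark an early per-read check rather than a finished alignment:

```python
import re
from collections import Counter

# Assumed formats, inferred from the log lines in this thread:
#   KEY:PASSED: <read_id>
#   KEY:FAILED:<read_id>: <message>
KEY_RE = re.compile(r"KEY:(PASSED|FAILED):\s*([0-9a-f-]{36})(?::\s*(.*))?")


def tally_read_status(lines):
    """Count PASSED/FAILED markers and collect failure messages by read id."""
    counts = Counter()
    failures = {}
    for line in lines:
        match = KEY_RE.search(line)
        if not match:
            continue
        status, read_id, message = match.groups()
        counts[status] += 1
        if status == "FAILED":
            failures[read_id] = message or ""
    return counts, failures
```

For example, with output saved to a file (the name run.log is hypothetical): `with open("run.log") as fh: counts, failures = tally_read_status(fh)`.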
adbailey4 commented 1 year ago

Ah, OK, so try again without the debug flag. That error (AssertionError: KEY:FAILED:031b07da-a8d3-4fa3-bcaa-717867ffc936: Read and event map lengths do not match 1863 != 1862) happens periodically. When debug is turned off, multiprocessing is turned on and those errors just get ignored.
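The behavior described here can be illustrated with a generic multiprocessing pattern: one bad read aborts a debug run, but is skipped otherwise. This is not signalAlign's code. `align_read` is a hypothetical stand-in for a per-read job, and the per-read exception is caught in a worker wrapper so the pool keeps going:

```python
from multiprocessing import Pool


def align_read(read_id):
    """Hypothetical stand-in for one per-read signalAlign job."""
    if read_id.endswith("bad"):
        raise AssertionError(
            f"KEY:FAILED:{read_id}: Read and event map lengths do not match")
    return f"{read_id}.sm.forward.tsv"


def safe_align(read_id):
    """Run one read; turn a per-read AssertionError into a recorded failure."""
    try:
        return read_id, align_read(read_id), None
    except AssertionError as err:
        return read_id, None, str(err)


def run_all(read_ids, workers=2, debug=False):
    """Return (read_id, output, error) tuples for every read."""
    if debug:
        # Debug-style: serial and fail-fast, so the first bad read aborts the run.
        return [(r, align_read(r), None) for r in read_ids]
    # Normal mode: a worker pool where failing reads are recorded and skipped.
    with Pool(workers) as pool:
        return pool.map(safe_align, read_ids)


if __name__ == "__main__":
    results = run_all(["read1", "read2bad", "read3"])
    ok = [out for _, out, err in results if err is None]
    print(f"{len(ok)} succeeded, {len(results) - len(ok)} skipped")
```

The trade-off is visibility: in the pooled mode a run "succeeds" even if some reads were dropped, so it is worth counting the recorded failures afterwards.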

wcvchan commented 1 year ago

Yep, all good now. Thank you!