crimBubble / ECCsplorer

The ECCsplorer is a bioinformatics pipeline for the automated detection of extrachromosomal circular DNA (eccDNA) from paired-end read data of amplified circular DNA.
GNU General Public License v3.0

Error: LENGTH or similar applied to NULL object during "Normalizing coverage data and editing Rscript for visualization." During test data run. #15

Closed by janprovaz 2 months ago

janprovaz commented 9 months ago

Dear @crimBubble, first of all, thank you for writing this software :) I would like to ask for your help with running the test data.

When I run the script as instructed (with the addition of -cpu 10 to prevent the previously mentioned "index out of bounds" errors), I get this:

2023-12-14 14:29:05,531 - [mapper_coordinator] INFO: Coverage calculation and peak finding took 0.39s.
2023-12-14 14:29:05,531 - [mapper_coordinator] INFO: Extracting eccDNA candidate regions.
2023-12-14 14:29:05,613 - [mapper_coordinator] INFO: Extracting eccDNA candidate regions took 0.08s.
2023-12-14 14:29:05,613 - [mapper_coordinator] INFO: Normalizing coverage data and editing Rscript for visualization.
Error: LENGTH or similar applied to NULL object
In addition: Warning messages:
1: package ‘ggplot2’ was built under R version 4.2.3 
2: package ‘ggrepel’ was built under R version 4.2.3 
3: package ‘gridExtra’ was built under R version 4.2.3 
4: package ‘dplyr’ was built under R version 4.2.3 
Fatal error: unable to initialize the JIT

/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py:520: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if isinstance(data, (numpy.float, numpy.float64)):
/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py:523: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(data, (numpy.int, numpy.int32, numpy.int64)):
/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py:526: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(data, (numpy.complex, numpy.complex64,
/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py:533: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(data, (numpy.bool_, numpy.bool, numpy.bool8)):
/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rserializer.py:328: DeprecationWarning: `np.long` is a deprecated alias for `np.compat.long`. To silence this warning, use `np.compat.long` by itself. In the likely event your code does not need to work on Python 2 you can use the builtin `int` for which `np.compat.long` is itself an alias. Doing this will not modify any behaviour and is safe. When replacing `np.long`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if o.dtype in (numpy.int64, numpy.long):
/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rserializer.py:345: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  self._buffer.write(o.tostring(order='F'))
Traceback (most recent call last):
  File "ECCsplorer.py", line 815, in <module>
    main()
  File "ECCsplorer.py", line 775, in main
    sum_mapper_win_coverage, sum_mapper_candidate_fas, analysis_errors = obj_mapper.mapper_coordinator()
  File "/g/gcbio/provaznik/eccDNA/ECCsplorer/lib/eccMapper.py", line 785, in mapper_coordinator
    self.conn.r.convgenome(self.csv_sum_raw, self.maps_basecnt, self.csv_sum_nrm)
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rconn.py", line 369, in __call__
    return self._rconn.callFunc(self.__name__, *args, **kw)
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rconn.py", line 78, in decoCheckIfClosed
    return func(self, *args, **kw)
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rconn.py", line 270, in callFunc
    return self.eval(name+'(%s)' % ', '.join(argNames))
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rconn.py", line 78, in decoCheckIfClosed
    return func(self, *args, **kw)
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rconn.py", line 170, in eval
    message = rparse(src, atomicArray=atomicArray)
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py", line 646, in rparse
    return rparser.parse()
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py", line 439, in parse
    self.lexer.readHeader()
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py", line 154, in readHeader
    command = Command(struct.unpack('<I', self.read(4))[0])
  File "/g/gcbio/provaznik/eccDNA/mamba/lib/python3.7/site-packages/pyRserve/rparser.py", line 230, in read
    raise EndOfDataError()
pyRserve.rexceptions.EndOfDataError
2023-12-14 14:29:05,874 - [r_shutdown] INFO: Shutting down Rserve.
2023-12-14 14:29:05,874 - [exit_err] ERROR: Sorry, something went wrong.

This happens with both the test data and my real data. I also tried with clean output folders. Are there specific versions of numpy and pyRserve that I should be running instead?

Thank you very much for your help and time, Jan

crimBubble commented 9 months ago

Hi Jan, it seems to be an issue with the R version and pyRserve. Can you please check the R version installed in your environment? Try downgrading it to version 4.0.3 (the version we are currently running on our machines) and re-installing the required R packages (you can find them in environment.yml; they all start with r-). This should hopefully solve the issue.
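
One way to confirm which R the environment actually picks up is to parse the output of `R --version`. The following is only a minimal sketch: the helper names `parse_r_version` and `current_r_version` are hypothetical and not part of ECCsplorer, and the target 4.0.3 is simply the version mentioned in this comment.

```python
import re
import shutil
import subprocess

# Version the maintainer reports using; taken from the comment above.
EXPECTED_R = (4, 0, 3)


def parse_r_version(text):
    """Return (major, minor, patch) from `R --version` output, or None."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in match.groups()) if match else None


def current_r_version():
    """Run `R --version` if R is on PATH; return the parsed version or None."""
    if shutil.which("R") is None:
        return None
    result = subprocess.run(
        ["R", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # Python 3.7-compatible spelling of text=True
    )
    return parse_r_version(result.stdout)


if __name__ == "__main__":
    found = current_r_version()
    if found is None:
        print("R was not found on PATH in this environment.")
    elif found != EXPECTED_R:
        print("R %d.%d.%d found; the maintainers run 4.0.3." % found)
    else:
        print("R 4.0.3 found.")
```

Run it with the eccsplorer environment activated, so that the R it reports is the one the pipeline would start.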

Best, Ludwig

lxs524 commented 1 month ago

I solved the problem described above: downgrading R and reinstalling pyRserve fixes it. But now I have a new problem; could you help me solve it? Thank you very much. I think the main problem is mgblast. Why can't mgblast be found?

024-08-25 20:08:21,514 - lib.seqtools - INFO - saving chunk /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.4

2024-08-25 20:08:21,514 - lib.seqtools - INFO - running all to all blast

Process 0:
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 79, in fun
    pipe.send(f(x))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 294, in command_star
    return(command(*args))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/seqtools.py", line 34, in _hitsort_worker
    with subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE) as p:
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mgblast': 'mgblast'
Process 1:
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 79, in fun
    pipe.send(f(x))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 294, in command_star
    return(command(*args))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/seqtools.py", line 34, in _hitsort_worker
    with subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE) as p:
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mgblast': 'mgblast'
Process 3:
Process 2:
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 79, in fun
    pipe.send(f(x))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 294, in command_star
    return(command(*args))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/seqtools.py", line 34, in _hitsort_worker
    with subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE) as p:
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mgblast': 'mgblast'
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 79, in fun
    pipe.send(f(x))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/parallel/parallel.py", line 294, in command_star
    return(command(*args))
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/seqtools.py", line 34, in _hitsort_worker
    with subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE) as p:
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mgblast': 'mgblast'

2024-08-25 20:08:21,531 - lib.seqtools - INFO - all to all blast finished

2024-08-25 20:08:21,531 - lib.seqtools - INFO - removing duplicates from all to all blast results

2024-08-25 20:08:21,538 - lib.graphtools - INFO - converting hitsort to binary format

2024-08-25 20:08:21,544 - lib.graphtools - INFO - running louvain clustering...

Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/seqclust", line 821, in <module>
    main()
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/seqclust", line 656, in main
    run_info = DataInfo(args, paths)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/seqclust", line 243, in __init__
    self._prerun(sample, paths)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/seqclust", line 257, in _prerun
    sample_hitsort.louvain_clustering(merge_threshold=0.2)
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/graphtools.py", line 325, in louvain_clustering
    timeout=None)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['louvain_community', '/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.blast.int.bin', '-l', '-1', '-w', '/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.blast.int.weight', '-v ', '-s', '123']' died with <Signals.SIGSEGV: 11>.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/repex_tarean/lib/r2py.py", line 11, in shutdown
    conn = pyRserve.connect(port=port)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/site-packages/pyRserve/rconn.py", line 70, in connect
    return RConnector(host, port, atomicArray, defaultVoid, oobCallback)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/site-packages/pyRserve/rconn.py", line 103, in __init__
    self.connect()
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/site-packages/pyRserve/rconn.py", line 122, in connect
    hdr = self.sock.recv(1024)
ConnectionResetError: [Errno 104] Connection reset by peer
2024-08-25 20:08:23,063 - [cluster_coordinator] INFO: Rserv started in daemon mode.

Building a new DB, current time: 08/25/2024 20:08:21
New DB name: /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta
New DB title: /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4000 sequences in 0.036247 seconds.

Building a new DB, current time: 08/25/2024 20:08:21
New DB name: /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta
New DB title: /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 4000 sequences in 0.036865 seconds.
Trying to start Rserve... connection OK
R function loaded: add_preamble capitalize connect_to_databases create_main_reports df2html disconnect_database dummy_function get_comparative_codes is_comparative nested2named_list plot_rect_map preformatted rectMap reformat_df_report reformat_df_to_profrep_classification reformat_header reformat4html start_html summary_histogram
R function loaded: add_leaves_value add_preamble add_value_to_nodes annot2colors cluster_annotation common_ancestor connect_to_databases containLTR create_all_superclusters_report create_cluster_report create_single_supercluster_report df2html disconnect_database evaluate_LTR_detection filter_tree filter_tree2 find_best_hit find_best_hit_repeat format_clinfo format_tree formatWidth get_annotation_groups get_cluster_annotation_summary get_cluster_comparative_counts get_cluster_connection_info get_cluster_info get_comparative_codes get_ltr_info get_reads_annotation get_supercluster_graph get_supercluster_info get_supercluster_summary get_tarean_info html_insert_floating_image html_insert_image is_comparative make_final_annotation_template nested2named_list pasteDomains pieScatter plot_edges plot_rect_map plot_supercluster plotg preformatted radius_size read_annotation_to_tree rectMap rescale select_reads_id start_html summarize_annotation summary_histogram supercluster_size trmap
running in parallel using 16 cpu(s)
mgblast -p 75 -W18 -UT -X40 -KT -JF -F "m D" -v100000000 -b100000000 -D4 -C 30 -H 30 -i /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.0 -d /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.legacy
mgblast -p 75 -W18 -UT -X40 -KT -JF -F "m D" -v100000000 -b100000000 -D4 -C 30 -H 30 -i /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.1 -d /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.legacy
mgblast -p 75 -W18 -UT -X40 -KT -JF -F "m D" -v100000000 -b100000000 -D4 -C 30 -H 30 -i /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.3 -d /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.legacy
mgblast -p 75 -W18 -UT -X40 -KT -JF -F "m D" -v100000000 -b100000000 -D4 -C 30 -H 30 -i /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.2 -d /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.legacy
job finished with exit code 1
job finished with exit code 1
job finished with exit code 1
job finished with exit code 1
['louvain_convert', '-i', '/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.blast.int', '-o', '/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.blast.int.bin', '-w', '/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/seqclust/prerun/sample.fasta.blast.int.weight']
Shutting down Rserv...Done

2024-08-25 20:08:23,063 - [cluster_coordinator] INFO: Summarizing clustering results.
sed: can't read /home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/testrun/eccpipe_results/clustering_results/CLUSTERTABLE.csv: No such file or directory
/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/lib/eccClusterer.py:55: DeprecationWarning: `np.str` is a deprecated alias for the builtin `str`. To silence this warning, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.str, delimiter='\t', skiprows=1)
Traceback (most recent call last):
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/eccsplorer", line 815, in <module>
    main()
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/eccsplorer", line 785, in main
    obj_clusterer.cluster_coordinator()
  File "/home/lxs/miniconda3/envs/eccsplorer/bin/ECCsplorer/lib/eccClusterer.py", line 55, in cluster_coordinator
    dtype=np.str, delimiter='\t', skiprows=1)
  File "/home/lxs/miniconda3/envs/eccsplorer/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1091, in loadtxt
    next(fh)
StopIteration
2024-08-25 20:08:23,268 - [r_shutdown] INFO: Shutting down Rserve.
2024-08-25 20:08:23,269 - [exit_err] ERROR: Sorry, something went wrong.

crimBubble commented 1 month ago

Dear @lxs524, it seems you are using two separate conda environments for RE2 and eccsplorer. Make sure that mgblast is also available within the eccsplorer environment. Follow the detailed installation instructions to install all RE2 dependencies within the eccsplorer environment.
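
To see which external tools are visible from the active environment, a small PATH check may help. This is a sketch only: the tool list is taken from the executables named in the log above and is not the pipeline's full dependency list, and the helper names `locate_tools` and `missing_tools` are hypothetical.

```python
import shutil

# Executables that appear in the log above; illustrative, not exhaustive.
TOOLS = ["mgblast", "louvain_community", "louvain_convert"]


def locate_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in locate_tools(TOOLS).items():
        print("%-20s %s" % (name, path or "NOT FOUND"))
```

Run it after activating the eccsplorer environment. A tool reported as NOT FOUND would produce the same `FileNotFoundError` seen in the traceback when the pipeline tries to start it.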

Note that in its current implementation the eccsplorer pipeline is not meant to activate other conda environments.
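
Since the pipeline does not switch environments, it can also be useful to check that a tool found on PATH really comes from the active environment and not from another one. The sketch below assumes the active prefix is given by the `CONDA_PREFIX` environment variable (falling back to `sys.prefix`); the function names are hypothetical.

```python
import os
import shutil
import sys


def is_inside(path, prefix):
    """True if `path` lies under the directory `prefix` (no symlink resolution)."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def resolves_in_active_env(tool, prefix=None):
    """Check whether `tool` on PATH comes from the active environment.

    `prefix` defaults to CONDA_PREFIX, falling back to sys.prefix.
    Returns None if the tool is not on PATH at all.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    found = shutil.which(tool)
    return None if found is None else is_inside(found, prefix)


if __name__ == "__main__":
    print("mgblast in active env:", resolves_in_active_env("mgblast"))
```

Paths are compared without resolving symlinks, so an executable that is symlinked into the environment's bin directory still counts as belonging to it.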