Simarpreet-Kaur-Bhurji opened this issue 3 months ago
Hi @Simarpreet-Kaur-Bhurji,
looks like this happened during the communication between different threads; not sure what goes on there exactly. Could you share with us the whole work folder of that failing step (/hps/nobackup/flicek/ensembl/compara/sbhurji/Development/fastoma_run/work/18/6d8a8694445b6226830f618af3bf2f), including the data for the roothog D0138574? Probably something like this should work:
cd /hps/nobackup/flicek/ensembl/compara/sbhurji/Development/fastoma_run/work/18/6d8a8694445b6226830f618af3bf2f; tar -cvzhf dump.tgz .
Hi Adrian, thank you for getting in touch. I kept the log message, but unfortunately I deleted the work directory in anticipation of the rerun. In the meantime I will rerun it and will let you know when I reach this issue again.
No worries.
For the future, it would also be helpful to know whether the task ran out of memory or not. I have seen cases where a segmentation fault happened due to a lack of memory. FastOMA by default retries three times, increasing the memory allocated to the Slurm job each time.
To check the Slurm job, you can go to the relevant work folder and find its job name and job ID (e.g. with sacct), which can then be used with seff to see whether it ran out of memory or not (please see the end of this wiki for an example).
$ head -n2 .command.run
#!/bin/bash
#SBATCH -J nf-hog_rest_(25)
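For example, a rough sketch of that check could look like this (the job name is the one from the snippet above and the job ID is just a placeholder; adapt both to your run):
$ sacct --name='nf-hog_rest_(25)' --format=JobID,JobName%30,State,Elapsed,MaxRSS
$ seff 12345678    # reports memory efficiency and whether the job hit its memory limit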
Thank you, that is helpful. I will check it for this run.
Hi Sina, I have run the pipeline again. I hit the MAFFT segmentation fault and bumped the memory based on your previous suggestion. After that it seemed to be running for 2 days, and now it has failed again with the recursion limit reached error. Please find the work folder attached. Let me know if you need any other details. Thank you. dump.tgz
Hi Simarpreet
The fix was on another branch, and I think you ran the same code as before. Adrian just updated the main branch, so the latest code shouldn't hit the recursion limit. In order to save time/computation, you can run only this rootHOG (using the .command.sh) to see whether the problem is solved or not.
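For instance, something along these lines (the work-directory path is a placeholder; use the one of the failing task):
$ cd /path/to/fastoma_run/work/xx/xxxxxxxx    # the failing task's work directory
$ sbatch .command.run    # .command.run wraps .command.sh with the task environment and SBATCH directives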
Hi Sina, I pulled the latest changes and reran, but I still got the recursion limit reached error. Do you think it has to do with the data, given that Triticum aestivum is usually troublesome because of its size? PFA the work dir herewith. wheat_roothog_dir.tgz
Hi @Simarpreet-Kaur-Bhurji ,
I've uploaded a fix for this issue (hopefully this time for real). You could try it by updating the repo to the dev branch and submitting the .command.run from the failing work directory. If you use containers, you should ensure that dessimozlab/fastoma:sha-1aa97b8 is used, e.g. docker pull dessimozlab/fastoma:sha-1aa97b8. Please let us know if this fixes your issue.
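Roughly, assuming you run from a local clone of the repo and use Docker (adapt for Singularity or your profile):
$ git -C FastOMA fetch origin && git -C FastOMA checkout dev && git -C FastOMA pull origin dev
$ docker pull dessimozlab/fastoma:sha-1aa97b8
$ cd /path/to/failing/work/dir && sbatch .command.run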
Hey Adrian, thank you for looking into this. At the moment our servers are under scheduled maintenance; I will let you know if this fixes it once they are back. Thank you. I would request keeping this issue open until then.
Btw, if you share the fasta file of the rootHOG (inside the folder fastoma_run/work/30/45bab08427770d06e1b9e5f1f5d282/rhogs_big/58) with us, I can run it here and make sure the issue is resolved.
Hi Sina, sure thing, thank you for helping with this. PFA the fasta file herewith. Just to let you know, I have also rerun the pipeline at my end, but it will be a while until it reaches that step, so it would be great if you could check whether the issue is resolved. HOG_D0138574.fa.gz
Thanks. Yes, it finished successfully on our cluster. Hope it will be smooth on your side as well.
2024-08-29 04:25:47 DEBUG Inferring subHOGs for batch of 1 rootHOGs started.
2024-08-29 04:25:48 INFO number of proteins in the rHOG is 20269.
2024-08-29 04:25:48 INFO Number of unique species in rHOG D0138574 is 18.
...
2024-08-29 04:41:26 INFO All subHOGs for the rootHOG D0138574 as OrthoXML format is written in pickle_hogs/file_D0138574.pickle
Thank you so much for testing this on your side; I will let you know how the run goes for us, fingers crossed.
Hi Sina and Adrian, sorry it has taken a while for me to get back to you. As it stands, the run is still not complete on my end. When I got the segmentation fault, I tried to increase the memory by updating it to the following in the FastOMA.nf file:
memory { mem_cat(getMaxFileSize(rhogsbig), nr_species as int) * task.attempt * 3 }
After that I again got the segmentation fault error, but with maxwm <- 0.0
the error and log files are attached herewith.
command.log.txt command.err.txt
The fasta file: HOG_D0138736.fa.txt
The size of the zipped folder is more than the allowed size for GitHub, so I can send it via email. I also tried sacct, but I no longer have the job ID of the affected job because it has been a while since I last ran it. Please let me know what you think is going on here.
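If it is still useful, I could try to recover the old job IDs from Slurm accounting with something along these lines (the start date and the nf-hog name filter below are just placeholders):
$ sacct -u $USER --starttime 2024-09-01 --format=JobID,JobName%30,State,Elapsed,MaxRSS | grep nf-hog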
Hello. I am having a RecursionError: maximum recursion depth exceeded error as well.
I am running 0.3.4. I am only running with 15 species.
I am pasting the last good line and the first error lines from my .nextflow.log. I am also attaching a screenshot of my summary report. I am lost as to what I should do next to troubleshoot this issue.
Thank you, Sofia
from .nextflow.log:
Sep-19 20:50:47.487 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2104348; id: 20; name: infer_roothogs (1); status: COMPLETED; exit: 1; error: -; workDir: /n/sci/SCI-004219-SBCHAMELEO/Chamaeleo_calyptratus/genomes/CCA3-haplotypes/analysis/gene_gain_loss/fastoma/work/85/169b353adc16f9830a97bcb887204c started: 1726787282133; exited: 2024-09-20T01:49:51.671447Z; ]
Sep-19 20:50:47.487 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=infer_roothogs (1); work-dir=/n/sci/SCI-004219-SBCHAMELEO/Chamaeleo_calyptratus/genomes/CCA3-haplotypes/analysis/gene_gain_loss/fastoma/work/85/169b353adc16f9830a97bcb887204c
error [nextflow.exception.ProcessFailedException]: Process `infer_roothogs (1)` terminated with an error exit status (1)
Sep-19 20:50:47.518 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'infer_roothogs (1)'
Caused by:
Process `infer_roothogs (1)` terminated with an error exit status (1)
Command executed:
fastoma-infer-roothogs --proteomes proteome --hogmap hogmaps --splice splice --out-rhog-folder "omamer_rhogs" -vv
Command exit status: 1
Command output:
  291057 83867
  There are 83867 candidate pairs of rhogs for merging.
  There are 4776 clusters.
Command error:
  ^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/FastOMA/_utils_roothog.py", line 1205, in HCS
    H = HCS(sub_graphs[0])
        ^^^^^^^^^^^^^^^^^^
  [Previous line repeated 4 more times]
  File "/app/lib/python3.11/site-packages/FastOMA/_utils_roothog.py", line 1198, in HCS
    E = nx.algorithms.connectivity.cuts.minimum_edge_cut(G)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_minimum_edge_cut_1
  File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/algorithms/connectivity/cuts.py", line 607, in minimum_edge_cut
    this_cut = minimum_st_edge_cut(H, v, w, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 30", line 3, in argmap_minimum_st_edge_cut_27
  File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/algorithms/connectivity/cuts.py", line 150, in minimum_st_edge_cut
    cut_value, partition = nx.minimum_cut(H, s, t, **kwargs)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 34", line 3, in argmap_minimum_cut_31
  File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/algorithms/flow/maxflow.py", line 457, in minimum_cut
    non_reachable = set(dict(nx.shortest_path_length(R, target=_t)))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 42", line 3, in argmap_shortest_path_length_39
  File "/app/lib/python3.11/site-packages/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/algorithms/shortest_paths/generic.py", line 301, in shortest_path_length
    G = G.reverse(copy=False)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/classes/digraph.py", line 1334, in reverse
    return nx.reverse_view(self)
           ^^^^^^^^^^^^^^^^^^^^^
  File "<class 'networkx.utils.decorators.argmap'> compilation 46", line 4, in argmap_reverse_view_43
  File "/app/lib/python3.11/site-packages/networkx/classes/graphviews.py", line 266, in reverse_view
    newG = generic_graph_view(G)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/classes/graphviews.py", line 104, in generic_graph_view
    newG = G.__class__()
           ^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/networkx/classes/digraph.py", line 350, in __init__
    self._node = self.node_dict_factory()  # dictionary for node attr
                 ^^^^^^^^^^
  RecursionError: maximum recursion depth exceeded
Hi @srobb1
Thanks for reaching out. We believe we have fixed this issue with the update provided on the dev
branch (discussed above on this page). Please let us know if it helps in your case as well. Feel free to open a new GitHub issue if the problem continues, and please give us more info about the system you are using and the tree.
Best,
Sina
Hi @Simarpreet-Kaur-Bhurji
It looks like this is a different rootHOG. Could you possibly run the command in the .command.sh
(available inside the work folder) for this rootHOG and see how much memory it needs? (It would be best to copy the needed files into a new folder and run it as a Slurm job to get the full log.) Btw, which MAFFT version are you using, and how did you install it?
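Something along these lines should work; the folder name, memory value and job ID below are placeholders, so adapt them to your cluster:
$ mkdir rhog_test && cp -rL /path/to/failing/work/dir/. rhog_test/ && cd rhog_test    # copy scripts and staged inputs, dereferencing Nextflow's symlinks
$ sbatch --mem=64G --wrap 'bash .command.sh'    # run it as its own Slurm job so the full log and memory accounting are available
$ seff 12345678    # peak memory / efficiency of that job
$ mafft --version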
Yes, please send me the rootHOG; I could try it out too. We would love to arrange our next meeting, probably in mid-October.
Best, Sina
Hi Sina, sure thing, we will get in touch via email to schedule our next meeting. We can look into the above issues then.
Hello. While running FastOMA on 2200 species, I encountered another MAFFT segmentation fault. When I resumed Nextflow, it did not complain about the segmentation fault again, but I got the recursion limit reached error instead. Please find the log file of the run attached. Do you know what is going on?
recurrsion_depth_err.log