chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

Problem with SPAdes installation #113

Open yuxiangtan opened 2 years ago

yuxiangtan commented 2 years ago

Dear author,

I was not able to install STRONG with the auto installer. I then followed the manual installation step by step, but it still failed at the SPAdes step, ./build_cog_tools.sh. I installed SPAdes from conda instead, but the dependency check looks for SPAdes/assembler/build_spades/bin/unitig-coverage, which is not part of the conda SPAdes package.

I have no idea what to do now and I need your help with the installation.

One suggestion: providing a Docker image would make this much easier for future users.

Thank you,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Dear Yuxiang,

We use a particular version of SPAdes with extra tools that give us a high-resolution assembly graph. That version of SPAdes is present in the repo, and nothing will work well if you install some other SPAdes from, for instance, conda.

The auto installer is just a small script that executes all the steps of the "manual" installation. Hence, if the auto install doesn't work, the manual one likely won't either.

When you say the auto install didn't work, can you expand on that? Can you share with me the corresponding error message so that I can help solve your issue?

Best, Seb

yuxiangtan commented 2 years ago

Hi Sebastien,

Thank you for your reply. Attached are the logs. I could not figure out why it failed; that is why I installed from conda, which, as you said, is not appropriate.

Honestly, since the software versions are fixed, would it be possible to provide a Docker image that everyone could easily use, avoiding this kind of installation issue?

Best,

Yuxiang

build_cog_tools.log install_strong.log

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

I can see an issue in the compilation log: it seems that even though the zlib library is installed, you are missing the zlib.h header file. I'm not sure what system you're working on, but on Ubuntu this would be fixed with: sudo apt-get -y install libbz2-dev libreadline-dev cmake g++ zlib1g zlib1g-dev

Also, I can see that the path to zlib is /home/tanyuxiang/.conda/envs/STRONG/lib/libz.so, so it looks like it is inside the conda env. I assume you installed libz in conda only? Could you try having libbz2-dev libreadline-dev cmake g++ zlib1g zlib1g-dev installed system-wide?
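[Editor's note] A quick way to confirm the header is visible to the system toolchain, before retrying the SPAdes build, is a one-line compile test. This is a generic sketch, not a STRONG command; the temp paths are illustrative:

```shell
# Minimal compile test: succeeds only if both zlib.h and libz are visible
# to the system compiler.
cat > /tmp/zlib_check.c <<'EOF'
#include <zlib.h>
int main(void) { return zlibVersion() == 0; }
EOF
if cc /tmp/zlib_check.c -lz -o /tmp/zlib_check 2>/dev/null; then
    echo "zlib development files found"
else
    echo "zlib.h or libz still missing"
fi
```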

Then you would need to retry that step:

cd <path/to/strong>/STRONG/SPAdes/assembler
./build_cog_tools.sh 

I didn't answer you on the Docker image the first time for the simple reason that I don't know how to do that right now, and it is not the most urgent thing I need to focus on at the moment. If we can't solve your issue, I may try to find time for it.

Best, Seb

yuxiangtan commented 2 years ago

run.log

Hi Sebastien,

Thanks for your replies. I found that building a Docker image was a much easier way to solve it. I have now built one that passes all the checks (SnakeNest/scripts/check_on_dependencies.py).

However, when I run your test data, I get two types of errors:

  1. When I unzip Test.tar.gz, it is truncated (I downloaded it twice with the same result).
  2. Although the dry run gives the same output as the post about the Step #3 - Strain Decomposition error, I got more errors when running the actual pipeline (possibly related to error 1). The log is attached.

So, how can I make sure the Docker image is correct?

Once it is fully checked, I can upload the image to Docker Hub and share it with you.

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang

It's nice that you managed to build a Docker image that passes the checks.

  1. Hmm, strange; I can download and unzip it perfectly fine. Is the issue with downloading or with unzipping? Here is the md5; can you check yours? I hope the issue is with unzipping, because then you can try doing it differently; if you can't manage to download the file, I'm unsure what could be done. 3704340ce2ea5a268dae4cf1a6eaa171 Test.tar.gz

  2. Yes, the dry run failing is normal: it is a sequential process and snakemake cannot anticipate what to do without the previous steps' output. Regarding SPAdes, I had a look at the logs and can see that SPAdes is failing, though it is unclear why. Could you share, or have a look at, the file assembly/spades.log? That is the log of SPAdes itself and it will tell you why it's failing. If you do have truncated files, it's possible SPAdes stopped because of a difference in read counts between the R1 and R2 files, or just malformed entries. Then it's simply a matter of getting correct data.
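[Editor's note] The checksum comparison in point 1 can be scripted; a minimal sketch (the md5 value and file name are the ones given in the message above):

```shell
# Compare a freshly computed md5 against the published one before unpacking.
EXPECTED_MD5="3704340ce2ea5a268dae4cf1a6eaa171"
FILE="Test.tar.gz"
if [ -f "$FILE" ]; then
    ACTUAL_MD5=$(md5sum "$FILE" | cut -d' ' -f1)
    if [ "$ACTUAL_MD5" = "$EXPECTED_MD5" ]; then
        echo "checksum OK - safe to untar"
    else
        echo "checksum mismatch - re-download the file"
    fi
else
    echo "file not found: $FILE"
fi
```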

Well, if your installation went without issue when creating the Docker image and the check tells you it's working, then I would say it is correct. 100% confidence would come from being able to run one of the test datasets.

Thanks for that; it would be extremely helpful.

Best, Seb

yuxiangtan commented 2 years ago

Hi, Seb,

Yes, I created a Docker image without issue and the check told me it's working. However, even though I re-downloaded Test.tar.gz and verified the md5 (it is not truncated this time), the docker run still ended with an error (attached). run.log

I have already pushed the image to Docker Hub: https://hub.docker.com/r/yuxiangtan/strong, so maybe you can test it yourself as well. (The COG database is not included; users just need to point to its path in config.yaml, as you mention on GitHub.)

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

Cool, nice that it worked. The log tells me it fails while creating a figure with an R script, but the real logs for that script are elsewhere, for instance at results/Bin_2/tmp/haplotypes_tree.log. Can you tell me what's inside? Most likely a missing library; more troublesome would be an error message from library versions not being the right ones.
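[Editor's note] To pull the relevant lines out of that per-script log quickly, something like the following works. The path is the one named above; the grep pattern is only a guess at common R failure text:

```shell
# Show the first error lines from the R script's own log, if present.
LOG="results/Bin_2/tmp/haplotypes_tree.log"
if [ -f "$LOG" ]; then
    grep -n -i -E "error|halted|no package" "$LOG" | head -5
else
    echo "log not found: $LOG"
fi
```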

Thanks a lot for that, I will have a go at testing it and documenting how to use it.

Best, Seb

yuxiangtan commented 2 years ago

Following is the error message; it looks like the more troublesome kind.

Warning message: package 'tidyr' was built under R version 4.0.3
Warning message: package 'stringr' was built under R version 4.0.5
Registered S3 method overwritten by 'treeio':
  method     from
  root.phylo ape
ggtree v2.4.1  For help: https://yulab-smu.top/treedata-book/

If you use ggtree in published research, please cite the most appropriate paper(s):

- Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi:10.1002/cpbi.96
- Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, 35(12):3041-3043. doi:10.1093/molbev/msy194
- Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36. doi:10.1111/2041-210X.12628

Attaching package: 'ggtree'

The following object is masked from 'package:tidyr':

    expand

Warning message: package 'ggtree' was built under R version 4.0.3
Warning message: package 'ggplot2' was built under R version 4.0.5
Warning message: package 'wesanderson' was built under R version 4.0.5
Error in DataMask$new(.data, caller_env) : argument "caller_env" is missing, with no default
Calls: ggtree ... mutate.data.frame -> mutate_cols -> <Anonymous> -> initialize
Execution halted

haplotypes_tree.log

yuxiangtan commented 2 years ago

Hi, Seb

I have been running the test data for more than two weeks now, but it is still going, and I found there is no resume option. I also checked that the server is still running the job, but it is using very little resources.

I am not sure what is happening or what I should do. BTW, I need to restart the server in a few days.

Thank you!

Following is the log.

Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                         count    min threads    max threads
all                             1              1              1
bed_orfs                        1              1              1
cat_split_annotation            1              1              1
compute_avg_cov                 1              1              1
concoct                         1              1              1
copy_fasta                      1              1              1
coverage                        1              1              1
create_bin_folders              1              1              1
cut_contigs                     1              1              1
extract_SCG_sequences           1              1              1
get_SCG_tables                  1              1              1
initial_quantity_of_bins        1              1              1
merge_contigs                   1              1              1
parse_cogs_annotation           1              1              1
prodigal                        1              1              1
reads_yaml                      1              1              1
refine                          1              1              1
rpsblast                      100              1              1
simplify                        1              1              1
spades                          1              1              1
split_fasta                     1              1              1
unitig_profiles                 1              1              1
total                         121              1              1

Select jobs to execute...

[Mon Sep 20 03:59:45 2021]
rule reads_yaml:
    output: samples.yaml
    jobid: 5
    resources: tmpdir=/tmp

[Mon Sep 20 03:59:46 2021]
Finished job 5.
1 of 121 steps (1%) done
Select jobs to execute...

[Mon Sep 20 03:59:46 2021]
rule coverage:
    output: profile/split/coverage.tsv
    jobid: 1
    resources: tmpdir=/tmp

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

From the logs it seems you've been running with only 1 core; that's something you can change, and it would make things faster :)

There is a resume function. This pipeline is written with snakemake, a workflow management system, which checks what is left to run and schedules tasks automatically. So, what's happening exactly? Are you relaunching STRONG with the exact same config file and it restarts from scratch? That should not happen... You can try using the options -s -t, which ask snakemake to go through the whole workflow and update timestamps. Snakemake also checks timestamp consistency when deciding what needs to be rerun: for example, if a file needed by a task is more recent than that task's output, the task needs to be rerun.

I am still looking at the R library issue and will answer you shortly.

Best, Seb
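[Editor's note] The timestamp rule described here can be illustrated with plain shell; this is a toy demonstration of the staleness check, not a STRONG or snakemake command:

```shell
# If an input is newer than a task's existing output, snakemake treats the
# output as stale and reruns the task. Shown here with test's -nt operator.
mkdir -p /tmp/ts_demo
touch -t 202101010000 /tmp/ts_demo/output.txt   # pretend the output is old
touch /tmp/ts_demo/input.txt                    # input is created just now
if [ /tmp/ts_demo/input.txt -nt /tmp/ts_demo/output.txt ]; then
    echo "output stale - task would rerun"
else
    echo "output up to date"
fi
```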

yuxiangtan commented 2 years ago

Hi Seb,

OK, I thought I had set the core number in the config file. I tried again with 30 cores, and it now finishes the SPAdes step. However, it broke at Step #1, and I have no idea what I can do.

Following is the key error message; the full log file is attached.

Step #1 - Assembly / Binning / COG Annotation
Traceback (most recent call last):
  File "/STRONG/bin//STRONG", line 94, in <module>
    call_snake(["--snakefile", "SnakeNest/SCogSubGraph.snake"])
  File "/STRONG/bin//STRONG", line 83, in call_snake
    subprocess.check_call(base_params + extra_params, stdout=sys.stdout, stderr=sys.stderr)
  File "/root/miniconda3/envs/STRONG/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snakemake', '--directory', '/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2', '--cores', '30', '--config', 'LOCAL_DIR=/STRONG', '--configfile=/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2/config.yaml', '--latency-wait', '120', '-k', '--snakefile', 'SnakeNest/SCogSubGraph.snake']' returned non-zero exit status 1.

run.log

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

Well, there is a setting for core number in the config file, but it is a per-task core number: if a task like mapping can natively use multiple cores, it is allowed to run with that many. This was an issue I fixed not long ago, one or two commits back. Can you pull the latest version of STRONG from GitHub? That should fix it.

Best, Seb

yuxiangtan commented 2 years ago

Hi Seb,

I ran git pull and reran, but I still get a similar error message:

Git pull:

(base) root@e710c85a387c:/STRONG# git pull
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 5), reused 6 (delta 5), pack-reused 0
Unpacking objects: 100% (9/9), 2.16 KiB | 245.00 KiB/s, done.
From https://github.com/chrisquince/STRONG
   f7f62a8..dd77532  master    -> origin/master
   db86a80..60415dd  metabat2  -> origin/metabat2
Updating f7f62a8..dd77532
Fast-forward
 SnakeNest/Common.snake  |  2 +-
 SnakeNest/Desman.snake  | 11 +----------
 SnakeNest/Results.snake |  1 -
 3 files changed, 2 insertions(+), 12 deletions(-)

The log of the rerun:

[Wed Oct 20 14:01:11 2021]
Job 5: Building bowtie index for profile/assembly.fasta

[Wed Oct 20 14:01:43 2021]
Error in rule bowtie_index:
    jobid: 5
    output: profile/assembly/index.done
    log: profile/assembly/index.log (check log file(s) for error message)
    shell:
        bowtie2-build profile/assembly.fasta profile/assembly/index --threads 30 &> profile/assembly/index.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2/.snakemake/log/2021-10-20T140109.688022.snakemake.log
Output folder set to /data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2
Step #1 - Assembly / Binning / COG Annotation
Traceback (most recent call last):
  File "/STRONG/bin//STRONG", line 94, in <module>
    call_snake(["--snakefile", "SnakeNest/SCogSubGraph.snake"])
  File "/STRONG/bin//STRONG", line 83, in call_snake
    subprocess.check_call(base_params + extra_params, stdout=sys.stdout, stderr=sys.stderr)
  File "/root/miniconda3/envs/STRONG/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snakemake', '--directory', '/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2', '--cores', '30', '--config', 'LOCAL_DIR=/STRONG', '--configfile=/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2/config.yaml', '--latency-wait', '120', '-k', '--snakefile', 'SnakeNest/SCogSubGraph.snake']' returned non-zero exit status 1.

run.log

Am I still missing something?

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

Can you share the file at profile/assembly/index.log? That is where the error message would be stored.

Best, Seb

yuxiangtan commented 2 years ago

index.log

Here it is.

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

There was an issue with the bowtie index: it was unable to deal with the size of your assembly. Strange, since we have used it on assemblies of more than 4 GB before; maybe a version thing. I fixed the issue by adding the argument proposed in those logs. You just need to pull the latest version of STRONG and it should work.

Best, Seb
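[Editor's note] The argument in question is presumably `--large-index`, which bowtie2-build itself suggests when a reference is too big for the default (small) index; the exact fix committed to STRONG may differ. A sketch of choosing the flag by assembly size (the 4 GB threshold is approximate and the paths are illustrative):

```shell
# Pick the bowtie2-build invocation based on the assembly's size on disk.
FASTA="profile/assembly.fasta"
LIMIT=$((4 * 1024 * 1024 * 1024))
if [ -f "$FASTA" ]; then
    SIZE=$(wc -c < "$FASTA")
    if [ "$SIZE" -gt "$LIMIT" ]; then
        echo "bowtie2-build --large-index $FASTA profile/assembly/index"
    else
        echo "bowtie2-build $FASTA profile/assembly/index"
    fi
else
    echo "assembly not found: $FASTA"
fi
```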

yuxiangtan commented 2 years ago

Hi Seb,

Thank you for the last update. It fixed the previous problem.

However, new errors came out:

Exiting because a job execution failed. Look above for error message
Complete log: /data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2/.snakemake/log/2021-10-22T140413.432890.snakemake.log
Output folder set to /data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2
Step #1 - Assembly / Binning / COG Annotation
Traceback (most recent call last):
  File "/STRONG/bin//STRONG", line 94, in <module>
    call_snake(["--snakefile", "SnakeNest/SCogSubGraph.snake"])
  File "/STRONG/bin//STRONG", line 83, in call_snake
    subprocess.check_call(base_params + extra_params, stdout=sys.stdout, stderr=sys.stderr)
  File "/root/miniconda3/envs/STRONG/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['snakemake', '--directory', '/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2', '--cores', '30', '--config', 'LOCAL_DIR=/STRONG', '--configfile=/data/Xianjinyuan/tanyuxiang/FMT_mouse-WY/STRONG_all/rerun2/config.yaml', '--latency-wait', '120', '-k', '--snakefile', 'SnakeNest/SCogSubGraph.snake']' returned non-zero exit status 1.

An example of the error:

/bin/bash: line 1: 18843 Killed    bedtools coverage -a profile/split.bed -b profile/assembly/sample42.sorted.bam -mean > profile/split/sample42.cov 2> profile/split/sample42.log
[Sat Oct 23 03:30:41 2021]
Error in rule bedtools_split_cov:
    jobid: 115
    output: profile/split/sample42.cov
    log: profile/split/sample42.log (check log file(s) for error message)
    shell:
        bedtools coverage -a profile/split.bed -b profile/assembly/sample42.sorted.bam -mean > profile/split/sample42.cov 2> profile/split/sample42.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job bedtools_split_cov since they might be corrupted:
profile/split/sample42.cov
Job failed, going on with independent jobs.

Attached is the full log: run.log

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

Can you share one of the corresponding logs, for instance profile/split/sample42.log? Or is the example you wrote all there is? Your example says the process was killed; could it be that you ran out of RAM? That step is RAM-consuming, and multiple concurrent instances can exhaust it. There is a way to fix that overconsumption, but I need to run some tests first. In the meantime, you can reduce the number of CPUs so that fewer "bedtools_split_cov" tasks run concurrently, which reduces the risk of running out of RAM. Let me know if this is not a RAM issue.

Best, Seb
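[Editor's note] A "Killed" message like the one above usually means the kernel's OOM killer intervened, which the kernel log can confirm. These are standard Linux tools, nothing STRONG-specific; dmesg may require root on some systems, and an empty result does not rule out OOM if the log buffer has rotated:

```shell
# Count possible OOM-killer messages in the kernel log.
OOM_LINES=$(dmesg 2>/dev/null | grep -i -c -E "out of memory|killed process" || true)
echo "possible OOM messages in kernel log: ${OOM_LINES:-0}"
free -h 2>/dev/null || true    # current memory headroom, if free is available
```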

yuxiangtan commented 2 years ago

Hi Seb,

The profile/split/sample42.log file is empty.

In the main log printout it says the job was killed. I will rerun with only 1 CPU to see what happens, or you can let me know what else I can do to check the RAM problem. (BTW, the machine I am using has 96 GB of RAM.)

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

If you could assemble on your machine, then you have already passed the most RAM-intensive step, and there is no reason you can't do the rest. I tend to use the command htop, since it lets you sort processes by percent of RAM used.

yuxiangtan commented 2 years ago

run.log

Now, after running on only 1 core, the log looks like this. I am not sure whether it got past the previous error, but it looks stuck at the current step, which has been running for almost 24 hours already.

Best,

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

All the tasks called "bedtools_split_cov" are done, so you are good regarding that particular issue. If you want, you can stop the current run and increase the number of CPUs again, though I would wait a bit to see if the hanging task finishes.

From the log, you seem to be stuck at the "coverage" task. It shouldn't take that long... all it does is concatenate the files created by bedtools_split_cov into a coverage matrix. I would let it run at least another day before doing anything else. This part is written in awk and should already be quite fast; if it doesn't finish by tomorrow, I'll unearth an alternative Python version of the same code and share it with you.

Best, Seb
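[Editor's note] As a toy illustration of the kind of matrix that concatenation produces (file names, format, and the paste-based approach are invented here; STRONG's actual awk code may differ):

```shell
# Build a tiny coverage matrix from two per-sample .cov files by pasting
# the coverage column of the second sample next to the first.
mkdir -p /tmp/cov_demo && cd /tmp/cov_demo
printf 'contig1\t3.2\ncontig2\t1.1\n' > sampleA.cov
printf 'contig1\t0.5\ncontig2\t2.7\n' > sampleB.cov
cut -f2 sampleB.cov > sampleB.col
paste sampleA.cov sampleB.col > coverage.tsv
cat coverage.tsv    # contig rows, one coverage column per sample
```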

yuxiangtan commented 2 years ago

Hi, Seb

I just killed the job (since it was stuck). But when I rerun with:

/STRONG/bin//STRONG . -t 30 -s --rerun-incomplete --unlock &> run.log

it gives me the following log: run.log

I am not sure whether this was caused by the kill or by something else. What should I do now?

Best,

Yuxiang

yuxiangtan commented 2 years ago

Hi, Seb,

I ran a new test on a very small simulated dataset.

However, I still got an error and was not able to get through the whole process (run.log attached). run.log

Up to now, I have only successfully run the whole pipeline on the provided test data; it has failed on every other dataset. I am not sure what to do next, or whether the Docker image I built is ready for the public in this state.

Best,

Yuxiang

Sebastien-Raguideau commented 2 years ago

Hi Yuxiang,

Sorry for the lack of an answer previously; your issue takes more time to solve, so I needed to secure some in my schedule and never got around to it. From a quick look at your log, this is linked to running GTDB. I updated STRONG not long ago to deal with this particular issue of locating the GTDB reference genomes. I may need to pin versions in the conda env so that these problems don't occur in the future. Please have a go with the latest version of STRONG and tell me how it goes.

Cheers, Seb

yuxiangtan commented 2 years ago

Hi Seb,

I pulled the code and it fixed the GTDB problem.

However, I got a new problem at the plot_tree_fig step; log attached.

run.log

Best,

Yuxiang

yuxiangtan commented 2 years ago

Hi @Sebastien-Raguideau ,

Is there any update on this since then? I updated the previous fixes in Docker v1.

Best,

Yuxiang